falkon.kernels
Kernel
- class falkon.kernels.kernel.Kernel(name: str, opt: FalkonOptions | None)
Abstract kernel class. Kernels should inherit from this class, overriding appropriate methods.
To extend Falkon with new kernels, you should read the documentation of this class carefully, and take a look at the existing implementations of `GaussianKernel` or `LinearKernel`. A walk-through for implementing a custom kernel is available as a notebook.
There are several abstract methods which should be implemented, depending on the kind of operations supported by the implementing kernel.
The `compute()` method should compute the kernel matrix without concern for differentiability, while the `compute_sparse()` method is used to compute the kernel for sparse input matrices.
- Parameters:
name – A short name for the kernel (e.g. “Gaussian”)
opt – Base set of options to be used for operations involving this kernel.
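As a concrete illustration, a minimal dense-only kernel might be written as follows. This is a sketch against the abstract interface described above: the class name and the kernel function are invented for this example, and the sparse path is simply left unimplemented.

import torch
from falkon.kernels.kernel import Kernel


class MyLinearKernel(Kernel):
    def __init__(self, opt=None):
        super().__init__("my_linear", opt)

    def compute(self, X1: torch.Tensor, X2: torch.Tensor, out: torch.Tensor, diag: bool) -> torch.Tensor:
        if diag:
            # Only the diagonal of k(X1, X2): row-wise dot products.
            return torch.sum(X1 * X2, dim=-1, out=out)
        # Full kernel matrix, written into `out` as the interface requires.
        return torch.matmul(X1, X2.T, out=out)

    def compute_sparse(self, X1, X2, out, diag, **kwargs) -> torch.Tensor:
        raise NotImplementedError("Sparse inputs are not supported by this sketch.")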
- __call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute the kernel matrix between `X1` and `X2`.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set `X2 == X1` to compute a symmetric kernel.
diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.
out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The kernel between `X1` and `X2`.
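For instance, the call operator can be used directly to materialize a small kernel matrix, or just its diagonal (a usage sketch with the GaussianKernel documented below):

import torch
import falkon

k = falkon.kernels.GaussianKernel(sigma=2.0)
X1 = torch.randn(100, 3)
X2 = torch.randn(150, 3)
K = k(X1, X2)             # dense (100 x 150) kernel matrix
d = k(X1, X1, diag=True)  # only the diagonal of k(X1, X1)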
- _decide_dmmv_impl(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, opt: FalkonOptions)
Choose which dmmv function to use for this data.
Note that dmmv functions compute double kernel-vector products (see `dmmv()` for an explanation of what they are).
- Parameters:
X1 (torch.Tensor) – First data matrix, of shape (N x D)
X2 (torch.Tensor) – Second data matrix, of shape (M x D)
v (torch.Tensor or None) – Vector for the matrix-vector multiplication (M x T)
w (torch.Tensor or None) – Vector for the matrix-vector multiplication (N x T)
opt (FalkonOptions) – Falkon options. Options may be specified to force GPU or CPU usage.
- Returns:
dmmv_fn – A function which performs the dmmv operation.
Notes
This function decides based on the inputs: if the inputs are sparse, it will choose the sparse implementations; if CUDA is detected, it will choose the CUDA implementation; otherwise it will simply choose the basic CPU implementation.
- _decide_mm_impl(X1: Tensor, X2: Tensor, diag: bool, opt: FalkonOptions)
Choose which mm function to use for this data.
Note that mm functions compute the kernel matrix itself, so KeOps cannot be used.
- Parameters:
X1 (torch.Tensor) – First data matrix, of shape (N x D)
X2 (torch.Tensor) – Second data matrix, of shape (M x D)
diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.
opt (FalkonOptions) – Falkon options. Options may be specified to force GPU or CPU usage.
- Returns:
mm_fn – A function which performs the mm operation.
Notes
This function decides based on the inputs: if the inputs are sparse, it will choose the sparse implementations; if CUDA is detected, it will choose the CUDA implementation; otherwise it will simply choose the basic CPU implementation.
- _decide_mmv_impl(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, opt: FalkonOptions)
Choose which mmv function to use for this data.
Note that mmv functions compute the kernel-vector product.
- Parameters:
X1 (torch.Tensor) – First data matrix, of shape (N x D)
X2 (torch.Tensor) – Second data matrix, of shape (M x D)
v (torch.Tensor) – Vector for the matrix-vector multiplication (M x T)
opt (FalkonOptions) – Falkon options. Options may be specified to force GPU or CPU usage.
- Returns:
mmv_fn – A function which performs the mmv operation.
Notes
This function decides based on the inputs: if the inputs are sparse, it will choose the sparse implementations; if CUDA is detected, it will choose the CUDA implementation; otherwise it will simply choose the basic CPU implementation.
- abstract compute(X1: Tensor, X2: Tensor, out: Tensor, diag: bool) → Tensor
Compute the kernel matrix of `X1` and `X2`, without regard for differentiability.
The kernel matrix should be stored in `out` to ensure the correctness of allocatable memory computations.
- Parameters:
X1 (torch.Tensor) – The left matrix for computing the kernel
X2 (torch.Tensor) – The right matrix for computing the kernel
out (torch.Tensor) – The output matrix into which implementing classes should store the kernel.
diag (bool) – If true, `X1` and `X2` have the same shape, and only the diagonal of `k(X1, X2)` is to be computed and stored in `out`. Otherwise compute the full kernel matrix.
- Returns:
out (torch.Tensor) – The kernel matrix. Should use the same underlying storage as the parameter `out`.
- abstract compute_sparse(X1: SparseTensor, X2: SparseTensor, out: Tensor, diag: bool, **kwargs) → Tensor
Compute the kernel matrix of `X1` and `X2`, which are two sparse matrices, storing the output in the dense matrix `out`.
- Parameters:
X1 (SparseTensor) – The left matrix for computing the kernel
X2 (SparseTensor) – The right matrix for computing the kernel
out (torch.Tensor) – The output matrix into which implementing classes should store the kernel.
diag (bool) – If true, `X1` and `X2` have the same shape, and only the diagonal of `k(X1, X2)` is to be computed and stored in `out`.
kwargs – Additional keyword arguments which some sparse implementations might require. Currently the keyword arguments passed by the `falkon.mmv_ops.fmmv.sparse_mmv_run_thread()` and `falkon.mmv_ops.fmm.sparse_mm_run_thread()` functions are:
X1_csr : the X1 matrix in CSR format
X2_csr : the X2 matrix in CSR format
- Returns:
out (torch.Tensor) – The kernel matrix. Should use the same underlying storage as the parameter out.
- dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute double matrix-vector multiplications where the matrix is the current kernel.
The general form of dmmv operations is `Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w)`. If `v` is None, this reduces to `Kernel(X2, X1) @ w`; if `w` is None, the additive term is dropped. At least one of `v` and `w` must be provided.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (M x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
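The definition above can be checked against an explicit composition of kernel-matrix products; a sketch (agreement is up to floating-point tolerance, so the default tolerances of assert_close may need loosening):

import torch
import falkon

k = falkon.kernels.GaussianKernel(sigma=2.0)
X1, X2 = torch.randn(100, 3), torch.randn(150, 3)
v, w = torch.randn(150, 1), torch.randn(100, 1)

# Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w), computed explicitly.
expected = k(X2, X1) @ (k(X1, X2) @ v + w)
torch.testing.assert_close(k.dmmv(X1, X2, v, w), expected)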
- extra_mem(is_differentiable, is_sparse, dtype, density1=None, density2=None) → Dict[str, float]
Compute the amount of extra memory which will be needed when computing this kernel.
Often kernel computation needs some extra memory allocations. To avoid overly large block sizes, which may lead to out-of-memory errors, you should declare any such extra allocations for your kernel here.
Indicate extra allocations as coefficients on the required dimensions. For example, if computing a kernel needs to re-allocate the data-matrix (which is of size n * d), the returned dictionary will be {'nd': 1}. The other possible coefficients are on 'd', 'n' and 'm', which are respectively the data dimension, the number of data-points in the first data matrix, and the number of data-points in the second matrix. Pairwise combinations of the three dimensions are possible (i.e. 'nd', 'nm', 'md'), and the special key '0' can be used to specify a base amount of memory needed independently of the data dimensions. Make sure to use exactly the dictionary keys listed here, since other keys will not be recognized.
- Parameters:
is_differentiable (bool) –
is_sparse (bool) –
dtype (torch.dtype or np.dtype) –
density1 (float or None) –
density2 (float or None) –
- Returns:
extra_allocs (dictionary) – A dictionary from strings indicating on which dimensions the extra allocation is needed (allowed strings: 'n', 'm', 'd', 'nm', 'nd', 'md', '0') to floating-point numbers indicating how many extra allocations are needed.
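For example, a kernel whose computation materializes one temporary copy of each input block plus one full (n x m) buffer could declare its needs as follows (a sketch with illustrative coefficients, not taken from any Falkon kernel):

def extra_mem(self, is_differentiable, is_sparse, dtype, density1=None, density2=None):
    # One extra copy of each data block, plus one (n x m) scratch buffer.
    return {"nd": 1.0, "md": 1.0, "nm": 1.0}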
- mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute matrix-vector multiplications where the matrix is the current kernel.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (N x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])
DiffKernel
- class falkon.kernels.diff_kernel.DiffKernel(name, options, core_fn, **kernel_params)
Abstract class for differentiable kernels.
This class should be extended instead of `Kernel` whenever designing a custom kernel to be used with automatic hyperparameter optimization (see the `hopt` module).
Subclasses should implement the `detach()` method to return a new instance of the kernel, with its parameters detached from the computation graph.
The `compute_diff()` method should be overridden, unless the `core_fn` parameter is passed to the constructor.
Hyperparameters of the concrete kernel instance (for example the length-scale of the Gaussian kernel) should be passed to the constructor of this class, in order to be registered as parameters of the computation graph. Even non-differentiable parameters should be provided as keywords (including non-tensor arguments).
- Parameters:
name – A short name for the kernel (e.g. “Gaussian”)
options – Base set of options to be used for operations involving this kernel.
core_fn – Optional function which can be used to compute a kernel matrix. The signature of the function should be `core_fn(X1, X2, out, diag, **kernel_parameters)`, where `X1` and `X2` are the input matrices, `out` is the output matrix (it will be None when called from `compute_diff()`), `diag` is a flag indicating that only the diagonal of the kernel matrix is to be computed, and `**kernel_parameters` includes all additional parameters belonging to the kernel (which are passed to the constructor of `DiffKernel`).
kernel_params – All parameters (differentiable and non-differentiable) to be used for this kernel. The values given are used to initialize the actual parameters, which will be copied in the constructor.
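As an illustration, a differentiable polynomial-style kernel can be built on top of core_fn as below. This is a sketch: the class, the core function and the attribute access in detach() are invented for this example (Falkon's built-in kernels expose their parameters as attributes, but verify this against the base class before relying on it).

import torch
from falkon.kernels.diff_kernel import DiffKernel


def poly_core(X1, X2, out, diag, degree):
    # Differentiable core: (X1 @ X2.T) ** degree, written into `out` when given.
    prod = (X1 * X2).sum(-1) if diag else X1 @ X2.T
    res = prod ** degree
    if out is None:  # `out` is None when called from compute_diff()
        return res
    out.copy_(res)
    return out


class SimplePolyKernel(DiffKernel):
    def __init__(self, degree, options=None):
        super().__init__("simple_poly", options, core_fn=poly_core, degree=degree)

    def detach(self):
        # Assumes `degree` was registered as a tensor attribute by the base class.
        return SimplePolyKernel(self.degree.detach(), options=None)


k = SimplePolyKernel(torch.tensor(3.0, requires_grad=True))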
- compute_diff(X1: Tensor, X2: Tensor, diag: bool)
Compute the kernel matrix of `X1` and `X2`. The output should be differentiable with respect to `X1`, `X2`, and all kernel parameters returned by the `diff_params()` method.
- Parameters:
X1 (torch.Tensor) – The left matrix for computing the kernel
X2 (torch.Tensor) – The right matrix for computing the kernel
diag (bool) – If true, `X1` and `X2` have the same shape, and only the diagonal of `k(X1, X2)` is to be computed. Otherwise compute the full kernel matrix.
- Returns:
out (torch.Tensor) – The constructed kernel matrix.
- abstract detach() → Kernel
Detaches all differentiable parameters of the kernel from the computation graph.
- Returns:
k – A new instance of the kernel sharing the same parameters, but detached from the computation graph.
- property diff_params: Dict[str, Tensor]
A dictionary mapping parameter names to their values for all differentiable parameters of the kernel.
- Returns:
params – A dictionary mapping parameter names to their values
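For instance, during hyperparameter optimization the differentiable parameters can be inspected, and the kernel detached once no further gradients are needed (a usage sketch, assuming GaussianKernel implements the DiffKernel interface):

import torch
import falkon

sigma = torch.tensor([2.0], requires_grad=True)
k = falkon.kernels.GaussianKernel(sigma=sigma)
print(k.diff_params)     # e.g. {'sigma': tensor([2.], requires_grad=True)}
k_fixed = k.detach()     # same parameter values, outside the autograd graph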
KeopsKernelMixin
- class falkon.kernels.keops_helpers.KeopsKernelMixin(name: str, opt: FalkonOptions | None)
Abstract mixin class which enables KeOps acceleration for kernels.
This class should be extended when KeOps-accelerated kernel-vector products are required. Subclasses should implement the `keops_mmv_impl()` method by defining an appropriate KeOps formula, and calling into KeOps for the kernel-vector product calculation (the helper method `keops_mmv()` can be used to call into KeOps).
For help in implementing a subclass, check the existing implementations (e.g. `GaussianKernel`), and the Custom Kernels notebook.
- keops_mmv(X1: Tensor, X2: Tensor, v: Tensor, out: Tensor | None, formula: str, aliases: List[str], other_vars: List[Tensor], opt: FalkonOptions)
Helper method to call into KeOps for kernel-vector products
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set `X2 == X1` to compute a symmetric kernel.
v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if `T` is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.
formula (str) – The KeOps formula
aliases – Aliases referencing names in the formula with actual KeOps variables
other_vars – Kernel parameters to be used in the formula, other than `X1`, `X2` and `v`.
opt – Options to be used for computing the operation. Useful are the memory size options, CUDA options and KeOps options.
- Returns:
out – The computed kernel matrix between `X1` and `X2`, multiplied by the vector `v`.
- abstract keops_mmv_impl(X1, X2, v, kernel, out, opt: FalkonOptions)
Implementation of the KeOps formula to compute a kernel-vector product.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set `X2 == X1` to compute a symmetric kernel.
v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if `T` is very large the operations will be much slower.
kernel (falkon.kernels.kernel.Kernel) – Instance of this class. This is equal to `self` and can be ignored.
out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.
opt (FalkonOptions) – Options to be used for computing the operation. Useful are the memory size options, CUDA options and KeOps options.
- Returns:
out – The computed kernel matrix between `X1` and `X2`, multiplied by the vector `v`.
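Concretely, a Gaussian-style keops_mmv_impl() could be written as below. This is a sketch modeled on the pattern described above: the formula string uses standard KeOps syntax (Exp, SqDist, Vi/Vj/Pm aliases), and a scalar length-scale stored in self.sigma is assumed.

import torch

def keops_mmv_impl(self, X1, X2, v, kernel, out, opt):
    # exp(-||x1 - x2||^2 / (2 * sigma^2)) @ v, in KeOps syntax.
    formula = "Exp(SqDist(x1, x2) * g) * v"
    aliases = [
        f"x1 = Vi({X1.shape[1]})",  # i-indexed rows of X1, dimension D
        f"x2 = Vj({X2.shape[1]})",  # j-indexed rows of X2, dimension D
        f"v = Vj({v.shape[1]})",    # j-indexed rows of v, dimension T
        "g = Pm(1)",                # scalar parameter
    ]
    gamma = torch.full((1,), -0.5 / float(self.sigma) ** 2,
                       dtype=X1.dtype, device=X1.device)
    return self.keops_mmv(X1, X2, v, out, formula, aliases, [gamma], opt)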
Radial kernels
Gaussian kernel
- class falkon.kernels.GaussianKernel(sigma: float | Tensor, opt: FalkonOptions | None = None)
Class for computing the Gaussian kernel and related kernel-vector products
The Gaussian kernel is one of the most common and effective kernel embeddings, since it is infinite-dimensional and governed by a single parameter. The kernel length-scale determines the width of the Gaussian distribution which is placed on top of each point. A larger sigma corresponds to a wide Gaussian, so that the relative influence of far-away points is high when computing the kernel at a given datum. At the opposite end of the spectrum, a small sigma means that only nearby points influence the kernel.
- Parameters:
sigma – The length-scale of the kernel. This can be a scalar, in which case it corresponds to the standard deviation of the Gaussian distribution from which the kernel is derived. If sigma is a vector of size d (where d is the dimensionality of the data), it is interpreted as the diagonal standard deviation of the Gaussian distribution. It can also be a matrix of size d x d, in which case sigma is the precision matrix (inverse covariance) of the Gaussian distribution.
opt – Additional options to be forwarded to the matrix-vector multiplication routines.
Examples
Creating a Gaussian kernel with a single length-scale. Operations on this kernel will not use KeOps.
>>> K = GaussianKernel(sigma=3.0, opt=FalkonOptions(keops_active="no"))
Creating a Gaussian kernel with a different length-scale per dimension
>>> K = GaussianKernel(sigma=torch.tensor([1.0, 3.5, 7.0]))
Creating a Gaussian kernel object with full covariance matrix (randomly chosen)
>>> mat = torch.randn(3, 3, dtype=torch.float64)
>>> sym_mat = mat @ mat.T
>>> K = GaussianKernel(sigma=sym_mat)
>>> K
GaussianKernel(sigma=tensor([[ 2.0909,  0.0253, -0.2490],
                             [ 0.0253,  0.3399, -0.5158],
                             [-0.2490, -0.5158,  4.4922]], dtype=torch.float64))  #random
Notes
The Gaussian kernel with a single length-scale follows
\[k(x, x') = \exp\left(-\dfrac{\lVert x - x' \rVert^2}{2\sigma^2}\right)\]
When the length-scales are specified as a matrix, the RBF kernel is determined by
\[k(x, x') = \exp\left(-\dfrac{1}{2}(x - x')^\top \Sigma (x - x')\right)\]
In both cases, the actual computation follows a different path, working on the expanded norm.
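The scalar-sigma formula can be verified directly against an explicit distance computation (a sketch; agreement is up to floating-point tolerance):

import torch
import falkon

sigma = 2.0
k = falkon.kernels.GaussianKernel(sigma=sigma)
X1, X2 = torch.randn(50, 3), torch.randn(60, 3)
manual = torch.exp(-torch.cdist(X1, X2) ** 2 / (2 * sigma ** 2))
torch.testing.assert_close(k(X1, X2), manual)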
- __call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute the kernel matrix between `X1` and `X2`.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set `X2 == X1` to compute a symmetric kernel.
diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.
out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The kernel between `X1` and `X2`.
- dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute double matrix-vector multiplications where the matrix is the current kernel.
The general form of dmmv operations is `Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w)`. If `v` is None, this reduces to `Kernel(X2, X1) @ w`; if `w` is None, the additive term is dropped. At least one of `v` and `w` must be provided.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (M x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
- mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute matrix-vector multiplications where the matrix is the current kernel.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (N x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])
Laplacian kernel
- class falkon.kernels.LaplacianKernel(sigma: float | Tensor, opt: FalkonOptions | None = None)
Class for computing the Laplacian kernel, and related kernel-vector products.
The Laplacian kernel is similar to the Gaussian kernel, but less sensitive to changes in the parameter sigma.
- Parameters:
sigma – The length-scale of the Laplacian kernel
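The formula in the notes below can be checked in the same way as for the Gaussian kernel (a sketch; agreement is up to floating-point tolerance):

import torch
import falkon

sigma = 2.0
k = falkon.kernels.LaplacianKernel(sigma=sigma)
X1, X2 = torch.randn(50, 3), torch.randn(60, 3)
manual = torch.exp(-torch.cdist(X1, X2) / sigma)
torch.testing.assert_close(k(X1, X2), manual)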
Notes
The Laplacian kernel is determined by the following formula
\[k(x, x') = \exp\left(-\frac{\lVert x - x' \rVert}{\sigma}\right)\]
- __call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute the kernel matrix between `X1` and `X2`.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set `X2 == X1` to compute a symmetric kernel.
diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.
out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The kernel between `X1` and `X2`.
- dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute double matrix-vector multiplications where the matrix is the current kernel.
The general form of dmmv operations is `Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w)`. If `v` is None, this reduces to `Kernel(X2, X1) @ w`; if `w` is None, the additive term is dropped. At least one of `v` and `w` must be provided.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (M x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
- mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute matrix-vector multiplications where the matrix is the current kernel.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (N x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])
Matern kernel
- class falkon.kernels.MaternKernel(sigma: float | Tensor, nu: float | Tensor, opt: FalkonOptions | None = None)
Class for computing the Matern kernel, and related kernel-vector products.
The Matern kernels define a generic class of kernel functions which includes the Laplacian and Gaussian kernels. The class is parametrized by `nu`. When `nu = 0.5` this kernel is equivalent to the Laplacian kernel; when `nu = float('inf')`, the Matern kernel is equivalent to the Gaussian kernel.
This class implements the Matern kernel only for the values of `nu` which have a closed-form solution, which are 0.5, 1.5, 2.5, and infinity.
- Parameters:
sigma – The length-scale of the Matern kernel. The length-scale can be either a scalar or a vector. Matrix-valued length-scales are not allowed for the Matern kernel.
nu – The parameter of the Matern kernel. It should be one of 0.5, 1.5, 2.5 or inf.
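The equivalences with the Laplacian and Gaussian kernels described above can be checked numerically (a sketch; agreement is up to floating-point tolerance, see also the Notes below):

import torch
import falkon

X1, X2 = torch.randn(50, 3), torch.randn(60, 3)
lap_like = falkon.kernels.MaternKernel(sigma=2.0, nu=0.5)
torch.testing.assert_close(lap_like(X1, X2), falkon.kernels.LaplacianKernel(sigma=2.0)(X1, X2))
gau_like = falkon.kernels.MaternKernel(sigma=2.0, nu=float("inf"))
torch.testing.assert_close(gau_like(X1, X2), falkon.kernels.GaussianKernel(sigma=2.0)(X1, X2))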
Notes
While for `nu = float('inf')` this kernel is equivalent to the `GaussianKernel`, this implementation is more general. Using the `GaussianKernel` directly may be computationally more efficient.
- __call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute the kernel matrix between `X1` and `X2`.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set `X2 == X1` to compute a symmetric kernel.
diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.
out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The kernel between `X1` and `X2`.
- dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute double matrix-vector multiplications where the matrix is the current kernel.
The general form of dmmv operations is `Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w)`. If `v` is None, this reduces to `Kernel(X2, X1) @ w`; if `w` is None, the additive term is dropped. At least one of `v` and `w` must be provided.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (M x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
- mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute matrix-vector multiplications where the matrix is the current kernel.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (N x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])
Dot-Product kernels
Polynomial kernel
- class falkon.kernels.PolynomialKernel(beta: float | Tensor, gamma: float | Tensor, degree: float | Tensor, opt: FalkonOptions | None = None)
Polynomial kernel with multiplicative and additive constants.
Follows the formula
\[(\gamma \, X_1^\top X_2 + \beta)^{\mathrm{degree}}\]
where all operations apart from the matrix multiplication are taken element-wise.
- Parameters:
beta (float-like) – Additive constant
gamma (float-like) – Multiplicative constant
degree (float-like) – Power of the polynomial kernel
opt (Optional[FalkonOptions]) – Options which will be used in downstream kernel operations.
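The formula can be checked against a direct computation (a sketch; agreement is up to floating-point tolerance):

import torch
import falkon

k = falkon.kernels.PolynomialKernel(beta=1.0, gamma=2.0, degree=3.0)
X1, X2 = torch.randn(50, 3), torch.randn(60, 3)
manual = (2.0 * X1 @ X2.T + 1.0) ** 3
torch.testing.assert_close(k(X1, X2), manual)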
- __call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute the kernel matrix between `X1` and `X2`.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set `X2 == X1` to compute a symmetric kernel.
diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.
out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The kernel between `X1` and `X2`.
- dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute double matrix-vector multiplications where the matrix is the current kernel.
The general form of dmmv operations is `Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w)`. If `v` is None, this reduces to `Kernel(X2, X1) @ w`; if `w` is None, the additive term is dropped. At least one of `v` and `w` must be provided.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (M x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
- mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute matrix-vector multiplications where the matrix is the current kernel.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (N x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])
Linear kernel
- class falkon.kernels.LinearKernel(beta: float | Tensor = 0.0, gamma: float | Tensor = 1.0, opt: FalkonOptions | None = None)
Linear Kernel with optional scaling and translation parameters.
The kernel implemented here is the covariance function in the original input space (i.e. `X @ X.T`), with optional parameters to translate and scale the kernel: `beta + gamma * X @ X.T`.
- Parameters:
beta (float-like) – Additive constant for the kernel, default: 0.0
gamma (float-like) – Multiplicative constant for the kernel. Default: 1.0
opt (Optional[FalkonOptions]) – Options which will be used in downstream kernel operations.
Examples
>>> k = LinearKernel(beta=0.0, gamma=2.0)
>>> X = torch.randn(100, 3)  # 100 samples in 3 dimensions
>>> kernel_matrix = k(X, X)
>>> torch.testing.assert_close(kernel_matrix, X @ X.T * 2)
- __call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute the kernel matrix between `X1` and `X2`.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set `X2 == X1` to compute a symmetric kernel.
diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.
out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The kernel between `X1` and `X2`.
- dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute double matrix-vector multiplications where the matrix is the current kernel.
The general form of dmmv operations is `Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w)`. If `v` is None, this reduces to `Kernel(X2, X1) @ w`; if `w` is None, the additive term is dropped. At least one of `v` and `w` must be provided.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (M x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
- mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute matrix-vector multiplications where the matrix is the current kernel.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (N x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])
Sigmoid kernel
- class falkon.kernels.SigmoidKernel(beta: float | Tensor, gamma: float | Tensor, opt: FalkonOptions | None = None)
Sigmoid (or hyperbolic tangent) kernel function, with additive and multiplicative constants.
Follows the formula
\[k(x, y) = \tanh(\gamma \, x^\top y + \beta)\]
- Parameters:
beta (float-like) – Additive constant
gamma (float-like) – Multiplicative constant
opt (Optional[FalkonOptions]) – Options which will be used in downstream kernel operations.
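As with the polynomial kernel, the formula can be verified directly (a sketch, assuming gamma is the multiplicative and beta the additive constant, as in the formula above):

import torch
import falkon

k = falkon.kernels.SigmoidKernel(beta=1.0, gamma=2.0)
X1, X2 = torch.randn(50, 3), torch.randn(60, 3)
manual = torch.tanh(2.0 * X1 @ X2.T + 1.0)
torch.testing.assert_close(k(X1, X2), manual)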
- __call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute the kernel matrix between `X1` and `X2`.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set `X2 == X1` to compute a symmetric kernel.
diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.
out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The kernel between `X1` and `X2`.
- dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute double matrix-vector multiplications where the matrix is the current kernel.
The general form of dmmv operations is `Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w)`. If `v` is None, this reduces to `Kernel(X2, X1) @ w`; if `w` is None, the additive term is dropped. At least one of `v` and `w` must be provided.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (M x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
- mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)
Compute matrix-vector multiplications where the matrix is the current kernel.
- Parameters:
X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.
X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.
v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.
out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.
opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.
- Returns:
out (torch.Tensor) – The (N x T) output.
Examples
>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])