falkon.kernels

Kernel

class falkon.kernels.kernel.Kernel(name: str, opt: FalkonOptions | None)

Abstract kernel class. Kernels should inherit from this class, overriding appropriate methods.

To extend Falkon with new kernels, you should read the documentation of this class carefully, and take a look at the existing implementation of GaussianKernel or LinearKernel. A walk-through for implementing a custom kernel is available as a notebook.

There are several abstract methods which should be implemented, depending on the kind of operations which are supported by the implementing kernel.

The compute() method should compute the kernel matrix, without concerns for differentiability, while the compute_sparse() method is used to compute the kernel for sparse input matrices.

Parameters:
  • name – A short name for the kernel (e.g. “Gaussian”)

  • opt – Base set of options to be used for operations involving this kernel.

__call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute the kernel matrix between X1 and X2

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.

  • out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The kernel between X1 and X2.

_decide_dmmv_impl(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, opt: FalkonOptions)

Choose which dmmv function to use for this data.

Note that dmmv functions compute double kernel-vector products (see dmmv() for an explanation of what they are).

Parameters:
  • X1 (torch.Tensor) – First data matrix, of shape (N x D)

  • X2 (torch.Tensor) – Second data matrix, of shape (M x D)

  • v (torch.Tensor or None) – Vector for the matrix-vector multiplication (M x T)

  • w (torch.Tensor or None) – Vector for the matrix-vector multiplicatoin (N x T)

  • opt (FalkonOptions) – Falkon options. Options may be specified to force GPU or CPU usage.

Returns:

dmmv_fn – A function which allows to perform the mmv operation.

Notes

This function decides based on the inputs: if the inputs are sparse, it will choose the sparse implementations; if CUDA is detected, it will choose the CUDA implementation; otherwise it will simply choose the basic CPU implementation.

_decide_mm_impl(X1: Tensor, X2: Tensor, diag: bool, opt: FalkonOptions)

Choose which mm function to use for this data.

Note that mm functions compute the kernel itself so KeOps may not be used.

Parameters:
  • X1 (torch.Tensor) – First data matrix, of shape (N x D)

  • X2 (torch.Tensor) – Second data matrix, of shape (M x D)

  • diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.

  • opt (FalkonOptions) – Falkon options. Options may be specified to force GPU or CPU usage.

Returns:

mm_fn – A function which allows to perform the mm operation.

Notes

This function decides based on the inputs: if the inputs are sparse, it will choose the sparse implementations; if CUDA is detected, it will choose the CUDA implementation; otherwise it will simply choose the basic CPU implementation.

_decide_mmv_impl(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, opt: FalkonOptions)

Choose which mmv function to use for this data.

Note that mmv functions compute the kernel-vector product

Parameters:
  • X1 (torch.Tensor) – First data matrix, of shape (N x D)

  • X2 (torch.Tensor) – Second data matrix, of shape (M x D)

  • v (torch.Tensor) – Vector for the matrix-vector multiplication (M x T)

  • opt (FalkonOptions) – Falkon options. Options may be specified to force GPU or CPU usage.

Returns:

mmv_fn – A function which allows to perform the mmv operation.

Notes

This function decides based on the inputs: if the inputs are sparse, it will choose the sparse implementations; if CUDA is detected, it will choose the CUDA implementation; otherwise it will simply choose the basic CPU implementation.

abstract compute(X1: Tensor, X2: Tensor, out: Tensor, diag: bool) Tensor

Compute the kernel matrix of X1 and X2 - without regards for differentiability.

The kernel matrix should be stored in out to ensure the correctness of allocatable memory computations.

Parameters:
  • X1 (torch.Tensor) – The left matrix for computing the kernel

  • X2 (torch.Tensor) – The right matrix for computing the kernel

  • out (torch.Tensor) – The output matrix into which implementing classes should store the kernel.

  • diag (bool) – If true, X1 and X2 have the same shape, and only the diagonal of k(X1, X2) is to be computed and stored in out. Otherwise compute the full kernel matrix.

Returns:

out (torch.Tensor) – The kernel matrix. Should use the same underlying storage as the parameter out.

abstract compute_sparse(X1: SparseTensor, X2: SparseTensor, out: Tensor, diag: bool, **kwargs) Tensor

Compute the kernel matrix of X1 and X2 which are two sparse matrices, storing the output in the dense matrix out.

Parameters:
  • X1 (SparseTensor) – The left matrix for computing the kernel

  • X2 (SparseTensor) – The right matrix for computing the kernel

  • out (torch.Tensor) – The output matrix into which implementing classes should store the kernel.

  • diag (bool) – If true, X1 and X2 have the same shape, and only the diagonal of k(X1, X2) is to be computed and stored in out.

  • kwargs

    Additional keyword arguments which some sparse implementations might require. Currently the keyword arguments passed by the falkon.mmv_ops.fmmv.sparse_mmv_run_thread() and falkon.mmv_ops.fmm.sparse_mm_run_thread() functions are:

    • X1_csr : the X1 matrix in CSR format

    • X2_csr : the X2 matrix in CSR format

Returns:

out (torch.Tensor) – The kernel matrix. Should use the same underlying storage as the parameter out.

dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute double matrix-vector multiplications where the matrix is the current kernel.

The general form of dmmv operations is: Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w) where if v is None, then we simply have Kernel(X2, X1) @ w and if w is None we remove the additive factor. At least one of `w` and `v` must be provided.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (M x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
extra_mem(is_differentiable, is_sparse, dtype, density1=None, density2=None) Dict[str, float]

Compute the amount of extra memory which will be needed when computing this kernel.

Often kernel computation needs some extra memory allocations. To avoid using too large block-sizes which may lead to OOM errors, you should declare any such extra allocations for your kernel here.

Indicate extra allocations as coefficients on the required dimensions. For example, if computing a kernel needs to re-allocate the data-matrix (which is of size n * d), the return dictionary will be: {‘nd’: 1}. Other possible coefficients are on d, n, m which are respectively the data-dimension, the number of data-points in the first data matrix and the number of data-points in the second matrix. Pairwise combinations of the three dimensions are possible (i.e. nd, nm, md), and a special key ‘0’ can be used to specify a base memory needed independently of data dimensions. Make sure to specify the dictionary keys as is written here since they will not be recognized otherwise.

Parameters:
  • is_differentiable (bool) –

  • is_sparse (bool) –

  • dtype (torch.dtype or np.dtype) –

  • density1 (float or None) –

  • density2 (float or None) –

Returns:

extra_allocs (dictionary) – A dictionary from strings indicating on which dimensions the extra-allocation is needed (allowed strings: ‘n’, ‘m’, ‘d’, ‘nm’, ‘nd’, ‘md’, ‘0’) to floating-point numbers indicating how many extra-allocations are needed.

mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute matrix-vector multiplications where the matrix is the current kernel.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (N x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])

DiffKernel

class falkon.kernels.diff_kernel.DiffKernel(name, options, core_fn, **kernel_params)

Abstract class for differentiable kernels.

This class should be extended instead of Kernel whenever designing a custom kernel to be used with automatic hyperparameter optimization (see the hopt module).

Subclasses should implement the detach() method to return a new instance of the kernel, with its parameters detached from the computation graph.

The compute_diff() method should be overridden, unless the core_fn parameter is passed to the constructor.

Hyperparameters to the concrete kernel instance (for example the length-scale of the Gaussian kernel) should be passed to the constructor of this class, in order to be registered as parameters of the computation graph. Even non-differentiable parameters should be provided as keywords (also non tensor arguments).

Parameters:
  • name – A short name for the kernel (e.g. “Gaussian”)

  • options – Base set of options to be used for operations involving this kernel.

  • core_fn – Optional function which can be used to compute a kernel matrix. The signature of the function should be: core_fn(X1, X2, out, diag, **kernel_parameters)` where X1 and X2 are the input matrices, out is the output matrix (it will be None when called from compute_diff()), diag is a flag indicating that only the diagonal of the kernel matrix is to be computed, and **kernel_parameters includes all additional parameters belonging to the kernel (which are passed to the constructor of DiffKernel).

  • kernel_params – All parameters (differentiable and non-differentiable) to be used for this kernel. The values given are used to initialize the actual parameters - which will be copied in the constructor.

compute_diff(X1: Tensor, X2: Tensor, diag: bool)

Compute the kernel matrix of X1 and X2. The output should be differentiable with respect to X1, X2, and all kernel parameters returned by the diff_params() method.

Parameters:
  • X1 (torch.Tensor) – The left matrix for computing the kernel

  • X2 (torch.Tensor) – The right matrix for computing the kernel

  • diag (bool) – If true, X1 and X2 have the same shape, and only the diagonal of k(X1, X2) is to be computed and stored in out. Otherwise compute the full kernel matrix.

Returns:

out (torch.Tensor) – The constructed kernel matrix.

abstract detach() Kernel

Detaches all differentiable parameters of the kernel from the computation graph.

Returns:

k – A new instance of the kernel sharing the same parameters, but detached from the computation graph.

property diff_params: Dict[str, Tensor]

A dictionary mapping parameter names to their values for all differentiable parameters of the kernel.

Returns:

params – A dictionary mapping parameter names to their values

KeopsKernelMixin

class falkon.kernels.keops_helpers.KeopsKernelMixin(name: str, opt: FalkonOptions | None)

Abstract class for kernels which enables KeOps acceleration.

This class should be extended when KeOps-accelerated kernel-vector products are required. Subclasses should implement the keops_mmv_impl() method by defining an appropriate KeOps formula, and calling into KeOps for kernel vector product calculation (the helper method keops_mmv() can be used to call into KeOps).

For help in implementing a subclass, check the existing implementations (e.g. GaussianKernel), and the Custom Kernels notebook.

keops_mmv(X1: Tensor, X2: Tensor, v: Tensor, out: Tensor | None, formula: str, aliases: List[str], other_vars: List[Tensor], opt: FalkonOptions)

Helper method to call into KeOps for kernel-vector products

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.

  • formula (str) – The KeOps formula

  • aliases – Aliases referencing names in the formula with actual KeOps variables

  • other_vars – Kernel parameters to be used in the formula, other than X1, X2 and v.

  • opt – Options to be used for computing the operation. Useful are the memory size options, CUDA options and KeOps options.

Returns:

out – The computed kernel matrix between X1 and X2, multiplied by vector v.

abstract keops_mmv_impl(X1, X2, v, kernel, out, opt: FalkonOptions)

Implementation of the KeOps formula to compute a kernel-vector product.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • kernel (falkon.kernels.kernel.Kernel) – Instance of this class. This is equal to self and can be ignored.

  • out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.

  • opt (FalkonOptions) – Options to be used for computing the operation. Useful are the memory size options, CUDA options and KeOps options.

Returns:

out – The computed kernel matrix between X1 and X2, multiplied by vector v.

Radial kernels

Gaussian kernel

class falkon.kernels.GaussianKernel(sigma: float | Tensor, opt: FalkonOptions | None = None)

Class for computing the Gaussian kernel and related kernel-vector products

The Gaussian kernel is one of the most common and effective kernel embeddings since it is infinite dimensional, and governed by a single parameter. The kernel length-scale determines the width of the Gaussian distribution which is placed on top of each point. A larger sigma corresponds to a wide Gaussian, so that the relative influence of far away points will be high for computing the kernel at a given datum. On the opposite side of the spectrum, a small sigma means that only nearby points will influence the kernel.

Parameters:
  • sigma – The length-scale of the kernel. This can be a scalar, and then it corresponds to the standard deviation of the Gaussian distribution from which the kernel is derived. If sigma is a vector of size d (where d is the dimensionality of the data), it is interpreted as the diagonal standard deviation of the Gaussian distribution. It can also be a matrix of size d*d where d, in which case sigma will be the precision matrix (inverse covariance).

  • opt – Additional options to be forwarded to the matrix-vector multiplication routines.

Examples

Creating a Gaussian kernel with a single length-scale. Operations on this kernel will not use KeOps.

>>> K = GaussianKernel(sigma=3.0, opt=FalkonOptions(keops_active="no"))

Creating a Gaussian kernel with a different length-scale per dimension

>>> K = GaussianKernel(sigma=torch.tensor([1.0, 3.5, 7.0]))

Creating a Gaussian kernel object with full covariance matrix (randomly chosen)

>>> mat = torch.randn(3, 3, dtype=torch.float64)
>>> sym_mat = mat @ mat.T
>>> K = GaussianKernel(sigma=sym_mat)
>>> K
GaussianKernel(sigma=tensor([[ 2.0909,  0.0253, -0.2490],
        [ 0.0253,  0.3399, -0.5158],
        [-0.2490, -0.5158,  4.4922]], dtype=torch.float64))  #random

Notes

The Gaussian kernel with a single length-scale follows

\[k(x, x') = \exp{-\dfrac{\lVert x - x' \rVert^2}{2\sigma^2}}\]

When the length-scales are specified as a matrix, the RBF kernel is determined by

\[k(x, x') = \exp{-\dfrac{1}{2}x\Sigma x'}\]

In both cases, the actual computation follows a different path, working on the expanded norm.

__call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute the kernel matrix between X1 and X2

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.

  • out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The kernel between X1 and X2.

dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute double matrix-vector multiplications where the matrix is the current kernel.

The general form of dmmv operations is: Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w) where if v is None, then we simply have Kernel(X2, X1) @ w and if w is None we remove the additive factor. At least one of `w` and `v` must be provided.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (M x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute matrix-vector multiplications where the matrix is the current kernel.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (N x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])

Laplacian kernel

class falkon.kernels.LaplacianKernel(sigma: float | Tensor, opt: FalkonOptions | None = None)

Class for computing the Laplacian kernel, and related kernel-vector products.

The Laplacian kernel is similar to the Gaussian kernel, but less sensitive to changes in the parameter sigma.

Parameters:

sigma – The length-scale of the Laplacian kernel

Notes

The Laplacian kernel is determined by the following formula

\[k(x, x') = \exp{-\frac{\lVert x - x' \rVert}{\sigma}}\]
__call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute the kernel matrix between X1 and X2

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.

  • out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The kernel between X1 and X2.

dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute double matrix-vector multiplications where the matrix is the current kernel.

The general form of dmmv operations is: Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w) where if v is None, then we simply have Kernel(X2, X1) @ w and if w is None we remove the additive factor. At least one of `w` and `v` must be provided.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (M x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute matrix-vector multiplications where the matrix is the current kernel.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (N x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])

Matern kernel

class falkon.kernels.MaternKernel(sigma: float | Tensor, nu: float | Tensor, opt: FalkonOptions | None = None)

Class for computing the Matern kernel, and related kernel-vector products.

The Matern kernels define a generic class of kernel functions which includes the Laplacian and Gaussian kernels. The class is parametrized by ‘nu’. When nu = 0.5 this kernel is equivalent to the Laplacian kernel, when nu = float('inf'), the Matern kernel is equivalent to the Gaussian kernel.

This class implements the Matern kernel only for the values of nu which have a closed form solution, which are 0.5, 1.5, 2.5, and infinity.

Parameters:
  • sigma – The length-scale of the Matern kernel. The length-scale can be either a scalar or a vector. Matrix-valued length-scales are not allowed for the Matern kernel.

  • nu – The parameter of the Matern kernel. It should be one of 0.5, 1.5, 2.5 or inf.

Notes

While for nu = float(‘inf’) this kernel is equivalent to the GaussianKernel, this implementation is more general. Using the GaussianKernel directly may be computationally more efficient.

__call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute the kernel matrix between X1 and X2

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.

  • out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The kernel between X1 and X2.

dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute double matrix-vector multiplications where the matrix is the current kernel.

The general form of dmmv operations is: Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w) where if v is None, then we simply have Kernel(X2, X1) @ w and if w is None we remove the additive factor. At least one of `w` and `v` must be provided.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (M x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute matrix-vector multiplications where the matrix is the current kernel.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (N x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])

Dot-Product kernels

Polynomial kernel

class falkon.kernels.PolynomialKernel(beta: float | Tensor, gamma: float | Tensor, degree: float | Tensor, opt: FalkonOptions | None = None)

Polynomial kernel with multiplicative and additive constants.

Follows the formula

\[(\gamma * X_1^\top X_2 + \beta)^{\mathrm{degree}}\]

Where all operations apart from the matrix multiplication are taken element-wise.

Parameters:
  • beta (float-like) – Additive constant

  • gamma (float-like) – Multiplicative constant

  • degree (float-like) – Power of the polynomial kernel

  • opt (Optional[FalkonOptions]) – Options which will be used in downstream kernel operations.

__call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute the kernel matrix between X1 and X2

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.

  • out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The kernel between X1 and X2.

dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute double matrix-vector multiplications where the matrix is the current kernel.

The general form of dmmv operations is: Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w) where if v is None, then we simply have Kernel(X2, X1) @ w and if w is None we remove the additive factor. At least one of `w` and `v` must be provided.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (M x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute matrix-vector multiplications where the matrix is the current kernel.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (N x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])

Linear kernel

class falkon.kernels.LinearKernel(beta: float | Tensor = 0.0, gamma: float | Tensor = 1.0, opt: FalkonOptions | None = None)

Linear Kernel with optional scaling and translation parameters.

The kernel implemented here is the covariance function in the original input space (i.e. X @ X.T) with optional parameters to translate and scale the kernel: beta + gamma * X @ X.T

Parameters:
  • beta (float-like) – Additive constant for the kernel, default: 0.0

  • gamma (float-like) – Multiplicative constant for the kernel. The kernel will be multiplied by the inverse of sigma squared. Default: 1.0

  • opt (Optional[FalkonOptions]) – Options which will be used in downstream kernel operations.

Examples

>>> k = LinearKernel(beta=0.0, gamma=2.0)
>>> X = torch.randn(100, 3)  # 100 samples in 3 dimensions
>>> kernel_matrix = k(X, X)
>>> torch.testing.assert_close(kernel_matrix, X @ X.T * 2)
__call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute the kernel matrix between X1 and X2

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.

  • out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The kernel between X1 and X2.

dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute double matrix-vector multiplications where the matrix is the current kernel.

The general form of dmmv operations is: Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w) where if v is None, then we simply have Kernel(X2, X1) @ w and if w is None we remove the additive factor. At least one of `w` and `v` must be provided.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (M x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute matrix-vector multiplications where the matrix is the current kernel.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (N x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])

Sigmoid kernel

class falkon.kernels.SigmoidKernel(beta: float | Tensor, gamma: float | Tensor, opt: FalkonOptions | None = None)

Sigmoid (or hyperbolic tangent) kernel function, with additive and multiplicative constants.

Follows the formula

\[k(x, y) = \tanh(\alpha x^\top y + \beta)\]
Parameters:
  • beta (float-like) – Multiplicative constant

  • gamma (float-like) – Multiplicative constant

  • opt (Optional[FalkonOptions]) – Options which will be used in downstream kernel operations.

__call__(X1: Tensor, X2: Tensor, diag: bool = False, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute the kernel matrix between X1 and X2

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • diag (bool) – Whether to compute just the diagonal of the kernel matrix, or the whole matrix.

  • out (torch.Tensor or None) – Optional tensor of shape (N x M) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The kernel between X1 and X2.

dmmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor | None, w: Tensor | None, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute double matrix-vector multiplications where the matrix is the current kernel.

The general form of dmmv operations is: Kernel(X2, X1) @ (Kernel(X1, X2) @ v + w) where if v is None, then we simply have Kernel(X2, X1) @ w and if w is None we remove the additive factor. At least one of `w` and `v` must be provided.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor or None) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • w (torch.Tensor or None) – A vector to compute matrix-vector products. This may also be a matrix of shape (N x T) but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (M x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (M x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)  # N is 100, D is 3
>>> X2 = torch.randn(150, 3)  # M is 150
>>> v = torch.randn(150, 1)
>>> w = torch.randn(100, 1)
>>> out = k.dmmv(X1, X2, v, w, out=None)
>>> out.shape
torch.Size([150, 1])
mmv(X1: Tensor | SparseTensor, X2: Tensor | SparseTensor, v: Tensor, out: Tensor | None = None, opt: FalkonOptions | None = None)

Compute matrix-vector multiplications where the matrix is the current kernel.

Parameters:
  • X1 (torch.Tensor) – The first data-matrix for computing the kernel. Of shape (N x D): N samples in D dimensions.

  • X2 (torch.Tensor) – The second data-matrix for computing the kernel. Of shape (M x D): M samples in D dimensions. Set X2 == X1 to compute a symmetric kernel.

  • v (torch.Tensor) – A vector to compute the matrix-vector product. This may also be a matrix of shape (M x T), but if T is very large the operations will be much slower.

  • out (torch.Tensor or None) – Optional tensor of shape (N x T) to hold the output. If not provided it will be created.

  • opt (Optional[FalkonOptions]) – Options to be used for computing the operation. Useful are the memory size options and CUDA options.

Returns:

out (torch.Tensor) – The (N x T) output.

Examples

>>> import falkon, torch
>>> k = falkon.kernels.GaussianKernel(sigma=2)  # You can substitute the Gaussian kernel by any other.
>>> X1 = torch.randn(100, 3)
>>> X2 = torch.randn(150, 3)
>>> v = torch.randn(150, 1)
>>> out = k.mmv(X1, X2, v, out=None)
>>> out.shape
torch.Size([100, 1])