falkon.models

Falkon

class falkon.models.Falkon(kernel: Kernel, penalty: float, M: int, center_selection: str | CenterSelector = 'uniform', maxiter: int = 20, seed: int | None = None, error_fn: Callable[[Tensor, Tensor], float | Tuple[float, str]] | None = None, error_every: int | None = 1, weight_fn: Callable[[Tensor, Tensor, Tensor], Tensor] | None = None, options: FalkonOptions | None = None)

Falkon Kernel Ridge Regression solver.

This estimator object solves approximate kernel ridge regression problems with Nystroem projections and a fast optimization algorithm as described in [1], [2].

Multiclass and multiple regression problems can all be tackled with this same object, for example by encoding multiple classes in a one-hot target matrix.

Parameters:
  • kernel – Object representing the kernel function used for KRR.

  • penalty (float) – Amount of regularization to apply to the problem. This parameter must be greater than 0.

  • M (int) – The number of Nystrom centers to pick. M must be positive and lower than the total number of training points. A larger M will typically lead to better accuracy but will use more computational resources. The centers can be specified either by setting this parameter, or by passing a falkon.center_selection.CenterSelector instance to the center_selection argument of this constructor.

  • center_selection (str or falkon.center_selection.CenterSelector) – The center selection algorithm. Currently only ‘uniform’ selection is implemented, which chooses each training sample with equal probability.

  • maxiter (int) – The number of iterations to run the optimization for. Usually fewer than 20 iterations are necessary; however, this is problem-dependent.

  • seed (int or None) – Random seed. Can be used to make results stable across runs. Randomness is present in the center selection algorithm, and in certain optimizers.

  • error_fn (Callable or None) – A function taking two torch.Tensor arguments, targets and predictions, and returning the error incurred for predicting ‘predictions’ instead of ‘targets’. This is used to display the evolution of the error during the iterations. A sketch is given after this parameter list.

  • error_every (int or None) – Evaluate the error (on training or validation data) every error_every iterations. If set to 1 then the error will be calculated at each iteration. If set to None, it will never be calculated.

  • weight_fn (Callable or None) –

    A function for assigning different weights to different samples. This is used for weighted least-squares; it should accept three arguments, Y, X, indices, which represent the samples for which weights need to be computed, and return a vector of weights corresponding to the input targets.

    As an example, in the setting of binary classification Y can be -1 or +1. To give more importance to errors on the negative class, pass a weight_fn which returns 2 whenever the target is -1 (a sketch is given after this parameter list).

  • options (FalkonOptions) – Additional options used by the components of the Falkon solver. Individual options are documented in falkon.options.
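
As minimal sketches of the two callables above (illustrations, not part of the library; any callables with these signatures work), an error_fn computing the root-mean-square error together with a display name:

>>> def rmse(targets, predictions):
...     # Return the error value and the name shown when printing progress.
...     return torch.sqrt(torch.mean((targets - predictions) ** 2)).item(), "RMSE"

and a weight_fn for the binary setting described above, doubling the weight of negative samples (a hypothetical helper, assuming -1/+1 targets):

>>> def double_negative_weights(Y, X, indices):
...     # Weight 2 where the target is -1, weight 1 elsewhere.
...     return torch.where(Y < 0, torch.full_like(Y, 2.0), torch.ones_like(Y))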

Examples

Running Falkon on a random dataset

>>> import torch
>>> import falkon
>>> from falkon import Falkon, FalkonOptions
>>> X = torch.randn(1000, 10)
>>> Y = torch.randn(1000, 1)
>>> kernel = falkon.kernels.GaussianKernel(3.0)
>>> options = FalkonOptions(use_cpu=True)
>>> model = Falkon(kernel=kernel, penalty=1e-6, M=500, options=options)
>>> model.fit(X, Y)
>>> preds = model.predict(X)
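
As mentioned above, multiclass problems can be tackled by one-hot encoding the targets. A minimal sketch (the three-class labels are hypothetical; torch.nn.functional.one_hot is standard PyTorch):

>>> labels = torch.randint(0, 3, (1000,))
>>> Y_onehot = torch.nn.functional.one_hot(labels, num_classes=3).to(torch.float32)
>>> model.fit(X, Y_onehot)
>>> model.predict(X).shape
torch.Size([1000, 3])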

Warm restarts: run for 5 iterations, then use warm_start to run for 5 more iterations.

>>> model = Falkon(kernel=kernel, penalty=1e-6, M=500, maxiter=5)
>>> model.fit(X, Y)
>>> model.fit(X, Y, warm_start=model.beta_)

References

  • Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco, “FALKON: An optimal large scale kernel method,” Advances in Neural Information Processing Systems 30, 2017.

  • Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi, “Kernel methods through the roof: handling billions of points efficiently,” Advances in Neural Information Processing Systems 33, 2020.

fit(X: Tensor, Y: Tensor, Xts: Tensor | None = None, Yts: Tensor | None = None, warm_start: Tensor | None = None)

Fits the Falkon KRR model.

Parameters:
  • X (torch.Tensor) – The tensor of training data, of shape [num_samples, num_dimensions]. If X is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data (see the layout sketch below).

  • Y (torch.Tensor) – The tensor of training targets, of shape [num_samples, num_outputs]. If X and Y represent a classification problem, Y can be encoded as a one-hot vector. If Y is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data.

  • Xts (torch.Tensor or None) – Tensor of validation data, of shape [num_test_samples, num_dimensions]. If validation data is provided and error_fn was specified when creating the model, they will be used to print the validation error during the optimization iterations. If Xts is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data.

  • Yts (torch.Tensor or None) – Tensor of validation targets, of shape [num_test_samples, num_outputs]. If validation data is provided and error_fn was specified when creating the model, they will be used to print the validation error during the optimization iterations. If Yts is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data.

  • warm_start (torch.Tensor or None) – Specify a starting point for the conjugate gradient optimizer. If not specified, the initial point will be a tensor filled with zeros. Be aware that the starting point should not be in the parameter space, but in the preconditioner space (i.e. if initializing from a previous Falkon object, use the beta_ field, not alpha_).

Returns:

model (Falkon) – The fitted model
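
Regarding the memory-layout notes above: a Fortran-ordered (column-contiguous) tensor can be obtained in PyTorch by allocating the transposed shape and transposing back. A minimal sketch (shapes are hypothetical):

>>> X = torch.randn(10, 1000).T  # shape [1000, 10], column-contiguous
>>> X.T.is_contiguous()          # a Fortran-ordered matrix has a row-contiguous transpose
True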

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.
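
A brief usage sketch (values correspond to the construction example above):

>>> params = model.get_params()
>>> params['M'], params['penalty']
(500, 1e-06)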

predict(X: Tensor) → Tensor

Makes predictions on data X using the learned model.

Parameters:

X (torch.Tensor) – Tensor of test data points, of shape [num_samples, num_dimensions].

Returns:

predictions (torch.Tensor) – Prediction tensor of shape [num_samples, num_outputs] for all data points.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
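
For example (a usage sketch; penalty and maxiter are constructor parameters of Falkon documented above):

>>> model.set_params(penalty=1e-8, maxiter=10)
>>> model.fit(X, Y)  # refit with the updated parameters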

LogisticFalkon

class falkon.models.LogisticFalkon(kernel: Kernel, penalty_list: List[float], iter_list: List[int], loss: Loss, M: int, center_selection: str | CenterSelector = 'uniform', seed: int | None = None, error_fn: Callable[[Tensor, Tensor], float | Tuple[float, str]] | None = None, error_every: int | None = 1, options: FalkonOptions | None = None)

Falkon Logistic regression solver.

This estimator object solves approximate logistic regression problems with Nystroem projections and a fast optimization algorithm as described in [1], [3].

Since it optimizes the logistic loss, this model may be used in place of falkon.models.Falkon (which optimizes the squared loss) when tackling binary classification problems.

The algorithm works by repeatedly applying the base Falkon algorithm with decreasing amounts of regularization. The class therefore accepts slightly different parameters from falkon.models.Falkon: penalty_list, which should contain a list of decreasing regularization values, and iter_list, which specifies how many CG iterations to run for each application of the base algorithm. For guidance on setting these parameters, see the Notes section below.

Parameters:
  • kernel – Object representing the kernel function used for KRR.

  • penalty_list (List[float]) – Amount of regularization to use for each iteration of the base algorithm. The length of this list determines the number of base algorithm iterations.

  • iter_list (List[int]) – Number of conjugate gradient iterations used in each iteration of the base algorithm. The length of this list must be identical to that of penalty_list.

  • loss (Loss) – This parameter must be set to an instance of falkon.gsc_losses.LogisticLoss, initialized with the same kernel as this class.

  • M (int) – The number of Nystrom centers to pick. M must be positive, and lower than the total number of training points. A larger M will typically lead to better accuracy but will use more computational resources.

  • center_selection (str or falkon.center_selection.CenterSelector) – The center selection algorithm. Currently only ‘uniform’ selection is implemented, which chooses each training sample with equal probability.

  • seed (int or None) – Random seed. Can be used to make results stable across runs. Randomness is present in the center selection algorithm, and in certain optimizers.

  • error_fn (Callable or None) – A function taking two torch.Tensor arguments, targets and predictions, and returning the error incurred for predicting ‘predictions’ instead of ‘targets’. This is used to display the evolution of the error during the iterations.

  • error_every (int or None) – Evaluate the error (on training or validation data) every error_every iterations. If set to 1 then the error will be calculated at each iteration. If set to None, it will never be calculated.

  • options (FalkonOptions) – Additional options used by the components of the Falkon solver. Individual options are documented in falkon.options.

Examples

Running Logistic Falkon on a random dataset

>>> import torch
>>> import falkon
>>> from falkon import LogisticFalkon, FalkonOptions
>>> from falkon.gsc_losses import LogisticLoss
>>> X = torch.randn(1000, 10)
>>> Y = torch.randn(1000, 1)
>>> Y[Y > 0] = 1
>>> Y[Y <= 0] = -1
>>> kernel = falkon.kernels.GaussianKernel(3.0)
>>> loss = LogisticLoss(kernel=kernel)
>>> options = FalkonOptions()
>>> model = LogisticFalkon(kernel=kernel, loss=loss, penalty_list=[1e-2, 1e-4, 1e-6, 1e-6, 1e-6],
...                        iter_list=[3, 3, 3, 8, 8], M=500, options=options)
>>> model.fit(X, Y)
>>> preds = model.predict(X)

References

  • Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi, “Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses,” Advances in Neural Information Processing Systems 32, 2019.

  • Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi, “Kernel methods through the roof: handling billions of points efficiently,” Advances in Neural Information Processing Systems 33, 2020.

Notes

A rule of thumb for setting penalty_list is to keep in mind the desired final regularization (1e-6 in the example above), and to create a short path of around three steps in which the regularization is decreased down to the desired value, by a factor of 10^2 or 10^3 at each step. A number of further iterations at the desired regularization may then be necessary to achieve good performance. The iter_list parameter follows a similar reasoning: use 3 inner-steps for the iterations in which the regularization is decreased, then switch to a higher number of inner-steps (e.g. 8) for the remaining iterations.
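
As a concrete instance of this rule of thumb, a hypothetical schedule targeting a final regularization of 1e-8:

>>> penalty_list = [1e-2, 1e-4, 1e-6, 1e-8, 1e-8, 1e-8]  # decrease by 10^2 per step, then hold
>>> iter_list = [3, 3, 3, 3, 8, 8]  # few CG steps while decreasing, more at the final value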

fit(X: Tensor, Y: Tensor, Xts: Tensor | None = None, Yts: Tensor | None = None)

Fits the Falkon Kernel Logistic Regression model.

Parameters:
  • X (torch.Tensor) – The tensor of training data, of shape [num_samples, num_dimensions]. If X is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data.

  • Y (torch.Tensor) – The tensor of training targets, of shape [num_samples, num_outputs]. If X and Y represent a classification problem, Y can be encoded as a one-hot vector. If Y is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data.

  • Xts (torch.Tensor or None) – Tensor of validation data, of shape [num_test_samples, num_dimensions]. If validation data is provided and error_fn was specified when creating the model, they will be used to print the validation error during the optimization iterations. If Xts is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data.

  • Yts (torch.Tensor or None) – Tensor of validation targets, of shape [num_test_samples, num_outputs]. If validation data is provided and error_fn was specified when creating the model, they will be used to print the validation error during the optimization iterations. If Yts is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data.

Returns:

model (LogisticFalkon) – The fitted model

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

predict(X: Tensor) → Tensor

Makes predictions on data X using the learned model.

Parameters:

X (torch.Tensor) – Tensor of test data points, of shape [num_samples, num_dimensions].

Returns:

predictions (torch.Tensor) – Prediction tensor of shape [num_samples, num_outputs] for all data points.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

InCoreFalkon

class falkon.models.InCoreFalkon(kernel: Kernel, penalty: float, M: int, center_selection: str | CenterSelector = 'uniform', maxiter: int = 20, seed: int | None = None, error_fn: Callable[[Tensor, Tensor], float | Tuple[float, str]] | None = None, error_every: int | None = 1, weight_fn: Callable[[Tensor, Tensor, Tensor], Tensor] | None = None, options: FalkonOptions | None = None, N: int | None = None)

In-core (GPU) Falkon Kernel Ridge Regression solver.

This estimator object solves approximate kernel ridge regression problems with Nystroem projections and a fast optimization algorithm as described in [1], [2].

Multiclass and multiple regression problems can all be tackled with this same object, for example by encoding multiple classes in a one-hot target matrix.

Compared to the base falkon.models.Falkon estimator, the InCoreFalkon estimator is designed to work fully within the GPU, performing no data-copies between CPU and GPU. As such it imposes more constraints than the base estimator, but offers better performance on smaller problems. In particular, the constraints are that:

  • the input data must be on a single GPU, when calling InCoreFalkon.fit;

  • the data, preconditioner, kernels, etc. must all fit on the same GPU.

Using multiple GPUs is not possible with this model.

Parameters:
  • kernel – Object representing the kernel function used for KRR.

  • penalty (float) – Amount of regularization to apply to the problem. This parameter must be greater than 0.

  • M (int) – The number of Nystrom centers to pick. M must be positive and lower than the total number of training points. A larger M will typically lead to better accuracy but will use more computational resources. The centers can be specified either by setting this parameter, or by passing a falkon.center_selection.CenterSelector instance to the center_selection argument of this constructor.

  • center_selection (str or falkon.center_selection.CenterSelector) – The center selection algorithm. Currently only ‘uniform’ selection is implemented, which chooses each training sample with equal probability.

  • maxiter (int) – The number of iterations to run the optimization for. Usually fewer than 20 iterations are necessary; however, this is problem-dependent.

  • seed (int or None) – Random seed. Can be used to make results stable across runs. Randomness is present in the center selection algorithm, and in certain optimizers.

  • error_fn (Callable or None) – A function taking two torch.Tensor arguments, targets and predictions, and returning the error incurred for predicting ‘predictions’ instead of ‘targets’. This is used to display the evolution of the error during the iterations.

  • error_every (int or None) – Evaluate the error (on training or validation data) every error_every iterations. If set to 1 then the error will be calculated at each iteration. If set to None, it will never be calculated.

  • weight_fn (Callable or None) –

    A function for assigning different weights to different samples. This is used for weighted least-squares; it should accept three arguments, Y, X, indices, which represent the samples for which weights need to be computed, and return a vector of weights corresponding to the input targets.

    As an example, in the setting of binary classification Y can be -1 or +1. To give more importance to errors on the negative class, pass a weight_fn which returns 2 whenever the target is -1.

  • options (FalkonOptions) – Additional options used by the components of the Falkon solver. Individual options are documented in falkon.options.

Examples

Running InCoreFalkon on a randomly generated dataset

>>> import torch
>>> import falkon
>>> from falkon import InCoreFalkon, FalkonOptions
>>> X = torch.randn(1000, 10).cuda()
>>> Y = torch.randn(1000, 1).cuda()
>>> kernel = falkon.kernels.GaussianKernel(3.0)
>>> options = FalkonOptions()  # do not set use_cpu: this model runs entirely on the GPU
>>> model = InCoreFalkon(kernel=kernel, penalty=1e-6, M=500, options=options)
>>> model.fit(X, Y)
>>> preds = model.predict(X)
>>> assert preds.is_cuda

References

  • Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco, “FALKON: An optimal large scale kernel method,” Advances in Neural Information Processing Systems 30, 2017.

  • Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi, “Kernel methods through the roof: handling billions of points efficiently,” Advances in Neural Information Processing Systems 33, 2020.

fit(X: Tensor, Y: Tensor, Xts: Tensor | None = None, Yts: Tensor | None = None, warm_start: Tensor | None = None)

Fits the Falkon KRR model.

Parameters:
  • X (torch.Tensor) – The tensor of training data, of shape [num_samples, num_dimensions]. If X is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data. Must be a CUDA tensor.

  • Y (torch.Tensor) – The tensor of training targets, of shape [num_samples, num_outputs]. If X and Y represent a classification problem, Y can be encoded as a one-hot vector. If Y is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data. Must be a CUDA tensor.

  • Xts (torch.Tensor or None) – Tensor of validation data, of shape [num_test_samples, num_dimensions]. If validation data is provided and error_fn was specified when creating the model, they will be used to print the validation error during the optimization iterations. If Xts is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data. Must be a CUDA tensor.

  • Yts (torch.Tensor or None) – Tensor of validation targets, of shape [num_test_samples, num_outputs]. If validation data is provided and error_fn was specified when creating the model, they will be used to print the validation error during the optimization iterations. If Yts is in Fortran order (i.e. column-contiguous) then we can avoid an extra copy of the data. Must be a CUDA tensor.

  • warm_start (torch.Tensor or None) – Specify a starting point for the conjugate gradient optimizer. If not specified, the initial point will be a tensor filled with zeros. Be aware that the starting point should not be in the parameter space, but in the preconditioner space (i.e. if initializing from a previous Falkon object, use the beta_ field, not alpha_).

Returns:

model (InCoreFalkon) – The fitted model

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

predict(X: Tensor) → Tensor

Makes predictions on data X using the learned model.

Parameters:

X (torch.Tensor) – Tensor of test data points, of shape [num_samples, num_dimensions].

Returns:

predictions (torch.Tensor) – Prediction tensor of shape [num_samples, num_outputs] for all data points.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.