falkon.optim

Optimizer

class falkon.optim.Optimizer: Base class for optimizers. This is an empty shell at the moment.

Conjugate gradient methods

ConjugateGradient

class falkon.optim.ConjugateGradient(opt: ConjugateGradientOptions | None = None)

solve(X0: Tensor | None, B: Tensor, mmv: Callable[[Tensor], Tensor], max_iter: int, callback: Callable[[int, Tensor, float], None] | None = None) → Tensor

Conjugate-gradient solver with optional support for preconditioning via generic MMV.

This solver iteratively solves linear systems of the form $AX = B$ for the unknown X. The matrix A is never needed explicitly: it is accessed only through matrix-vector multiplications with the intermediate solutions, which the caller must provide through the mmv function.

Preconditioning can be achieved by incorporating the preconditioner matrix in the mmv function.
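As an illustration of folding a preconditioner into mmv, the sketch below uses NumPy rather than torch and a simple diagonal (Jacobi) preconditioner P, both of which are assumptions made for the example and not something Falkon prescribes. The closure multiplies by $P^\top A P$, the right-hand side becomes $P^\top B$, and the solution of the original system is recovered as $X = P\beta$. A dense solve stands in for the CG iterations only so that the result can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# A symmetric positive-definite matrix with badly scaled rows and columns.
core = rng.standard_normal((n, n))
core = core @ core.T + n * np.eye(n)
D = np.diag(10.0 ** np.linspace(0, 3, n))
A = D @ core @ D
B = rng.standard_normal((n, 2))  # two right-hand sides

# Hypothetical preconditioner: inverse square root of the diagonal of A.
P = np.diag(1.0 / np.sqrt(np.diag(A)))

def mmv(V):
    """Multiply by the preconditioned operator P^T A P without forming it."""
    return P.T @ (A @ (P @ V))

rhs = P.T @ B  # right-hand side of the preconditioned system

# For checking only: materialise the operator and solve it densely.
op = mmv(np.eye(n))
beta = np.linalg.solve(op, rhs)
X = P @ beta  # map back to the solution of A X = B

cond_plain = np.linalg.cond(A)
cond_prec = np.linalg.cond(op)
print(f"cond(A) = {cond_plain:.3g}, cond(P^T A P) = {cond_prec:.3g}")
```

In practice mmv and rhs would be handed to the solver directly, and the final multiplication by P would be applied to whatever the solver returns.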

Parameters:

X0 (Optional[torch.Tensor]) – Initial solution for the solver. If not provided it will be a zero-tensor.
B (torch.Tensor) – Right-hand-side of the linear system to be solved.
mmv – User-provided function to perform matrix-vector multiplications with the design matrix A. The function must accept a single argument (the vector to be multiplied), and return the result of the matrix-vector multiplication.
max_iter (int) – Maximum number of iterations the solver will perform. Early stopping is controlled by the options passed to the constructor of this class (in particular, see the cg_tolerance option).
callback – An optional, user-provided function which shall be called at the end of each iteration with the current solution. The arguments to the function are the iteration number, a tensor containing the current solution, and the total time elapsed from the beginning of training (note that this time explicitly excludes any time taken by the callback itself).

Returns:

The solution to the linear system X.
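To make the roles of mmv, max_iter and callback concrete, here is a minimal textbook conjugate-gradient loop written in NumPy. It is a sketch under stated assumptions, not Falkon's implementation: the function name cg_solve and the tol argument (standing in for the tolerance set through the options) are hypothetical, and safeguards for degenerate cases such as a zero right-hand side are omitted. The callback receives the iteration number, the current solution, and the elapsed solver time excluding time spent in the callback, as described above.

```python
import time
import numpy as np

def cg_solve(X0, B, mmv, max_iter, callback=None, tol=1e-7):
    """Textbook CG for A X = B; each column of B is one right-hand side."""
    X = np.zeros_like(B) if X0 is None else X0.copy()
    R = B - mmv(X)                    # residuals
    P = R.copy()                      # search directions
    rs_old = np.sum(R * R, axis=0)    # squared residual norm per column
    elapsed = 0.0
    for i in range(max_iter):
        if np.sqrt(rs_old.max()) < tol:
            break                     # early stopping on the residual norm
        t_start = time.perf_counter()
        AP = mmv(P)
        alpha = rs_old / np.sum(P * AP, axis=0)
        X = X + alpha * P
        R = R - alpha * AP
        rs_new = np.sum(R * R, axis=0)
        P = R + (rs_new / rs_old) * P
        rs_old = rs_new
        elapsed += time.perf_counter() - t_start
        if callback is not None:
            callback(i + 1, X, elapsed)  # callback time is not counted
    return X

# Usage on a small symmetric positive-definite system.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)
B = rng.standard_normal((8, 2))

history = []
X = cg_solve(None, B, mmv=lambda V: A @ V, max_iter=50,
             callback=lambda it, sol, t: history.append((it, t)))
print(f"stopped after {len(history)} iterations")
```

Because A enters only through the mmv closure, the same loop works unchanged when the product is computed implicitly, for example from kernel evaluations.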

FalkonConjugateGradient

class falkon.optim.FalkonConjugateGradient(kernel: Kernel, preconditioner: Preconditioner, opt: FalkonOptions, weight_fn=None)

Preconditioned conjugate gradient solver, optimized for the Falkon algorithm.

The linear system solved is

\[\widetilde{B}^\top H \widetilde{B} \beta = \widetilde{B}^\top K_{nm}^\top Y\]

where $\widetilde{B}$ is the approximate preconditioner

\[\widetilde{B} = \frac{1}{\sqrt{n}} T^{-1} A^{-1}\]

$\beta$ is the preconditioned solution vector (from which we can get $\alpha = \widetilde{B}\beta$), and $H$ is the $m\times m$ sketched matrix

\[H = K_{nm}^\top K_{nm} + \lambda n K_{mm}\]

Parameters:

kernel – The kernel class used for the CG algorithm
preconditioner – The approximate Falkon preconditioner. The class should allow triangular solves with both $T$ and $A$, with multiple right-hand sides. The preconditioner should already have been initialized with a set of Nyström centers. If the Nyström centers used for CG differ from the ones used for the preconditioner, the CG method could converge very slowly.
opt – Options passed to the CG solver and to the kernel for computations.
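The linear system above can be checked numerically on a small dense problem. The sketch below uses NumPy and a hand-written Gaussian kernel, both hypothetical stand-ins for Falkon's kernel and preconditioner classes. It also assumes the Cholesky-based construction described in the Falkon paper, $T = \mathrm{chol}(K_{mm})$ and $A = \mathrm{chol}(\frac{1}{m} T T^\top + \lambda I)$; that construction is not stated on this page. Dense inverses replace the triangular solves for brevity, and a dense solve replaces CG so that the identity $H\alpha = K_{nm}^\top Y$ with $\alpha = \widetilde{B}\beta$ can be verified.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Hypothetical dense Gaussian kernel, not Falkon's kernel class."""
    sq = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

rng = np.random.default_rng(0)
n, m, d, lam = 200, 20, 5, 1e-3
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, 1))
centers = X[rng.choice(n, size=m, replace=False)]  # Nystrom centers

K_nm = gaussian_kernel(X, centers)
K_mm = gaussian_kernel(centers, centers)

# Assumed construction: upper-triangular Cholesky factors T and A.
T = np.linalg.cholesky(K_mm).T                            # T^T T = K_mm
A = np.linalg.cholesky(T @ T.T / m + lam * np.eye(m)).T   # A^T A = TT^T/m + lam*I
B_tilde = np.linalg.inv(T) @ np.linalg.inv(A) / np.sqrt(n)

# Sketched m x m matrix H from the formula above.
H = K_nm.T @ K_nm + lam * n * K_mm

def mmv(V):
    """Multiply by B~^T H B~, the operator a CG solver would see."""
    return B_tilde.T @ (H @ (B_tilde @ V))

rhs = B_tilde.T @ (K_nm.T @ Y)

# For checking only: solve the preconditioned system densely.
beta = np.linalg.solve(mmv(np.eye(m)), rhs)
alpha = B_tilde @ beta  # recover the coefficients from beta
print("max residual:", np.abs(H @ alpha - K_nm.T @ Y).max())
```

At realistic sizes H is not formed explicitly; the products with $K_{nm}$ would be computed in blocks, which is why the solver only needs a kernel object and a preconditioner.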