

class falkon.optim.Optimizer

Base class for optimizers. This is an empty shell at the moment.

Conjugate gradient methods


class falkon.optim.ConjugateGradient(opt: ConjugateGradientOptions | None = None)
solve(X0: Tensor | None, B: Tensor, mmv: Callable[[Tensor], Tensor], max_iter: int, callback: Callable[[int, Tensor, float], None] | None = None) → Tensor

Conjugate-gradient solver with optional support for preconditioning via generic MMV.

This solver iteratively solves linear systems of the form \(AX = B\) for the unknown \(X\). The matrix \(A\) is needed only through matrix-vector multiplications with temporary solutions, which must be provided through the mmv function.

Preconditioning can be achieved by incorporating the preconditioner matrix in the mmv function.
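As a rough illustration of folding a preconditioner into mmv, the NumPy sketch below wraps a symmetric positive definite matrix with a diagonal preconditioner. It is not Falkon code: the diagonal preconditioner, the name `preconditioned_mmv`, and the dense solve standing in for the iterative solver are all assumptions made for the example. The point is only that the solver sees the product \(PAP\) through the callable, and that the solution of the original system is recovered by applying \(P\) to the result.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)      # symmetric positive definite system matrix
b = rng.standard_normal((6, 1))  # right-hand side

# Hypothetical preconditioner P = diag(A)^(-1/2), stored as a column for broadcasting.
p = 1.0 / np.sqrt(np.diag(A))[:, None]


def preconditioned_mmv(v):
    """Apply P A P to v without forming the product explicitly."""
    return p * (A @ (p * v))


rhs = p * b  # preconditioned right-hand side P b

# Stand-in for an iterative solver: build P A P through mmv and solve densely.
PAP = preconditioned_mmv(np.eye(6))
beta = np.linalg.solve(PAP, rhs)

x = p * beta  # map the preconditioned solution back: x = P beta
```

In practice the callable `preconditioned_mmv` and the right-hand side `rhs` would be handed to an iterative solver instead of the dense solve used here.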

  • X0 (Optional[torch.Tensor]) – Initial solution for the solver. If not provided it will be a zero-tensor.

  • B (torch.Tensor) – Right-hand-side of the linear system to be solved.

  • mmv – User-provided function to perform matrix-vector multiplications with the design matrix A. The function must accept a single argument (the vector to be multiplied), and return the result of the matrix-vector multiplication.

  • max_iter (int) – Maximum number of iterations the solver will perform. Early stopping is controlled by the options passed to the constructor of this class (see in particular the cg_tolerance option).

  • callback – An optional, user-provided function which is called at the end of each iteration with the current solution. Its arguments are the iteration number, a tensor containing the current solution, and the total time elapsed since the beginning of training (this time explicitly excludes any time taken by the callback itself).


Returns X, the solution to the linear system.
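The interface described above can be sketched with a textbook conjugate gradient loop in NumPy. This is not the Falkon implementation: the function name `cg_sketch` and the `tol` argument are hypothetical stand-ins (the real class takes its tolerance from the constructor options), and NumPy arrays replace torch tensors. The sketch only mirrors the documented behaviour: \(A\) is accessed exclusively through mmv, a missing X0 defaults to zeros, and the callback receives the iteration number, the current solution, and the elapsed time excluding the callback itself.

```python
import time

import numpy as np


def cg_sketch(X0, B, mmv, max_iter, callback=None, tol=1e-10):
    """Textbook CG for A X = B (A symmetric positive definite), using only mmv."""
    B = np.asarray(B, dtype=float)
    X = np.zeros_like(B) if X0 is None else np.array(X0, dtype=float)
    R = B - mmv(X)                   # residual, one column per right-hand side
    P = R.copy()                     # search directions
    rs_old = np.sum(R * R, axis=0)   # squared residual norm per column
    elapsed = 0.0
    for i in range(max_iter):
        t_start = time.perf_counter()
        AP = mmv(P)
        alpha = rs_old / np.sum(P * AP, axis=0)
        X = X + P * alpha
        R = R - AP * alpha
        rs_new = np.sum(R * R, axis=0)
        converged = np.sqrt(rs_new.max()) < tol
        if not converged:
            P = R + P * (rs_new / rs_old)
            rs_old = rs_new
        # Accumulate solver time only, so callback time is excluded.
        elapsed += time.perf_counter() - t_start
        if callback is not None:
            callback(i + 1, X, elapsed)
        if converged:
            break
    return X


A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0], [2.0]])
history = []
X = cg_sketch(None, B, lambda V: A @ V, max_iter=25,
              callback=lambda it, sol, t: history.append((it, t)))
```

The sketch does not guard against a column converging exactly while others are still iterating, which would produce a zero denominator; a production solver would handle that case.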


class falkon.optim.FalkonConjugateGradient(kernel: Kernel, preconditioner: Preconditioner, opt: FalkonOptions, weight_fn=None)

Preconditioned conjugate gradient solver, optimized for the Falkon algorithm.

The linear system solved is

\[\widetilde{B}^\top H \widetilde{B} \beta = \widetilde{B}^\top K_{nm}^\top Y\]

where \(\widetilde{B}\) is the approximate preconditioner

\[\widetilde{B} = \frac{1}{\sqrt{n}} T^{-1} A^{-1}\]

\(\beta\) is the preconditioned solution vector (from which we can get \(\alpha = \widetilde{B}\beta\)), and \(H\) is the \(m\times m\) sketched matrix

\[H = K_{nm}^\top K_{nm} + \lambda n K_{mm}\]

  • kernel – The kernel class used for the CG algorithm

  • preconditioner – The approximate Falkon preconditioner. The class should support triangular solves with both \(T\) and \(A\), including with multiple right-hand sides. The preconditioner should already have been initialized with a set of Nystrom centers. If the Nystrom centers used for CG differ from the ones used for the preconditioner, the CG method could converge very slowly.

  • opt – Options passed to the CG solver and to the kernel for computations.
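The relationship between the preconditioned system and the original coefficients can be checked numerically. The NumPy sketch below is an illustration under assumptions, not Falkon code: the Gaussian kernel helper `gauss`, the random data, and the upper-triangular matrices `T` and `A` are hypothetical stand-ins (they are not the factors Falkon actually computes). Since the identity holds for any invertible \(\widetilde{B}\), this is enough to show that solving for \(\beta\) and mapping back with \(\alpha = \widetilde{B}\beta\) gives the solution of \(H\alpha = K_{nm}^\top Y\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 50, 5, 1e-2


def gauss(a, b):
    """Gaussian kernel matrix between the rows of a and b (stand-in kernel)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)


X = rng.standard_normal((n, 2))
centers = X[:m]                  # stand-in for the Nystrom centers
K_nm = gauss(X, centers)
K_mm = gauss(centers, centers)
Y = rng.standard_normal((n, 1))

# Sketched m x m matrix H = K_nm^T K_nm + lambda * n * K_mm
H = K_nm.T @ K_nm + lam * n * K_mm

# Hypothetical invertible upper-triangular stand-ins for T and A.
T = np.triu(rng.standard_normal((m, m)), k=1) + 2 * np.eye(m)
A = np.triu(rng.standard_normal((m, m)), k=1) + 2 * np.eye(m)
B_tilde = np.linalg.inv(T) @ np.linalg.inv(A) / np.sqrt(n)

# Solve the preconditioned system for beta, then map back to alpha.
lhs = B_tilde.T @ H @ B_tilde
rhs = B_tilde.T @ K_nm.T @ Y
beta = np.linalg.solve(lhs, rhs)
alpha = B_tilde @ beta

# Reference: solve the unpreconditioned system directly.
alpha_direct = np.linalg.solve(H, K_nm.T @ Y)
```

The dense solves here replace the conjugate gradient iterations purely to keep the example short; the choice of \(\widetilde{B}\) affects how fast CG converges, not the value of \(\alpha\).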

See also


The preconditioner class, which is responsible for computing the matrices \(T\) and \(A\).