falkon.ooc_ops
The out-of-core algorithms for the Cholesky decomposition and the LAUUM operation are crucial for speeding up our library. To find out more about how they work, check the source code:
- Out of core Cholesky (CUDA code)
- Out of core LAUUM (Python code)
The following functions provide a higher-level interface to the two operations.
gpu_cholesky
- falkon.ooc_ops.gpu_cholesky(A: Tensor, upper: bool, clean: bool, overwrite: bool, opt: FalkonOptions) → Tensor
- Parameters:
A (torch.Tensor) – 2D positive-definite matrix of size (n x n) that will be factorized as A = U.T @ U (if upper is True) or as A = L @ L.T (if upper is False).
upper (bool) – Whether the triangle to be factorized is the upper or the lower triangle of A.
clean (bool) – Whether the “other” triangle of the output matrix (the one which does not contain the factorization) should be filled with zeros.
overwrite (bool) – Whether to overwrite matrix A, or to write the result in a new buffer.
opt (FalkonOptions) – Options forwarded for block calculation, and other knobs in the out-of-core parallel POTRF implementation. Useful options are the ones defined in CholeskyOptions.
Notes
The factorization is always computed as the ‘lower’ variant; however, the factor may end up in the upper-triangular part of the matrix if A is not Fortran-contiguous to begin with.
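The following is a minimal usage sketch for gpu_cholesky, not taken from the library itself: it assumes a CUDA-capable GPU is available and that a default-constructed FalkonOptions() is sufficient (both assumptions for illustration). The matrix is made Fortran-contiguous so that, per the note above, the factor stays in the lower triangle.

```python
import torch
from falkon import FalkonOptions          # assumed top-level import path
from falkon.ooc_ops import gpu_cholesky

n = 1000
X = torch.randn(n, n, dtype=torch.float64)
A = X @ X.T + n * torch.eye(n, dtype=torch.float64)  # symmetric positive-definite
A = A.T.contiguous().T  # Fortran-contiguous copy, so the factor lands in the lower triangle

opt = FalkonOptions()   # defaults; Cholesky-specific knobs live in the CholeskyOptions fields
L = gpu_cholesky(A, upper=False, clean=True, overwrite=False, opt=opt)

# With upper=False and clean=True, L is lower-triangular and L @ L.T reconstructs A.
print((L @ L.T - A).abs().max())  # expected to be at round-off level in float64
```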
gpu_lauum
- falkon.ooc_ops.gpu_lauum(A: Tensor, upper: bool, overwrite: bool = True, write_opposite: bool = False, opt: FalkonOptions | None = None)
- Parameters:
A (torch.Tensor) – N-by-N triangular matrix.
upper (bool) – Whether the input matrix is upper or lower triangular.
overwrite (bool) – Whether to overwrite matrix A or to output the result in a new buffer.
write_opposite (bool) – Independently of the overwrite parameter, whether to write the result of the triangular multiplication on the ‘opposite’ side of A. For example, if upper == True and overwrite == False, then the result will be written on the lower triangular part of the input matrix A. While independent, this is mostly useful when overwrite == False, since it can effectively avoid allocating a new tensor, and at the same time preserve the original data.
opt (FalkonOptions or None) – Options for the LAUUM operation. The only relevant options are the ones connected to GPU memory usage.
- Returns:
out (torch.Tensor) – A (N x N) tensor. This will share the same memory as the input tensor A if overwrite is set to True, otherwise it will be a newly allocated tensor.
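As with gpu_cholesky above, here is a minimal usage sketch for gpu_lauum. It assumes LAPACK-style LAUUM semantics (L.T @ L for a lower-triangular input, U @ U.T for an upper-triangular one) and a default FalkonOptions(); treat both as illustrative assumptions.

```python
import torch
from falkon import FalkonOptions        # assumed top-level import path
from falkon.ooc_ops import gpu_lauum

n = 1000
L = torch.tril(torch.randn(n, n, dtype=torch.float64))  # lower-triangular input
L = L.T.contiguous().T  # Fortran-contiguous, as in the Cholesky example

opt = FalkonOptions()
# With overwrite=False the result goes to a new buffer and L is left untouched;
# write_opposite=False keeps the result on the same (lower) triangle.
out = gpu_lauum(L, upper=False, overwrite=False, write_opposite=False, opt=opt)

# Assuming LAPACK semantics, the lower triangle of `out` holds tril(L.T @ L).
print((torch.tril(out) - torch.tril(L.T @ L)).abs().max())  # round-off level in float64
```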