falkon.ooc_ops
The out-of-core algorithms for the Cholesky decomposition and the LAUUM operation are crucial for speeding up our library. To find out more about how they work, check the source code:
- Out of core Cholesky (CUDA code)
- Out of core LAUUM (Python code)
The following functions provide a higher-level interface to the two operations.
gpu_cholesky
- falkon.ooc_ops.gpu_cholesky(A: Tensor, upper: bool, clean: bool, overwrite: bool, opt: FalkonOptions) → Tensor
- Parameters:
A (torch.Tensor) – 2D positive-definite matrix of size (n x n) that will be factorized as A = U.T @ U (if upper is True) or as A = L @ L.T (if upper is False).
upper (bool) – Whether the triangle to be factorized is the upper or the lower triangle of A.
clean (bool) – Whether the “other” triangle of the output matrix (the one which does not contain the factorization) should be filled with zeros.
overwrite (bool) – Whether to overwrite matrix A, or to write the result in a new buffer.
opt (FalkonOptions) – Options forwarded for block calculation, and other knobs in the out-of-core parallel POTRF implementation. Useful options are the ones defined in CholeskyOptions.
Notes
The factorization is always computed as the ‘lower’ variant; however, the factor may end up in the upper-triangular part of the matrix if A is not Fortran-contiguous to begin with.
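The following is a minimal usage sketch for gpu_cholesky, not taken from the library itself: it assumes a CUDA-capable GPU is available and that a default-constructed FalkonOptions() is sufficient (both assumptions for illustration). The matrix is made Fortran-contiguous so that, per the note above, the factor stays in the lower triangle.

```python
import torch
from falkon import FalkonOptions          # assumed top-level import path
from falkon.ooc_ops import gpu_cholesky

n = 1000
X = torch.randn(n, n, dtype=torch.float64)
A = X @ X.T + n * torch.eye(n, dtype=torch.float64)  # symmetric positive-definite
A = A.T.contiguous().T  # Fortran-contiguous copy, so the factor lands in the lower triangle

opt = FalkonOptions()   # defaults; Cholesky-specific knobs live in the CholeskyOptions fields
L = gpu_cholesky(A, upper=False, clean=True, overwrite=False, opt=opt)

# With upper=False and clean=True, L is lower-triangular and L @ L.T reconstructs A.
print((L @ L.T - A).abs().max())  # expected to be at round-off level in float64
```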
gpu_lauum
- falkon.ooc_ops.gpu_lauum(A: Tensor, upper: bool, overwrite: bool = True, write_opposite: bool = False, opt: FalkonOptions | None = None)
- Parameters:
A (torch.Tensor) – N-by-N triangular matrix.
upper (bool) – Whether the input matrix is upper or lower triangular.
overwrite (bool) – Whether to overwrite matrix A or to output the result in a new buffer.
write_opposite (bool) – Independently of the overwrite parameter, whether to write the result of the triangular multiplication on the ‘opposite’ side of A. For example, if upper == True and overwrite == False, then the result will be written on the lower triangular part of the input matrix A. While independent, this is mostly useful when overwrite == False, since it can effectively avoid allocating a new tensor, and at the same time preserve the original data.
opt (FalkonOptions or None) – Options for the LAUUM operation. The only relevant options are the ones connected to GPU memory usage.
- Returns:
out (torch.Tensor) – A (N x N) tensor. This will share the same memory as the input tensor A if overwrite is set to True, otherwise it will be a newly allocated tensor.
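As with gpu_cholesky above, here is a minimal usage sketch for gpu_lauum. It assumes LAPACK-style LAUUM semantics (L.T @ L for a lower-triangular input, U @ U.T for an upper-triangular one) and a default FalkonOptions(); treat both as illustrative assumptions.

```python
import torch
from falkon import FalkonOptions        # assumed top-level import path
from falkon.ooc_ops import gpu_lauum

n = 1000
L = torch.tril(torch.randn(n, n, dtype=torch.float64))  # lower-triangular input
L = L.T.contiguous().T  # Fortran-contiguous, as in the Cholesky example

opt = FalkonOptions()
# With overwrite=False the result goes to a new buffer and L is left untouched;
# write_opposite=False keeps the result on the same (lower) triangle.
out = gpu_lauum(L, upper=False, overwrite=False, write_opposite=False, opt=opt)

# Assuming LAPACK semantics, the lower triangle of `out` holds tril(L.T @ L).
print((torch.tril(out) - torch.tril(L.T @ L)).abs().max())  # round-off level in float64
```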