falkon.ooc_ops
The out-of-core algorithms for the Cholesky decomposition and the LAUUM operation are crucial for speeding up our library. To find out more about how they work, check the source code:
Out of core Cholesky (CUDA code)
Out of core LAUUM (Python code)
The following functions provide a higher-level interface to the two operations.
gpu_cholesky
- falkon.ooc_ops.gpu_cholesky(A: Tensor, upper: bool, clean: bool, overwrite: bool, opt: FalkonOptions) → Tensor
- Parameters:
A (torch.Tensor) – 2D positive-definite matrix of size (n x n) that will be factorized as A = U.T @ U (if upper is True) or A = L @ L.T (if upper is False).
upper (bool) – Whether the upper or the lower triangle of A should be factorized.
clean (bool) – Whether the “other” triangle of the output matrix (the one which does not contain the factorization) should be filled with zeros.
overwrite (bool) – Whether to overwrite matrix A or to output the result in a new buffer.
opt (FalkonOptions) – Options forwarded for block calculation, and other knobs in the out-of-core parallel POTRF implementation. Useful options are the ones defined in CholeskyOptions.
Notes
The factorization is always computed as the ‘lower’ variant; however, it may end up in the upper-triangular part of the output matrix if A is not Fortran contiguous to begin with.
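A minimal usage sketch, not taken from the library's documentation: it assumes a CUDA-capable GPU is available, that the default FalkonOptions are sufficient, and that making A Fortran contiguous keeps the factor in the lower triangle (per the note above). The matrix size and the correctness check are illustrative only.

.. code-block:: python

    import torch
    from falkon import FalkonOptions  # assumed import path for the options object
    from falkon.ooc_ops import gpu_cholesky

    n = 4096
    M = torch.randn(n, n, dtype=torch.float64)
    A = M @ M.T + n * torch.eye(n, dtype=torch.float64)  # symmetric positive-definite
    A = A.T.contiguous().T  # make A Fortran contiguous so the factor stays in the lower triangle

    opt = FalkonOptions()  # defaults; the CholeskyOptions fields can be tuned here
    L = gpu_cholesky(A, upper=False, clean=True, overwrite=False, opt=opt)

    # With clean=True the upper triangle is zeroed, so L @ L.T should reconstruct A
    print(torch.allclose(L @ L.T, A))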
gpu_lauum
- falkon.ooc_ops.gpu_lauum(A: Tensor, upper: bool, overwrite: bool = True, write_opposite: bool = False, opt: FalkonOptions | None = None)
- Parameters:
A (torch.Tensor) – N-by-N triangular matrix.
upper (bool) – Whether the input matrix is upper or lower triangular.
overwrite (bool) – Whether to overwrite matrix A or to output the result in a new buffer.
write_opposite (bool) – Independently of the overwrite parameter, whether to write the result of the triangular multiplication on the ‘opposite’ side of A. For example, if upper == True and overwrite == False, the result will be written on the lower-triangular part of the input matrix A. This is mostly useful when overwrite == False, since it avoids allocating a new tensor while preserving the original data.
opt (FalkonOptions or None) – Options for the LAUUM operation. The only relevant options are the ones connected to GPU memory usage.
- Returns:
out (torch.Tensor) – An (N x N) tensor. This will share the same memory as the input tensor A if overwrite is set to True; otherwise it will be a newly allocated tensor.
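A minimal sketch of calling gpu_lauum on a lower-triangular factor, again not taken from the library's documentation: it assumes a CUDA-capable GPU, default FalkonOptions, and that the operation follows the LAPACK LAUUM convention (L.T @ L for a lower-triangular input); the matrix size and the check are illustrative only.

.. code-block:: python

    import torch
    from falkon import FalkonOptions  # assumed import path for the options object
    from falkon.ooc_ops import gpu_lauum

    n = 2048
    # A well-conditioned lower-triangular matrix (e.g. a Cholesky factor)
    L = torch.tril(torch.randn(n, n, dtype=torch.float64)) + n * torch.eye(n, dtype=torch.float64)
    L = L.T.contiguous().T  # Fortran-contiguous input, matching the out-of-core kernels

    opt = FalkonOptions()  # defaults; only the GPU-memory options matter here
    out = gpu_lauum(L, upper=False, overwrite=False, write_opposite=False, opt=opt)

    # Assuming the LAPACK LAUUM convention, the lower triangle of `out`
    # should hold the lower triangle of L.T @ L
    print(torch.allclose(torch.tril(out), torch.tril(L.T @ L)))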