falkon.options
FalkonOptions

class falkon.options.FalkonOptions(keops_acc_dtype: str = 'auto', keops_sum_scheme: str = 'auto', keops_active: str = 'auto', chol_force_in_core: bool = False, chol_force_ooc: bool = False, chol_par_blk_multiplier: int = 2, lauum_par_blk_multiplier: int = 8, pc_epsilon_32: float = 1e-05, pc_epsilon_64: float = 1e-13, cpu_preconditioner: bool = False, cg_epsilon_32: float = 1e-07, cg_epsilon_64: float = 1e-15, cg_tolerance: float = 1e-07, cg_full_gradient_every: int = 10, debug: bool = False, use_cpu: bool = False, max_gpu_mem: float = inf, max_cpu_mem: float = inf, compute_arch_speed: bool = False, no_single_kernel: bool = True, min_cuda_pc_size_32: int = 10000, min_cuda_pc_size_64: int = 30000, min_cuda_iter_size_32: int = 300000000, min_cuda_iter_size_64: int = 900000000, never_store_kernel: bool = False, num_fmm_streams: int = 2)

Global options for Falkon.
Parameters

debug (bool) – default False. When set to True, the estimators will print extensive debugging information. Set it if you want to dig deeper.

use_cpu (bool) – default False. When set to True, forces Falkon not to use the GPU. If this option is not set and no GPU is available, Falkon will issue a warning.

max_gpu_mem (float) – The maximum GPU memory (in bytes) that Falkon may use. If not set, Falkon will use all available memory.
max_cpu_mem (float) – The maximum CPU RAM (in bytes) that Falkon may use. If not set, Falkon will use all available memory. This option is not a strict bound (due to the nature of memory management in Python).
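Since max_gpu_mem and max_cpu_mem are specified in bytes, a small conversion helper can be handy; gib_to_bytes below is a hypothetical convenience function, not part of Falkon:

```python
# Hypothetical helper (not part of Falkon): the memory caps above are given
# in bytes, so convert human-readable sizes before passing them.
def gib_to_bytes(gib: float) -> float:
    """Convert gibibytes to bytes."""
    return gib * 1024 ** 3

print(gib_to_bytes(4))  # 4294967296.0
```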
compute_arch_speed (bool) – default False. When running Falkon on a machine with multiple GPUs that have different performance characteristics, setting this option to True may help subdivide the workload better: the performance of each accelerator is evaluated on startup, and faster devices receive more work than slower ones. Otherwise, leave this option unset, since evaluating accelerator performance increases startup time.
no_single_kernel (bool) – default True. Whether the kernel should always be evaluated in double precision. If set to False, kernel evaluations will be faster but less precise (note that this refers only to calculations involving the full kernel matrix, not to kernel-vector products).

min_cuda_pc_size_32 (int) – default 10000. If M (the number of Nystroem centers) is lower than min_cuda_pc_size_32, Falkon will run the preconditioner on the CPU. Otherwise, if CUDA is available, Falkon will try to run the preconditioner on the GPU. This setting is valid for data in single (float32) precision. Along with the min_cuda_iter_size_32 setting, this determines a cutoff for running Falkon on the CPU or the GPU. Such a cutoff is useful since, for small-data problems, running on the CPU may be faster than running on the GPU. If your data is close to the cutoff, it may be worth experimenting with both devices to check which is faster; this will depend on the exact hardware.

min_cuda_pc_size_64 (int) – default 30000. If M (the number of Nystroem centers) is lower than min_cuda_pc_size_64, Falkon will run the preconditioner on the CPU. Otherwise, if CUDA is available, Falkon will try to run the preconditioner on the GPU. This setting is valid for data in double (float64) precision. Along with the min_cuda_iter_size_64 setting, this determines a cutoff for running Falkon on the CPU or the GPU. Such a cutoff is useful since, for small-data problems, running on the CPU may be faster than running on the GPU. If your data is close to the cutoff, it may be worth experimenting with both devices to check which is faster; this will depend on the exact hardware.

min_cuda_iter_size_32 (int) – default 300_000_000. If the data size (measured as the product of M and the dimensions of X) is lower than min_cuda_iter_size_32, Falkon will run the conjugate gradient iterations on the CPU. For example, with the default setting, the CPU-GPU threshold corresponds to a dataset with 10k points, 10 dimensions, and 3k Nystroem centers. A larger dataset, or the use of more centers, will cause the conjugate gradient iterations to run on the GPU. This setting is valid for data in single (float32) precision.

min_cuda_iter_size_64 (int) – default 900_000_000. If the data size (measured as the product of M and the dimensions of X) is lower than min_cuda_iter_size_64, Falkon will run the conjugate gradient iterations on the CPU. For example, with the default setting, the CPU-GPU threshold corresponds to a dataset with 30k points, 10 dimensions, and 3k Nystroem centers. A larger dataset, or the use of more centers, will cause the conjugate gradient iterations to run on the GPU. This setting is valid for data in double (float64) precision.

never_store_kernel (bool) – default False. If set to True, the kernel between the data and the Nystroem centers will not be stored, even if there is sufficient RAM to do so. Setting this option to True may (in case there would be enough RAM to store the kernel) increase the training time for Falkon, since the K_NM matrix must be recomputed at every conjugate gradient iteration.
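The CPU/GPU cutoff described for the min_cuda_iter_size options boils down to a simple product-and-compare. The sketch below is illustrative only (not Falkon's internal code) and shows the float32 case:

```python
# Illustrative sketch of the CPU/GPU cutoff for the conjugate-gradient
# iterations (not Falkon's actual implementation).
MIN_CUDA_ITER_SIZE_32 = 300_000_000  # default threshold for float32 data

def cg_runs_on_gpu(n: int, d: int, m: int,
                   threshold: int = MIN_CUDA_ITER_SIZE_32) -> bool:
    """True when the data size n * d * m reaches the threshold."""
    return n * d * m >= threshold

# The default threshold corresponds to 10k points, 10 dims, 3k centers:
print(cg_runs_on_gpu(10_000, 10, 3_000))  # True (exactly at the cutoff)
print(cg_runs_on_gpu(1_000, 10, 3_000))   # False -> iterations stay on CPU
```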
num_fmm_streams (int) – default 2  The number of CUDA streams to use for evaluating kernels when CUDA is available. This number should be increased from its default value when the number of Nystroem centers is higher than around 5000.
keops_acc_dtype (str) – default "auto". A string describing the accumulator datatype for KeOps. For more information refer to the KeOps documentation.
keops_sum_scheme (str) – default "auto". Accumulation scheme for KeOps. For more information refer to the KeOps documentation.
keops_active (str) – default "auto". Whether to use KeOps. Three settings are allowed, specified by strings: 'auto' (the default) means KeOps will be used if it is installed correctly; 'no' means KeOps will not be used, nor imported; 'force' means that an error will be raised if KeOps is not installed.
cg_epsilon_32 (float) – default 1e-7. Small epsilon added to prevent divide-by-zero errors in the conjugate gradient algorithm. Used for single-precision data types.
cg_epsilon_64 (float) – default 1e-15. Small epsilon added to prevent divide-by-zero errors in the conjugate gradient algorithm. Used for double-precision data types.
cg_tolerance (float) – default 1e-7. Maximum change in model parameters between iterations. If a change smaller than cg_tolerance is detected, the optimization is regarded as converged.

cg_full_gradient_every (int) – default 10. How often to calculate the full gradient in the conjugate gradient algorithm. Full-gradient iterations take roughly twice as long as normal iterations, but they reset the error introduced by the other iterations.
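To illustrate why full-gradient iterations matter, here is a minimal, self-contained conjugate-gradient sketch (not Falkon's code) that recomputes the exact residual every full_gradient_every steps instead of updating it recursively, which curbs accumulated floating-point drift:

```python
# Illustrative sketch (not Falkon's implementation): conjugate gradient on a
# tiny SPD system, with the exact residual recomputed periodically.
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def cg(A, b, iters=25, full_gradient_every=10, eps=1e-7):
    x = [0.0] * len(b)
    r = b[:]                       # residual b - A x  (x = 0 initially)
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for it in range(1, iters + 1):
        Ap = matvec(A, p)
        # eps guards against divide-by-zero, as the cg_epsilon options do
        alpha = rs / (sum(pi * api for pi, api in zip(p, Ap)) + eps)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        if it % full_gradient_every == 0:
            # "full gradient" step: recompute the residual from scratch
            r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        else:
            # cheap recursive residual update, accumulates rounding error
            r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < 1e-20:
            break
        p = [ri + (rs_new / (rs + eps)) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cg(A, b)
print(x)  # close to the exact solution [1/11, 7/11]
```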
pc_epsilon_32 (float) – default 1e-5. Epsilon used to increase the diagonal dominance of a matrix before its Cholesky decomposition (for single-precision data types).
pc_epsilon_64 (float) – default 1e-13. Epsilon used to increase the diagonal dominance of a matrix before its Cholesky decomposition (for double-precision data types).
cpu_preconditioner (bool) – default False. Whether the preconditioner should be computed on the CPU. This setting overrides the FalkonOptions.use_cpu option.

lauum_par_blk_multiplier (int) – default 8. Minimum number of tiles per GPU for the LAUUM algorithm. This can be set quite high (e.g. 8) without much performance degradation. Optimal settings will depend on the number of GPUs.
chol_force_in_core (bool) – default False. Whether to force in-core execution of the Cholesky decomposition. This will not work with matrices bigger than GPU memory.
chol_force_ooc (bool) – default False. Whether to force out-of-core (parallel) execution of the POTRF algorithm, even on matrices which fit in GPU core.
chol_par_blk_multiplier (int) – default 2. Minimum number of tiles per GPU in the out-of-core, GPU-parallel POTRF algorithm.
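As a usage sketch: FalkonOptions is a plain options container, so you typically override only the keywords you need and pass the resulting object to an estimator. So that the snippet below runs without falkon installed, it mimics the pattern with a hypothetical stand-in dataclass, using only parameter names and defaults taken from the signature above:

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in mirroring a few FalkonOptions fields and their
# documented defaults; the real class lives in falkon.options.
@dataclass
class FalkonOptionsSketch:
    debug: bool = False
    use_cpu: bool = False
    cg_tolerance: float = 1e-7
    num_fmm_streams: int = 2

# Override only what you need; every other option keeps its default.
opts = FalkonOptionsSketch(debug=True, cg_tolerance=1e-5)
print(opts.use_cpu)       # False (default preserved)

# dataclasses.replace builds a modified copy, handy when experimenting:
cpu_opts = replace(opts, use_cpu=True)
print(cpu_opts.use_cpu)   # True
```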