falkon.options

FalkonOptions

class falkon.options.FalkonOptions(keops_acc_dtype: str = 'auto', keops_sum_scheme: str = 'auto', keops_active: str = 'auto', chol_force_in_core: bool = False, chol_force_ooc: bool = False, chol_par_blk_multiplier: int = 2, lauum_par_blk_multiplier: int = 8, pc_epsilon_32: float = 1e-05, pc_epsilon_64: float = 1e-13, cpu_preconditioner: bool = False, cg_epsilon_32: float = 1e-07, cg_epsilon_64: float = 1e-15, cg_tolerance: float = 1e-07, cg_full_gradient_every: int = 10, debug: bool = False, use_cpu: bool = False, max_gpu_mem: float = inf, max_cpu_mem: float = inf, compute_arch_speed: bool = False, no_single_kernel: bool = True, min_cuda_pc_size_32: int = 10000, min_cuda_pc_size_64: int = 30000, min_cuda_iter_size_32: int = 300000000, min_cuda_iter_size_64: int = 900000000, never_store_kernel: bool = False, num_fmm_streams: int = 2)

Global options for Falkon.
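As a minimal sketch, a few of these options might be collected as keyword arguments before being handed to the constructor. The keyword names, units and allowed values come from the signature above and the parameter descriptions; the helper `gib` and the chosen values are illustrative assumptions, not part of Falkon. The block does not import Falkon so that it runs on its own; the commented lines show the intended call.

```python
import math


def gib(n: float) -> float:
    """Convert gibibytes to bytes (hypothetical helper, not part of Falkon)."""
    return n * 2**30


# Keyword arguments named exactly as in the FalkonOptions signature.
option_kwargs = {
    "use_cpu": False,
    "max_gpu_mem": gib(4),    # memory limits are expressed in bytes
    "max_cpu_mem": math.inf,  # the default from the signature: no limit
    "keops_active": "no",     # one of 'auto', 'no', 'force'
    "debug": True,
}

# With Falkon installed, the options object would be built like this:
#   from falkon.options import FalkonOptions
#   opt = FalkonOptions(**option_kwargs)
```

Keeping the values in a plain dictionary makes it easy to log or compare configurations before constructing the options object.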

Parameters
  • debug (bool) – default False - When set to True, the estimators will print extensive debugging information. Set it if you want to dig deeper.

  • use_cpu (bool) – default False - When set to True, forces Falkon not to use the GPU. If this option is not set and no GPU is available, Falkon will issue a warning.

  • max_gpu_mem (float) – default inf - The maximum GPU memory (in bytes) that Falkon may use. If not set, Falkon will use all available memory.

  • max_cpu_mem (float) – default inf - The maximum CPU RAM (in bytes) that Falkon may use. If not set, Falkon will use all available memory. This option is not a strict bound (due to the nature of memory management in Python).

  • compute_arch_speed (bool) – default False - When running Falkon on a machine with multiple GPUs which have a range of different performance characteristics, setting this option to True may help subdivide the workload better: the performance of each accelerator will be evaluated on startup, then the faster devices will receive more work than the slower ones. If the GPUs do not differ in performance, do not set this option, since evaluating accelerator performance increases startup times.

  • no_single_kernel (bool) – default True - Whether the kernel should always be evaluated in double precision. If set to False, kernel evaluations will be faster but less precise (note that this refers only to calculations involving the full kernel matrix, not to kernel-vector products).

  • min_cuda_pc_size_32 (int) – default 10000 - If M (the number of Nystroem centers) is lower than min_cuda_pc_size_32, Falkon will run the preconditioner on the CPU. Otherwise, if CUDA is available, Falkon will try to run the preconditioner on the GPU. This setting is valid for data in single (float32) precision. Along with the min_cuda_iter_size_32 setting, this determines a cutoff for running Falkon on the CPU or the GPU. Such a cutoff is useful because for small-data problems running on the CPU may be faster than running on the GPU. If your data is close to the cutoff, it may be worth experimenting with both the CPU and the GPU to check which is faster; the answer will depend on the exact hardware.

  • min_cuda_pc_size_64 (int) – default 30000 - If M (the number of Nystroem centers) is lower than min_cuda_pc_size_64, Falkon will run the preconditioner on the CPU. Otherwise, if CUDA is available, Falkon will try to run the preconditioner on the GPU. This setting is valid for data in double (float64) precision. Along with the min_cuda_iter_size_64 setting, this determines a cutoff for running Falkon on the CPU or the GPU. Such a cutoff is useful because for small-data problems running on the CPU may be faster than running on the GPU. If your data is close to the cutoff, it may be worth experimenting with both the CPU and the GPU to check which is faster; the answer will depend on the exact hardware.

  • min_cuda_iter_size_32 (int) – default 300_000_000 - If the data size (measured as the product of M and the two dimensions of X, i.e. the number of points and the number of features) is lower than min_cuda_iter_size_32, Falkon will run the conjugate gradient iterations on the CPU. For example, with the default setting, the CPU-GPU threshold is set at a dataset with 10k points, 10 dimensions, and 3k Nystroem centers (10,000 × 10 × 3,000 = 300,000,000). A larger dataset, or the use of more centers, will cause the conjugate gradient iterations to run on the GPU. This setting is valid for data in single (float32) precision.

  • min_cuda_iter_size_64 (int) – default 900_000_000 - If the data size (measured as the product of M and the two dimensions of X, i.e. the number of points and the number of features) is lower than min_cuda_iter_size_64, Falkon will run the conjugate gradient iterations on the CPU. For example, with the default setting, the CPU-GPU threshold is set at a dataset with 30k points, 10 dimensions, and 3k Nystroem centers (30,000 × 10 × 3,000 = 900,000,000). A larger dataset, or the use of more centers, will cause the conjugate gradient iterations to run on the GPU. This setting is valid for data in double (float64) precision.

  • never_store_kernel (bool) – default False - If set to True, the kernel between the data and the Nystroem centers will not be stored, even if there is sufficient RAM to do so. When there would have been enough RAM to store the kernel, setting this option to True may increase the training time for Falkon, since the K_NM matrix must be recomputed at every conjugate gradient iteration.

  • num_fmm_streams (int) – default 2 - The number of CUDA streams to use for evaluating kernels when CUDA is available. This number should be increased from its default value when the number of Nystroem centers is higher than around 5000.

  • keops_acc_dtype (str) – default “auto” - A string describing the accumulator data-type for KeOps. For more information refer to the KeOps documentation.

  • keops_sum_scheme (str) – default “auto” - Accumulation scheme for KeOps. For more information refer to the KeOps documentation.

  • keops_active (str) – default “auto” - Whether to use KeOps. Three settings are allowed, specified by strings: ‘auto’ (the default setting) means that KeOps will be used if it is installed correctly, ‘no’ means that KeOps will be neither used nor imported, and ‘force’ means that an error will be raised if KeOps is not installed.

  • cg_epsilon_32 (float) – default 1e-7 - Small epsilon added to prevent divide-by-zero errors in the conjugate gradient algorithm. Used for single-precision data types.

  • cg_epsilon_64 (float) – default 1e-15 - Small epsilon added to prevent divide-by-zero errors in the conjugate gradient algorithm. Used for double-precision data types.

  • cg_tolerance (float) – default 1e-7 - Convergence threshold on the change in model parameters between iterations. If the change is smaller than cg_tolerance, the optimization is regarded as converged.

  • cg_full_gradient_every (int) – default 10 - How often to calculate the full gradient in the conjugate gradient algorithm. Full-gradient iterations take roughly twice as long as normal iterations, but they reset the error introduced by the other iterations.

  • pc_epsilon_32 (float) – default 1e-5 - Epsilon used to increase the diagonal dominance of a matrix before its Cholesky decomposition (for single-precision data types).

  • pc_epsilon_64 (float) – default 1e-13 - Epsilon used to increase the diagonal dominance of a matrix before its Cholesky decomposition (for double-precision data types).

  • cpu_preconditioner (bool) – default False - Whether the preconditioner should be computed on the CPU. This setting overrides the FalkonOptions.use_cpu option.

  • lauum_par_blk_multiplier (int) – default 8 - Minimum number of tiles per-GPU for the LAUUM algorithm. This can be set quite high (e.g. 8) without too much performance degradation. Optimal settings will depend on the number of GPUs.

  • chol_force_in_core (bool) – default False - Whether to force in-core execution of the Cholesky decomposition. This will not work with matrices bigger than GPU memory.

  • chol_force_ooc (bool) – default False - Whether to force out-of-core (parallel) execution for the POTRF algorithm, even on matrices which fit in-GPU-core.

  • chol_par_blk_multiplier (int) – default 2 - Minimum number of tiles per-GPU in the out-of-core, GPU-parallel POTRF algorithm.
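
The CPU/GPU cutoffs described for min_cuda_pc_size_* and min_cuda_iter_size_* can be checked by hand before training. The following is a rough sketch of the documented rule only, not Falkon's actual implementation: the function names are hypothetical, the "lower than" comparison is taken literally from the descriptions above, and other options that affect device choice (such as use_cpu and cpu_preconditioner) are ignored.

```python
def preconditioner_device(n_centers: int,
                          min_cuda_pc_size: int = 10_000,
                          cuda_available: bool = True) -> str:
    """Device for the preconditioner under the documented rule (hypothetical helper)."""
    # Fewer centers than the cutoff, or no CUDA: stay on the CPU.
    if not cuda_available or n_centers < min_cuda_pc_size:
        return "cpu"
    return "cuda"


def cg_device(n_points: int, n_dims: int, n_centers: int,
              min_cuda_iter_size: int = 300_000_000,
              cuda_available: bool = True) -> str:
    """Device for the conjugate gradient iterations (hypothetical helper)."""
    # Data size is the product of M and the two dimensions of X.
    data_size = n_points * n_dims * n_centers
    if not cuda_available or data_size < min_cuda_iter_size:
        return "cpu"
    return "cuda"


# float32 defaults: 10k points x 10 dims x 3k centers sits exactly on the cutoff.
on_cutoff = cg_device(10_000, 10, 3_000)
# Half as many points falls below the cutoff.
below_cutoff = cg_device(5_000, 10, 3_000)
# The same data in float64 is compared against the larger default cutoff.
float64_case = cg_device(10_000, 10, 3_000, min_cuda_iter_size=900_000_000)
# 3k centers is below the float32 preconditioner cutoff of 10000.
pc_case = preconditioner_device(3_000)
```

For problems near either cutoff, the descriptions above suggest timing both devices, since the faster choice depends on the hardware.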