falkon.gsc_losses

Loss

class falkon.gsc_losses.Loss(name: str, kernel: falkon.kernels.kernel.Kernel, opt: Optional[falkon.options.FalkonOptions] = None)

Abstract generalized self-concordant loss function class.

Such loss functions must be three times differentiable, but the logistic Falkon algorithm uses only the first two derivatives. Subclasses must implement three methods: __call__(), which computes the loss given two input vectors (the inputs may also be matrices, e.g. for the softmax loss); df(), which computes the first derivative of the loss; and ddf(), which computes the second derivative.

Additionally, this class provides two methods (knmp_grad() and knmp_hess()) which compute kernel-vector products in which the vector is built from the loss derivatives. These functions are specific to the logistic Falkon algorithm.

Parameters
  • name – A descriptive name for the loss function (e.g. “logistic”, “softmax”)

  • kernel – The kernel function used for training a LogisticFalkon model

  • opt – Falkon options container. Will be passed to the kernel when computing kernel-vector products.

See also

LogisticLoss

a concrete implementation of this class for the logistic loss.

falkon.models.LogisticFalkon

the logistic Falkon model which uses GSC losses.

abstract __call__(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor

Abstract method. Should return the loss for predicting y2 with true labels y1.

Parameters
  • y1 (torch.Tensor) – One of the two inputs to the loss. This should be interpreted as the true labels.

  • y2 (torch.Tensor) – The other loss input. Should be interpreted as the predicted labels.

Returns

torch.Tensor – The loss calculated for the two inputs.

abstract ddf(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor

Abstract method. Should return the second derivative of the loss wrt y2.

Parameters
  • y1 (torch.Tensor) – One of the two inputs to the loss. This should be interpreted as the true labels.

  • y2 (torch.Tensor) – The other loss input. Should be interpreted as the predicted labels. The derivative should be computed with respect to this tensor.

Returns

torch.Tensor – The second derivative of the loss with respect to y2. It will be a tensor of the same shape as the two inputs.

abstract df(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor

Abstract method. Should return the derivative of the loss wrt y2.

Parameters
  • y1 (torch.Tensor) – One of the two inputs to the loss. This should be interpreted as the true labels.

  • y2 (torch.Tensor) – The other loss input. Should be interpreted as the predicted labels. The derivative should be computed with respect to this tensor.

Returns

torch.Tensor – The derivative of the loss with respect to y2. It will be a tensor of the same shape as the two inputs.

knmp_grad(X: torch.Tensor, Xc: torch.Tensor, Y: torch.Tensor, u: torch.Tensor, opt: Optional[falkon.options.FalkonOptions] = None) → Tuple[torch.Tensor, torch.Tensor]

Computes a kernel-vector product where the vector is the first derivative of this loss.

Given the kernel function \(K\), the loss represented by this class \(\mathcal{l}\), and the number of samples \(n\), this function computes

\[\dfrac{1}{n} K(X_c, X) @ (\mathcal{l}'(Y, K(X, X_c) @ u))\]
Parameters
  • X (torch.Tensor) – Data matrix of shape (n x d) with n samples in d dimensions.

  • Xc (torch.Tensor) – Center matrix of shape (m x d) with m centers in d dimensions.

  • Y (torch.Tensor) – Label matrix of shape (n x t) with n samples. Depending on the loss, the labels may or may not have more than one dimension.

  • u (torch.Tensor) – A vector (or matrix if the labels are multi-dimensional) of weights of shape (m x t). The product K(X, Xc) @ u, where K is the kernel associated to this loss, should produce label predictions.

  • opt (FalkonOptions or None) – Options to be passed to the mmv function for the kernel associated to this loss. Options passed as an argument take precedence over the options used to build this class instance.

Returns

  • grad_mul (torch.Tensor) – A tensor of shape (m x t) obtained by multiplying the kernel matrix K(Xc, X) with the first derivative of the loss, evaluated on the predictions obtained with weights u. The formula followed is: (1/n) * K(Xc, X) @ df(Y, K(X, Xc) @ u).

  • func_val (torch.Tensor) – A tensor of shape (n x t) of predictions obtained with weights u.
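
As a rough illustration of the formula above, the following sketch evaluates it densely for single-output labels (t = 1) using only the Python standard library. The helper names (gaussian_kernel, matvec, knmp_grad_dense), the Gaussian kernel with sigma = 1, and the choice of the logistic derivative for df are assumptions made for this example; Falkon itself operates on tensors through the kernel's mmv function.

```python
import math

def gaussian_kernel(A, B, sigma=1.0):
    """Dense Gaussian kernel matrix between the rows of A and the rows of B."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(ra, rb)) / (2 * sigma ** 2))
             for rb in B] for ra in A]

def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def logistic_df(y1, y2):
    # First derivative of log(1 + exp(-y1 * y2)) with respect to y2.
    return -y1 / (1.0 + math.exp(y1 * y2))

def knmp_grad_dense(X, Xc, Y, u):
    """Return ((1/n) K(Xc, X) @ df(Y, K(X, Xc) @ u), K(X, Xc) @ u) for t = 1."""
    n = len(X)
    func_val = matvec(gaussian_kernel(X, Xc), u)               # predictions, length n
    deriv = [logistic_df(y, f) for y, f in zip(Y, func_val)]   # loss derivative, length n
    grad_mul = [g / n for g in matvec(gaussian_kernel(Xc, X), deriv)]  # length m
    return grad_mul, func_val

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # n = 3 samples, d = 2
Xc = [[0.0, 0.0], [1.0, 1.0]]              # m = 2 centers
Y = [1.0, -1.0, 1.0]                       # labels in {-1, +1}
u = [0.0, 0.0]                             # zero weights: every prediction is 0

grad_mul, func_val = knmp_grad_dense(X, Xc, Y, u)
print(grad_mul, func_val)
```

With u set to zero every prediction is 0, so the derivative vector reduces to -Y / 2, which makes the result easy to check by hand.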

knmp_hess(X: torch.Tensor, Xc: torch.Tensor, Y: torch.Tensor, f: torch.Tensor, u: torch.Tensor, opt: Optional[falkon.options.FalkonOptions] = None) → torch.Tensor

Computes a kernel-vector product rescaled by the second derivative of this loss.

Given the kernel function \(K\), the loss represented by this class \(\mathcal{l}\), and the number of samples \(n\), this function computes

\[\dfrac{1}{n} K(X_c, X) @ (\mathcal{l}''(Y, f) * K(X, X_c) @ u)\]
Parameters
  • X (torch.Tensor) – Data matrix of shape (n x d) with n samples in d dimensions.

  • Xc (torch.Tensor) – Center matrix of shape (m x d) with m centers in d dimensions.

  • Y (torch.Tensor) – Label matrix of shape (n x t) with n samples. Depending on the loss, the labels may or may not have more than one dimension.

  • f (torch.Tensor) – Tensor of shape (n x t) of predictions. Typically this will be the second output of the knmp_grad() method.

  • u (torch.Tensor) – A vector (or matrix if the labels are multi-dimensional) of weights of shape (m x t). The product K(X, Xc) @ u, where K is the kernel associated to this loss, should produce label predictions.

  • opt (FalkonOptions or None) – Options to be passed to the mmv function for the kernel associated to this loss. Options passed as an argument take precedence over the options used to build this class instance.

Returns

A tensor of shape (m x t), the output of the computation.
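
The computation can be sketched densely for t = 1 with the standard library alone. This sketch assumes that the `*` in the formula is an elementwise product between the second derivative evaluated at (Y, f) and the vector K(X, Xc) @ u. The helper names, the Gaussian kernel with sigma = 1, and the use of the logistic second derivative are illustrative choices, not Falkon's implementation.

```python
import math

def gaussian_kernel(A, B, sigma=1.0):
    """Dense Gaussian kernel matrix between the rows of A and the rows of B."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(ra, rb)) / (2 * sigma ** 2))
             for rb in B] for ra in A]

def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def logistic_ddf(y1, y2):
    # Second derivative of log(1 + exp(-y1 * y2)) with respect to y2.
    z = y1 * y2
    return y1 ** 2 / ((1.0 + math.exp(-z)) * (1.0 + math.exp(z)))

def knmp_hess_dense(X, Xc, Y, f, u):
    """Return (1/n) K(Xc, X) @ (ddf(Y, f) * (K(X, Xc) @ u)) for t = 1."""
    n = len(X)
    Ku = matvec(gaussian_kernel(X, Xc), u)                              # length n
    scaled = [logistic_ddf(y, fi) * k for y, fi, k in zip(Y, f, Ku)]    # elementwise rescaling
    return [h / n for h in matvec(gaussian_kernel(Xc, X), scaled)]      # length m

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # n = 3 samples, d = 2
Xc = [[0.0, 0.0], [1.0, 1.0]]              # m = 2 centers
Y = [1.0, -1.0, 1.0]                       # labels in {-1, +1}
f = [0.0, 0.0, 0.0]                        # predictions at which ddf is evaluated
u = [1.0, 0.0]                             # vector to multiply

hess_mul = knmp_hess_dense(X, Xc, Y, f, u)
print(hess_mul)
```

At f = 0 the logistic second derivative equals 1/4 for labels of magnitude 1, so the result is one quarter of (1/n) K(Xc, X) @ K(X, Xc) @ u.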

Logistic loss

class falkon.gsc_losses.LogisticLoss(kernel: falkon.kernels.kernel.Kernel, opt: Optional[falkon.options.FalkonOptions] = None)

Wrapper for the logistic loss, to be used in conjunction with the LogisticFalkon estimator.

Usage of this loss assumes a binary classification problem with labels -1 and +1. For different choices of labels, see WeightedCrossEntropyLoss.

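Parameters
  • kernel (falkon.kernels.kernel.Kernel) – The kernel function used for training a LogisticFalkon model

  • opt (FalkonOptions) – Falkon options container. Will be passed to the kernel when computing kernel-vector products.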

Examples

>>> import falkon
>>> from falkon.gsc_losses import LogisticLoss
>>> k = falkon.kernels.GaussianKernel(3)
>>> log_loss = LogisticLoss(k)
>>> estimator = falkon.LogisticFalkon(k, [1e-4, 1e-4, 1e-4], [3, 3, 3], loss=log_loss, M=100)
__call__(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor

Compute the logistic loss between two 1-dimensional tensors.

The formula used is \(\log(1 + \exp(-y_1 * y_2))\)

Parameters
  • y1 – The first input tensor. Must be 1D

  • y2 – The second input tensor. Must be 1D

Returns

loss – The logistic loss between the two input vectors.

ddf(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor

Compute the second derivative of the logistic loss with respect to y2.

The formula used is

\[y_1^2 \dfrac{1}{1 + \exp(-y_1 * y_2)} \dfrac{1}{1 + \exp(y_1 * y_2)}\]
Parameters
  • y1 – The first input tensor. Must be 1D

  • y2 – The second input tensor. Must be 1D

Returns

dd_loss – The second derivative of the logistic loss, calculated between the two input vectors.

df(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor

Compute the derivative of the logistic loss with respect to y2.

The formula used is

\[\dfrac{-y_1}{1 + \exp(y_1 * y_2)}\]
Parameters
  • y1 – The first input tensor. Must be 1D

  • y2 – The second input tensor. Must be 1D

Returns

d_loss – The derivative of the logistic loss, calculated between the two input vectors.
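
The three logistic-loss formulas documented for this class can be cross-checked numerically. The sketch below is a scalar, standard-library version (not the tensor implementation in Falkon) that compares df and ddf against central finite differences. The function names and the step size eps are assumptions of this example.

```python
import math

def logistic_loss(y1, y2):
    # log(1 + exp(-y1 * y2))
    return math.log1p(math.exp(-y1 * y2))

def logistic_df(y1, y2):
    # -y1 / (1 + exp(y1 * y2))
    return -y1 / (1.0 + math.exp(y1 * y2))

def logistic_ddf(y1, y2):
    # y1^2 * 1 / (1 + exp(-y1 * y2)) * 1 / (1 + exp(y1 * y2))
    z = y1 * y2
    return y1 ** 2 / ((1.0 + math.exp(-z)) * (1.0 + math.exp(z)))

def central_diff(fn, y1, y2, eps=1e-5):
    """Central finite-difference derivative of fn with respect to y2."""
    return (fn(y1, y2 + eps) - fn(y1, y2 - eps)) / (2 * eps)

max_err_df = 0.0
max_err_ddf = 0.0
for y1 in (-1.0, 1.0):                       # the two admissible labels
    for y2 in (-2.0, -0.3, 0.0, 0.7, 1.5):   # a few prediction values
        max_err_df = max(max_err_df,
                         abs(central_diff(logistic_loss, y1, y2) - logistic_df(y1, y2)))
        max_err_ddf = max(max_err_ddf,
                          abs(central_diff(logistic_df, y1, y2) - logistic_ddf(y1, y2)))

print(max_err_df, max_err_ddf)  # both should be tiny compared to the derivative values
```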

Weighted binary cross entropy loss

class falkon.gsc_losses.WeightedCrossEntropyLoss(kernel: falkon.kernels.kernel.Kernel, neg_weight: float, opt: Optional[falkon.options.FalkonOptions] = None)

Wrapper for the weighted binary cross-entropy loss, to be used with the LogisticFalkon estimator.

Using this loss assumes a binary classification problem with labels 0 and +1. Additionally, this loss allows placing a different weight on samples belonging to one of the two classes (see the neg_weight parameter).

Parameters
  • kernel (falkon.kernels.kernel.Kernel) – The kernel function used for training a LogisticFalkon model

  • neg_weight (float) – The weight to be assigned to samples belonging to the negative (0-labeled) class. By setting neg_weight to 1, the classes are equally weighted and this loss is equivalent to the LogisticLoss loss, but with a different choice of labels.

  • opt (FalkonOptions) – Falkon options container. Will be passed to the kernel when computing kernel-vector products.
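
The stated equivalence with LogisticLoss at neg_weight = 1 can be checked with a small scalar sketch that uses the formulas documented for the two __call__ methods. The label mapping y1 = 2 * true - 1 (from {0, 1} to {-1, +1}) and the function names are assumptions of this example, which relies only on the standard library.

```python
import math

def weighted_bce(true, pred, w):
    # true * log(1 + e^(-pred)) + w * (1 - true) * log(1 + e^(pred))
    return true * math.log1p(math.exp(-pred)) + w * (1 - true) * math.log1p(math.exp(pred))

def logistic_loss(y1, y2):
    # log(1 + exp(-y1 * y2)), with labels y1 in {-1, +1}
    return math.log1p(math.exp(-y1 * y2))

preds = (-3.0, -0.5, 0.0, 0.8, 2.5)

# With w = 1 the weighted BCE on labels {0, 1} matches the logistic loss on {-1, +1}.
max_gap = max(abs(weighted_bce(t, p, 1.0) - logistic_loss(2 * t - 1, p))
              for t in (0, 1) for p in preds)

# With w != 1 only the negative (0-labeled) samples are rescaled.
neg_ratio = weighted_bce(0, 0.8, 3.0) / weighted_bce(0, 0.8, 1.0)
pos_ratio = weighted_bce(1, 0.8, 3.0) / weighted_bce(1, 0.8, 1.0)

print(max_gap, neg_ratio, pos_ratio)
```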

Examples

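>>> import falkon
>>> from falkon.gsc_losses import WeightedCrossEntropyLoss
>>> k = falkon.kernels.GaussianKernel(3)
>>> wce_loss = WeightedCrossEntropyLoss(k, neg_weight=1.0)  # neg_weight is required; 1.0 (illustrative) weights both classes equally
>>> estimator = falkon.LogisticFalkon(k, [1e-4, 1e-4, 1e-4], [3, 3, 3], loss=wce_loss, M=100)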
__call__(true: torch.Tensor, pred: torch.Tensor) → torch.Tensor

Compute the weighted BCE loss between two 1-dimensional tensors.

The formula used is as follows, where \(w\) denotes neg_weight:

\[\mathrm{true} * \log(1 + e^{-\mathrm{pred}}) + w * (1 - \mathrm{true}) * \log(1 + e^{\mathrm{pred}})\]
Parameters
  • true – The label tensor. Must be 1D, with values 0 or 1.

  • pred – The prediction tensor. Must be 1D. These are “logits”, so they need not be scaled beforehand.

Returns

loss – The weighted BCE loss between the two input vectors.

ddf(true: torch.Tensor, pred: torch.Tensor) → torch.Tensor

Compute the second derivative of the weighted BCE loss with respect to pred.

The formula used is as follows, where \(w\) denotes neg_weight:

\[\dfrac{-(\mathrm{true} * (w - 1) - w) * e^{\mathrm{pred}}}{(e^{\mathrm{pred}} + 1)^2}\]
Parameters
  • true – The label tensor. Must be 1D

  • pred – The prediction tensor. Must be 1D

Returns

dd_loss – The second derivative of the weighted BCE loss between the two input vectors.

df(true: torch.Tensor, pred: torch.Tensor) → torch.Tensor

Compute the derivative of the weighted BCE loss with respect to pred.

The formula used is as follows, where \(w\) denotes neg_weight:

\[\dfrac{-(w * \mathrm{true} - w) * e^{\mathrm{pred}} - \mathrm{true}}{e^{\mathrm{pred}} + 1}\]
Parameters
  • true – The label tensor. Must be 1D

  • pred – The prediction tensor. Must be 1D

Returns

d_loss – The derivative of the weighted BCE loss between the two input vectors.
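
The df and ddf formulas of the weighted BCE loss can be checked against central finite differences of the loss itself. The following is a scalar, standard-library sketch, not the tensor implementation in Falkon. The function names, the weight w = 2.5, and the step size eps are arbitrary choices made for this example.

```python
import math

def wbce(true, pred, w):
    # true * log(1 + e^(-pred)) + w * (1 - true) * log(1 + e^(pred))
    return true * math.log1p(math.exp(-pred)) + w * (1 - true) * math.log1p(math.exp(pred))

def wbce_df(true, pred, w):
    # (-(w * true - w) * e^pred - true) / (e^pred + 1)
    e = math.exp(pred)
    return (-(w * true - w) * e - true) / (e + 1.0)

def wbce_ddf(true, pred, w):
    # -(true * (w - 1) - w) * e^pred / (e^pred + 1)^2
    e = math.exp(pred)
    return -(true * (w - 1.0) - w) * e / (e + 1.0) ** 2

def central_diff(fn, true, pred, w, eps=1e-5):
    """Central finite-difference derivative of fn with respect to pred."""
    return (fn(true, pred + eps, w) - fn(true, pred - eps, w)) / (2 * eps)

w = 2.5  # arbitrary weight for the negative (0-labeled) class
max_err_df = 0.0
max_err_ddf = 0.0
for true in (0.0, 1.0):
    for pred in (-2.0, -0.4, 0.0, 0.9, 1.7):
        max_err_df = max(max_err_df,
                         abs(central_diff(wbce, true, pred, w) - wbce_df(true, pred, w)))
        max_err_ddf = max(max_err_ddf,
                          abs(central_diff(wbce_df, true, pred, w) - wbce_ddf(true, pred, w)))

print(max_err_df, max_err_ddf)  # both should be tiny compared to the derivative values
```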