falkon.gsc_losses

Loss

class falkon.gsc_losses.Loss(name: str, kernel: Kernel, opt: FalkonOptions | None = None)

Abstract generalized self-concordant loss function class.

Such loss functions must be three times differentiable, but the logistic Falkon algorithm uses only the first two derivatives. Subclasses must implement three methods: __call__(), which computes the loss given two input vectors (the inputs may also be matrices, e.g. for the softmax loss); df(), which computes the first derivative of the loss; and ddf(), which computes the second derivative.

Additionally, this class provides two methods (knmp_grad() and knmp_hess()) which calculate kernel-vector products using the loss derivatives for vectors. These functions are specific to the logistic Falkon algorithm.

Parameters:

name – A descriptive name for the loss function (e.g. “logistic”, “softmax”)
kernel – The kernel function used for training a LogisticFalkon model
opt – Falkon options container. Will be passed to the kernel when computing kernel-vector products.
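The subclass contract described above can be sketched without Falkon. The classes below are a plain-Python stand-in, not Falkon's `Loss`: the names `ToyLoss` and `ToySquaredLoss`, the use of floats instead of tensors, and taking derivatives with respect to the second argument are assumptions made for illustration. The kernel-dependent methods knmp_grad() and knmp_hess() are deliberately left out. A squared loss is used only because its derivatives are easy to verify by hand.

```python
from abc import ABC, abstractmethod


class ToyLoss(ABC):
    """Hypothetical stand-in mirroring the three methods a subclass must provide."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def __call__(self, y1: float, y2: float) -> float:
        """Value of the loss."""

    @abstractmethod
    def df(self, y1: float, y2: float) -> float:
        """First derivative with respect to y2."""

    @abstractmethod
    def ddf(self, y1: float, y2: float) -> float:
        """Second derivative with respect to y2."""


class ToySquaredLoss(ToyLoss):
    """0.5 * (y1 - y2)^2, chosen for its simple derivatives."""

    def __init__(self):
        super().__init__(name="squared")

    def __call__(self, y1, y2):
        return 0.5 * (y1 - y2) ** 2

    def df(self, y1, y2):
        return y2 - y1

    def ddf(self, y1, y2):
        return 1.0


loss = ToySquaredLoss()
print(loss.name, loss(1.0, 3.0), loss.df(1.0, 3.0), loss.ddf(1.0, 3.0))
```

Because the three methods are abstract, instantiating `ToyLoss` directly raises a `TypeError`, which makes a missing method visible early.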

Logistic loss

class falkon.gsc_losses.LogisticLoss(kernel: Kernel, opt: FalkonOptions | None = None)

Wrapper for the logistic loss, to be used in conjunction with the LogisticFalkon estimator.

Usage of this loss assumes a binary classification problem with labels -1 and +1. For different choices of labels, see WeightedCrossEntropyLoss.

Parameters:

kernel (falkon.kernels.kernel.Kernel) – The kernel function used for training a LogisticFalkon model
opt (FalkonOptions) – Falkon options container. Will be passed to the kernel when computing kernel-vector products.

Examples

>>> import falkon
>>> from falkon.gsc_losses import LogisticLoss
>>> k = falkon.kernels.GaussianKernel(3)
>>> log_loss = LogisticLoss(k)
>>> estimator = falkon.LogisticFalkon(k, [1e-4, 1e-4, 1e-4], [3, 3, 3], loss=log_loss, M=100)

__call__(y1: Tensor, y2: Tensor) → Tensor

Compute the logistic loss between two 1-dimensional tensors

The formula used is \(\log(1 + \exp(-y_1 * y_2))\)

Parameters:

y1 – The first input tensor. Must be 1D
y2 – The second input tensor. Must be 1D

Returns:

loss – The logistic loss between the two input vectors.
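To make the formula concrete, here is a minimal plain-Python sketch that evaluates it elementwise on lists of floats. The function name `logistic_loss` and the overflow-safe rewriting are assumptions for illustration; this page does not say how Falkon evaluates the expression on tensors.

```python
import math


def logistic_loss(y1, y2):
    """Elementwise log(1 + exp(-y1 * y2)) for two equal-length sequences."""
    if len(y1) != len(y2):
        raise ValueError("y1 and y2 must have the same length")
    out = []
    for a, b in zip(y1, y2):
        z = -a * b
        # log(1 + exp(z)) rewritten as max(z, 0) + log1p(exp(-|z|)) to avoid overflow
        out.append(max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return out


labels = [1.0, -1.0, 1.0]   # labels in {-1, +1}
scores = [2.0, 2.0, 0.0]    # raw predictions
losses = logistic_loss(labels, scores)
print(losses)
```

A prediction whose sign agrees with the label gives a small loss, a disagreeing sign gives a large one, and a prediction of 0 gives log(2) regardless of the label.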

ddf(y1: Tensor, y2: Tensor) → Tensor

Compute the second derivative of the logistic loss with respect to y2

The formula used is

\[y_1^2 \dfrac{1}{1 + \exp(-y_1 * y_2)} \dfrac{1}{1 + \exp(y_1 * y_2)}\]

Parameters:

y1 – The first input tensor. Must be 1D
y2 – The second input tensor. Must be 1D

Returns:

dd_loss – The second derivative of the logistic loss, calculated between the two input vectors.

df(y1: Tensor, y2: Tensor) → Tensor

Compute the derivative of the logistic loss with respect to y2

The formula used is

\[\dfrac{-y_1}{1 + \exp(y_1 * y_2)}\]

Parameters:

y1 – The first input tensor. Must be 1D
y2 – The second input tensor. Must be 1D

Returns:

d_loss – The derivative of the logistic loss, calculated between the two input vectors.
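The df() and ddf() formulas above can be checked numerically against the loss formula. The sketch below uses plain Python floats and central finite differences; the helper names (`loss`, `df`, `ddf`, `central_diff`) are illustrative and are not Falkon functions.

```python
import math


def loss(y1, y2):
    # log(1 + exp(-y1 * y2))
    return math.log1p(math.exp(-y1 * y2))


def df(y1, y2):
    # -y1 / (1 + exp(y1 * y2))
    return -y1 / (1.0 + math.exp(y1 * y2))


def ddf(y1, y2):
    # y1^2 * 1 / (1 + exp(-y1 * y2)) * 1 / (1 + exp(y1 * y2))
    z = y1 * y2
    return y1 ** 2 / ((1.0 + math.exp(-z)) * (1.0 + math.exp(z)))


def central_diff(f, x, h=1e-5):
    """Second-order accurate numerical derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


y1, y2 = -1.0, 0.7
num_df = central_diff(lambda t: loss(y1, t), y2)   # d loss / d y2
num_ddf = central_diff(lambda t: df(y1, t), y2)    # d df / d y2
print(abs(num_df - df(y1, y2)), abs(num_ddf - ddf(y1, y2)))
```

Both printed differences should be close to zero, limited only by finite-difference and rounding error.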

Weighted binary cross entropy loss

class falkon.gsc_losses.WeightedCrossEntropyLoss(kernel: Kernel, neg_weight: float, opt: FalkonOptions | None = None)

Wrapper for the weighted binary cross-entropy loss, to be used with the LogisticFalkon estimator.

Using this loss assumes a binary classification problem with labels 0 and +1. Additionally, this loss allows a different weight to be placed on samples belonging to one of the two classes (see the neg_weight parameter).

Parameters:

kernel (falkon.kernels.kernel.Kernel) – The kernel function used for training a LogisticFalkon model
neg_weight (float) – The weight to be assigned to samples belonging to the negative (0-labeled) class. Setting neg_weight to 1 weights the two classes equally, making this loss equivalent to LogisticLoss but with a different choice of labels.
opt (FalkonOptions) – Falkon options container. Will be passed to the kernel when computing kernel-vector products.
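The equivalence with LogisticLoss at neg_weight = 1 can be verified directly from the two formulas on this page by mapping labels {0, 1} to {-1, +1} via y = 2 * true - 1. The sketch below uses plain Python floats; `softplus`, `weighted_bce`, and `logistic` are illustrative helper names, not Falkon functions.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce(true, pred, neg_weight):
    # true * log(1 + e^-pred) + w * (1 - true) * log(1 + e^pred)
    return true * softplus(-pred) + neg_weight * (1.0 - true) * softplus(pred)


def logistic(y, pred):
    # log(1 + exp(-y * pred)) with y in {-1, +1}
    return softplus(-y * pred)


for true in (0.0, 1.0):
    for pred in (-1.5, 0.0, 2.0):
        y = 2.0 * true - 1.0  # map labels {0, 1} to {-1, +1}
        assert math.isclose(weighted_bce(true, pred, 1.0), logistic(y, pred))

# With neg_weight = 3, only the loss of the 0-labeled sample is scaled.
neg_ratio = weighted_bce(0.0, 0.5, 3.0) / weighted_bce(0.0, 0.5, 1.0)
pos_ratio = weighted_bce(1.0, 0.5, 3.0) / weighted_bce(1.0, 0.5, 1.0)
print(neg_ratio, pos_ratio)
```

The two ratios show that neg_weight multiplies the contribution of negative samples and leaves positive samples untouched.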

Examples

>>> import falkon
>>> from falkon.gsc_losses import WeightedCrossEntropyLoss
>>> k = falkon.kernels.GaussianKernel(3)
>>> wce_loss = WeightedCrossEntropyLoss(k, neg_weight=1.0)
>>> estimator = falkon.LogisticFalkon(k, [1e-4, 1e-4, 1e-4], [3, 3, 3], loss=wce_loss, M=100)

__call__(true: Tensor, pred: Tensor) → Tensor

Compute the weighted BCE loss between two 1-dimensional tensors

The formula used is

\[\mathrm{true} * \log(1 + e^{-\mathrm{pred}}) + w * (1 - \mathrm{true}) * \log(1 + e^{\mathrm{pred}})\]

Parameters:

true – The label tensor. Must be 1D, with values 0 or 1.
pred – The prediction tensor. Must be 1D. These are “logits”, so they need not be scaled beforehand.

Returns:

loss – The weighted BCE loss between the two input vectors.

ddf(true: Tensor, pred: Tensor) → Tensor

Compute the second derivative of the weighted BCE loss with respect to pred

The formula used is

\[\dfrac{-(\mathrm{true} * (w - 1) - w) * e^{\mathrm{pred}}}{(e^{\mathrm{pred}} + 1)^2}\]

Parameters:

true – The label tensor. Must be 1D
pred – The prediction tensor. Must be 1D

Returns:

dd_loss – The second derivative of the weighted BCE loss between the two input vectors.

df(true: Tensor, pred: Tensor) → Tensor

Compute the derivative of the weighted BCE loss with respect to pred

The formula used is

\[\dfrac{-(w * \mathrm{true} - w) * e^{\mathrm{pred}} - \mathrm{true}}{e^{\mathrm{pred}} + 1}\]

Parameters:

true – The label tensor. Must be 1D
pred – The prediction tensor. Must be 1D

Returns:

d_loss – The derivative of the weighted BCE loss between the two input vectors.
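As with the logistic loss, the df() and ddf() formulas for the weighted BCE loss can be checked against the loss formula by finite differences. This is a plain-Python sketch over floats; the helper names and the value w = 2.5 are arbitrary choices for illustration, not part of Falkon.

```python
import math


def wce(true, pred, w):
    # true * log(1 + e^-pred) + w * (1 - true) * log(1 + e^pred)
    return true * math.log1p(math.exp(-pred)) + w * (1.0 - true) * math.log1p(math.exp(pred))


def wce_df(true, pred, w):
    # (-(w * true - w) * e^pred - true) / (e^pred + 1)
    e = math.exp(pred)
    return (-(w * true - w) * e - true) / (e + 1.0)


def wce_ddf(true, pred, w):
    # -(true * (w - 1) - w) * e^pred / (e^pred + 1)^2
    e = math.exp(pred)
    return -(true * (w - 1.0) - w) * e / (e + 1.0) ** 2


def central_diff(f, x, h=1e-5):
    """Second-order accurate numerical derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


w = 2.5  # arbitrary negative-class weight
max_err = 0.0
for true in (0.0, 1.0):
    for pred in (-2.0, 0.3, 1.7):
        err_df = abs(central_diff(lambda p: wce(true, p, w), pred) - wce_df(true, pred, w))
        err_ddf = abs(central_diff(lambda p: wce_df(true, p, w), pred) - wce_ddf(true, pred, w))
        max_err = max(max_err, err_df, err_ddf)
print(max_err)
```

The largest discrepancy over both labels and all test predictions should be close to zero, which confirms that the three formulas are mutually consistent.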