falkon.gsc_losses
Loss
- class falkon.gsc_losses.Loss(name: str, kernel: Kernel, opt: FalkonOptions | None = None)
Abstract generalized self-concordant loss function class.
Such loss functions must be three times differentiable, but for the logistic Falkon algorithm only the first two derivatives are used. Subclasses must implement the __call__() method, which calculates the loss function given two input vectors (the inputs could also be matrices, e.g. for the softmax loss), the df() method, which calculates the first derivative of the function, and the ddf() method, which calculates the second derivative.
Additionally, this class provides two methods (knmp_grad() and knmp_hess()) which calculate kernel-vector products using the loss derivatives for vectors. These functions are specific to the logistic Falkon algorithm.
- Parameters:
name – A descriptive name for the loss function (e.g. “logistic”, “softmax”)
kernel – The kernel function used for training a LogisticFalkon model
opt – Falkon options container. Will be passed to the kernel when computing kernel-vector products.
See also
LogisticLoss
a concrete implementation of this class for the logistic loss.
falkon.models.LogisticFalkon
the logistic Falkon model which uses GSC losses.
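The three-method contract described above can be sketched without any dependencies. The classes below are hypothetical scalar stand-ins written for illustration (`SketchLoss` and `SketchLogistic` are not part of falkon, and they operate on plain floats rather than tensors); the concrete formulas are the logistic ones documented further down this page.

```python
import math
from abc import ABC, abstractmethod


class SketchLoss(ABC):
    """Hypothetical stand-in for the Loss contract; NOT falkon's class."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def __call__(self, y1, y2):
        """Loss for predicting y2 when the true label is y1."""

    @abstractmethod
    def df(self, y1, y2):
        """First derivative of the loss with respect to y2."""

    @abstractmethod
    def ddf(self, y1, y2):
        """Second derivative of the loss with respect to y2."""


class SketchLogistic(SketchLoss):
    """Scalar logistic loss, following the formulas documented on this page."""

    def __init__(self):
        super().__init__("logistic")

    def __call__(self, y1, y2):
        return math.log1p(math.exp(-y1 * y2))

    def df(self, y1, y2):
        return -y1 / (1.0 + math.exp(y1 * y2))

    def ddf(self, y1, y2):
        z = y1 * y2
        return y1 ** 2 / ((1.0 + math.exp(-z)) * (1.0 + math.exp(z)))


loss = SketchLogistic()
value = loss(1.0, 0.0)     # loss at a zero prediction
slope = loss.df(1.0, 0.0)  # first derivative at the same point
curv = loss.ddf(1.0, 0.0)  # second derivative at the same point
```

Leaving out any one of the three methods makes the subclass impossible to instantiate, which is the behaviour an abstract base class is meant to enforce.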
- abstract __call__(y1: Tensor, y2: Tensor) → Tensor
Abstract method. Should return the loss for predicting y2 with true labels y1.
- Parameters:
y1 (torch.Tensor) – One of the two inputs to the loss. This should be interpreted as the true labels.
y2 (torch.Tensor) – The other loss input. Should be interpreted as the predicted labels.
- Returns:
torch.Tensor – The loss calculated for the two inputs.
- abstract ddf(y1: Tensor, y2: Tensor) → Tensor
Abstract method. Should return the second derivative of the loss wrt y2.
- Parameters:
y1 (torch.Tensor) – One of the two inputs to the loss. This should be interpreted as the true labels.
y2 (torch.Tensor) – The other loss input. Should be interpreted as the predicted labels. The derivative should be computed with respect to this tensor.
- Returns:
torch.Tensor – The second derivative of the loss with respect to y2. It will be a tensor of the same shape as the two inputs.
- abstract df(y1: Tensor, y2: Tensor) → Tensor
Abstract method. Should return the derivative of the loss wrt y2.
- Parameters:
y1 (torch.Tensor) – One of the two inputs to the loss. This should be interpreted as the true labels.
y2 (torch.Tensor) – The other loss input. Should be interpreted as the predicted labels. The derivative should be computed with respect to this tensor.
- Returns:
torch.Tensor – The derivative of the loss with respect to y2. It will be a tensor of the same shape as the two inputs.
- knmp_grad(X: Tensor, Xc: Tensor, Y: Tensor, u: Tensor, opt: FalkonOptions | None = None) → Tuple[Tensor, Tensor]
Computes a kernel-vector product where the vector is the first derivative of this loss.
Given kernel function \(K\), the loss represented by this class \(\mathcal{l}\), and the number of samples \(n\), this function follows the equation
\[\dfrac{1}{n} K(X_c, X) @ (\mathcal{l}'(Y, K(X, X_c) @ u))\]
- Parameters:
X (torch.Tensor) – Data matrix of shape (n x d) with n samples in d dimensions.
Xc (torch.Tensor) – Center matrix of shape (m x d) with m centers in d dimensions.
Y (torch.Tensor) – Label matrix of shape (n x t) with n samples. Depending on the loss, the labels may or may not have more than one dimension.
u (torch.Tensor) – A vector (or matrix if the labels are multi-dimensional) of weights of shape (m x t). The product K(X, Xc) @ u, where K is the kernel associated to this loss, should produce label predictions.
opt (FalkonOptions or None) – Options to be passed to the mmv function for the kernel associated to this loss. Options passed as an argument take precedence over the options used to build this class instance.
- Returns:
grad_mul (torch.Tensor) – A tensor of shape (m x t) coming from the multiplication of the kernel matrix K(Xc, X) and the loss derivative evaluated on the predictions obtained with weights u. The formula followed is: (1/n) * K(Xc, X) @ df(Y, K(X, Xc) @ u).
func_val (torch.Tensor) – A tensor of shape (n x t) of predictions obtained with weights u.
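The knmp_grad() formula can be traced step by step on a tiny problem. The sketch below is a dependency-free illustration under several assumptions: it builds dense kernel matrices as nested lists (the parameter description mentions the kernel's mmv function, which this sketch does not use), it picks a Gaussian kernel with unit length-scale, and it uses the logistic derivative for df. The helper names `gaussian_kernel`, `matmul` and `knmp_grad_dense` are hypothetical.

```python
import math


def gaussian_kernel(A, B, sigma=1.0):
    """Dense K(A, B) as nested lists; Gaussian kernel chosen for illustration."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(ra, rb)) / (2.0 * sigma ** 2))
         for rb in B]
        for ra in A
    ]


def matmul(P, Q):
    """Plain nested-list matrix product P @ Q."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def logistic_df(y1, y2):
    """First derivative of the logistic loss with respect to y2."""
    return -y1 / (1.0 + math.exp(y1 * y2))


def knmp_grad_dense(X, Xc, Y, u, sigma=1.0):
    """(1/n) * K(Xc, X) @ df(Y, K(X, Xc) @ u), plus the predictions K(X, Xc) @ u."""
    n = len(X)
    func_val = matmul(gaussian_kernel(X, Xc, sigma), u)      # (n x t) predictions
    grad = [[logistic_df(y, f) for y, f in zip(y_row, f_row)]
            for y_row, f_row in zip(Y, func_val)]            # (n x t) loss derivative
    out = matmul(gaussian_kernel(Xc, X, sigma), grad)        # (m x t)
    return [[v / n for v in row] for row in out], func_val


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 4 samples, d = 2
Xc = [[0.0, 0.0], [1.0, 1.0]]                         # m = 2 centers
Y = [[-1.0], [1.0], [1.0], [-1.0]]                    # labels in {-1, +1}, t = 1
u = [[0.0], [0.0]]                                    # zero weights, so predictions are 0

grad_mul, func_val = knmp_grad_dense(X, Xc, Y, u)
```

In this sketch the first output has one row per center and the second has one row per sample, matching the shapes implied by the formula.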
- knmp_hess(X: Tensor, Xc: Tensor, Y: Tensor, f: Tensor, u: Tensor, opt: FalkonOptions | None = None) → Tensor
Computes a kernel-vector product rescaled by the second derivative of this loss.
Given kernel function \(K\), the loss represented by this class \(\mathcal{l}\), and the number of samples \(n\), this function follows the equation
\[\dfrac{1}{n} K(X_c, X) @ (\mathcal{l}''(Y, f) * K(X, X_c) @ u)\]
- Parameters:
X (torch.Tensor) – Data matrix of shape (n x d) with n samples in d dimensions.
Xc (torch.Tensor) – Center matrix of shape (m x d) with m centers in d dimensions.
Y (torch.Tensor) – Label matrix of shape (n x t) with n samples. Depending on the loss, the labels may or may not have more than one dimension.
f (torch.Tensor) – Tensor of shape (n x t) of predictions. Typically this will be the second output of the knmp_grad() method.
u (torch.Tensor) – A vector (or matrix if the labels are multi-dimensional) of weights of shape (m x t). The product K(X, Xc) @ u, where K is the kernel associated to this loss, should produce label predictions.
opt (FalkonOptions or None) – Options to be passed to the mmv function for the kernel associated to this loss. Options passed as an argument take precedence over the options used to build this class instance.
- Returns:
A tensor of shape (m x t), the output of the computation.
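A small dense sketch of the knmp_hess() formula, under the same kind of assumptions as any illustration on this page: nested-list matrices instead of the kernel's mmv routine, a Gaussian kernel with unit length-scale, the logistic second derivative for ddf, and hypothetical helper names. It also checks one consequence of the formula for t = 1: since the logistic second derivative is non-negative, the quadratic form u^T (result) cannot be negative.

```python
import math


def gaussian_kernel(A, B, sigma=1.0):
    """Dense K(A, B) as nested lists; Gaussian kernel chosen for illustration."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(ra, rb)) / (2.0 * sigma ** 2))
         for rb in B]
        for ra in A
    ]


def matmul(P, Q):
    """Plain nested-list matrix product P @ Q."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def logistic_ddf(y1, y2):
    """Second derivative of the logistic loss with respect to y2."""
    z = y1 * y2
    return y1 ** 2 / ((1.0 + math.exp(-z)) * (1.0 + math.exp(z)))


def knmp_hess_dense(X, Xc, Y, f, u, sigma=1.0):
    """(1/n) * K(Xc, X) @ (ddf(Y, f) * (K(X, Xc) @ u)), with * taken elementwise."""
    n = len(X)
    Ku = matmul(gaussian_kernel(X, Xc, sigma), u)            # (n x t)
    scaled = [[logistic_ddf(y, fv) * k for y, fv, k in zip(y_row, f_row, k_row)]
              for y_row, f_row, k_row in zip(Y, f, Ku)]      # (n x t), rescaled rows
    out = matmul(gaussian_kernel(Xc, X, sigma), scaled)      # (m x t)
    return [[v / n for v in row] for row in out]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 4 samples, d = 2
Xc = [[0.0, 0.0], [1.0, 1.0]]                         # m = 2 centers
Y = [[-1.0], [1.0], [1.0], [-1.0]]                    # labels in {-1, +1}
f = [[0.0], [0.0], [0.0], [0.0]]                      # predictions, e.g. from zero weights
u = [[1.0], [-2.0]]                                   # direction to multiply

hess_mul = knmp_hess_dense(X, Xc, Y, f, u)
quad_form = sum(u_row[0] * h_row[0] for u_row, h_row in zip(u, hess_mul))
```

Passing the predictions f in separately, rather than recomputing K(X, Xc) @ u at the current weights, is what lets the second output of knmp_grad() be reused across repeated products.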
Logistic loss
- class falkon.gsc_losses.LogisticLoss(kernel: Kernel, opt: FalkonOptions | None = None)
Wrapper for the logistic loss, to be used in conjunction with the LogisticFalkon estimator.
Usage of this loss assumes a binary classification problem with labels -1 and +1. For different choices of labels, see WeightedCrossEntropyLoss.
- Parameters:
kernel (falkon.kernels.kernel.Kernel) – The kernel function used for training a LogisticFalkon model
opt (FalkonOptions) – Falkon options container. Will be passed to the kernel when computing kernel-vector products.
Examples
>>> import falkon
>>> from falkon.gsc_losses import LogisticLoss
>>> k = falkon.kernels.GaussianKernel(3)
>>> log_loss = LogisticLoss(k)
>>> estimator = falkon.LogisticFalkon(k, [1e-4, 1e-4, 1e-4], [3, 3, 3], loss=log_loss, M=100)
- __call__(y1: Tensor, y2: Tensor) → Tensor
Compute the logistic loss between two 1-dimensional tensors.
The formula used is \(\log(1 + \exp(-y_1 * y_2))\)
- Parameters:
y1 – The first input tensor. Must be 1D
y2 – The second input tensor. Must be 1D
- Returns:
loss – The logistic loss between the two input vectors.
- ddf(y1: Tensor, y2: Tensor) → Tensor
Compute the second derivative of the logistic loss with respect to y2.
The formula used is
\[y_1^2 \dfrac{1}{1 + \exp(-y_1 * y_2)} \dfrac{1}{1 + \exp(y_1 * y_2)}\]
- Parameters:
y1 – The first input tensor. Must be 1D
y2 – The second input tensor. Must be 1D
- Returns:
dd_loss – The second derivative of the logistic loss, calculated between the two input vectors.
- df(y1: Tensor, y2: Tensor) → Tensor
Compute the derivative of the logistic loss with respect to y2.
The formula used is
\[\dfrac{-y_1}{1 + \exp(y_1 * y_2)}\]
- Parameters:
y1 – The first input tensor. Must be 1D
y2 – The second input tensor. Must be 1D
- Returns:
d_loss – The derivative of the logistic loss, calculated between the two input vectors.
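The df() and ddf() formulas follow from the loss formula by the chain rule. Writing \(z = y_1 * y_2\) and \(\ell\) for the loss (shorthand introduced here for the derivation, not part of the API):
\[\ell = \log(1 + e^{-z}), \qquad \dfrac{\partial \ell}{\partial y_2} = \dfrac{-y_1 e^{-z}}{1 + e^{-z}} = \dfrac{-y_1}{1 + e^{z}}\]
\[\dfrac{\partial^2 \ell}{\partial y_2^2} = \dfrac{y_1^2 e^{z}}{(1 + e^{z})^2} = y_1^2 \dfrac{1}{1 + e^{-z}} \dfrac{1}{1 + e^{z}}\]
With labels in \(\{-1, +1\}\) the factor \(y_1^2\) equals 1, so the second derivative is a product of two sigmoid terms and lies in \((0, 1/4]\).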
Weighted binary cross entropy loss
- class falkon.gsc_losses.WeightedCrossEntropyLoss(kernel: Kernel, neg_weight: float, opt: FalkonOptions | None = None)
Wrapper for the weighted binary cross-entropy loss, to be used with the LogisticFalkon estimator.
Using this loss assumes a binary classification problem with labels 0 and +1. Additionally, this loss allows placing a different weight on samples belonging to one of the two classes (see the neg_weight parameter).
- Parameters:
kernel (falkon.kernels.kernel.Kernel) – The kernel function used for training a LogisticFalkon model
neg_weight (float) – The weight to be assigned to samples belonging to the negative (0-labeled) class. By setting neg_weight to 1, the classes are equally weighted and this loss is equivalent to the LogisticLoss loss, but with a different choice of labels.
opt (FalkonOptions) – Falkon options container. Will be passed to the kernel when computing kernel-vector products.
Examples
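>>> import falkon
>>> from falkon.gsc_losses import WeightedCrossEntropyLoss
>>> k = falkon.kernels.GaussianKernel(3)
>>> wce_loss = WeightedCrossEntropyLoss(k, neg_weight=1.0)  # neg_weight is a required argument
>>> estimator = falkon.LogisticFalkon(k, [1e-4, 1e-4, 1e-4], [3, 3, 3], loss=wce_loss, M=100)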
- __call__(true: Tensor, pred: Tensor) → Tensor
Compute the weighted BCE loss between two 1-dimensional tensors.
The formula used is
\[\mathrm{true} * \log(1 + e^{-\mathrm{pred}}) + w * (1 - \mathrm{true}) * \log(1 + e^{\mathrm{pred}})\]
- Parameters:
true – The label tensor. Must be 1D, with values 0 or 1.
pred – The prediction tensor. Must be 1D. These are “logits”, so they need not be scaled beforehand.
- Returns:
loss – The weighted BCE loss between the two input vectors.
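The relation to LogisticLoss noted in the class description can be checked directly from the formula. Setting \(w = 1\) and mapping the labels with \(y = 2 * \mathrm{true} - 1\) (the symbol \(y\) is introduced here for illustration), the two label values give
\[\mathrm{true} = 1:\ \log(1 + e^{-\mathrm{pred}}) = \log(1 + \exp(-(+1) * \mathrm{pred})), \qquad \mathrm{true} = 0:\ \log(1 + e^{\mathrm{pred}}) = \log(1 + \exp(-(-1) * \mathrm{pred}))\]
In both cases the value equals \(\log(1 + \exp(-y * \mathrm{pred}))\), which is the logistic loss formula with labels -1 and +1.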
- ddf(true: Tensor, pred: Tensor) → Tensor
Compute the second derivative of the weighted BCE loss with respect to pred.
The formula used is
\[\dfrac{-(\mathrm{true} * (w - 1) - w) * e^{\mathrm{pred}}}{(e^{\mathrm{pred}} + 1)^2}\]
- Parameters:
true – The label tensor. Must be 1D
pred – The prediction tensor. Must be 1D
- Returns:
dd_loss – The second derivative of the weighted BCE loss between the two input vectors.
- df(true: Tensor, pred: Tensor) → Tensor
Compute the derivative of the weighted BCE loss with respect to pred.
The formula used is
\[\dfrac{-(w * \mathrm{true} - w) * e^{\mathrm{pred}} - \mathrm{true}}{e^{\mathrm{pred}} + 1}\]
- Parameters:
true – The label tensor. Must be 1D
pred – The prediction tensor. Must be 1D
- Returns:
d_loss – The derivative of the weighted BCE loss between the two input vectors.
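The weighted BCE derivative formulas are less obvious at a glance than the logistic ones, so a numerical sanity check may help. The sketch below transcribes the three documented formulas as plain scalar functions and compares df and ddf against central finite differences. The function names, the weight `w = 0.3` and the step size are arbitrary choices made for this illustration; none of this is falkon code.

```python
import math


def wce(true, pred, w):
    """Weighted BCE loss, transcribed from the __call__ formula."""
    return (true * math.log1p(math.exp(-pred))
            + w * (1.0 - true) * math.log1p(math.exp(pred)))


def wce_df(true, pred, w):
    """First derivative with respect to pred, transcribed from the df formula."""
    e = math.exp(pred)
    return (-(w * true - w) * e - true) / (e + 1.0)


def wce_ddf(true, pred, w):
    """Second derivative with respect to pred, transcribed from the ddf formula."""
    e = math.exp(pred)
    return -(true * (w - 1.0) - w) * e / (e + 1.0) ** 2


def central_diff(fn, true, pred, w, h=1e-5):
    """Central finite-difference approximation of d fn / d pred."""
    return (fn(true, pred + h, w) - fn(true, pred - h, w)) / (2.0 * h)


w = 0.3  # hypothetical negative-class weight
max_err_df = 0.0
max_err_ddf = 0.0
for true in (0.0, 1.0):
    for pred in (-2.0, -0.3, 0.0, 0.7, 3.0):
        max_err_df = max(
            max_err_df, abs(central_diff(wce, true, pred, w) - wce_df(true, pred, w)))
        max_err_ddf = max(
            max_err_ddf, abs(central_diff(wce_df, true, pred, w) - wce_ddf(true, pred, w)))
```

Both maximum errors should come out at the level of finite-difference noise, which indicates that the three formulas are mutually consistent.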