falkon.gsc_losses¶
Loss¶

class
falkon.gsc_losses.
Loss
(name: str, kernel: falkon.kernels.kernel.Kernel, opt: Optional[falkon.options.FalkonOptions] = None)¶ Abstract generalized self-concordant loss function class.
Such loss functions must be three times differentiable, but the logistic Falkon algorithm uses only the first two derivatives. Subclasses must implement the __call__() method, which calculates the loss given two input vectors (the inputs could also be matrices, e.g. for the softmax loss), the df() method, which calculates the first derivative of the loss, and the ddf() method, which calculates the second derivative.
Additionally, this class provides two methods, knmp_grad() and knmp_hess(), which calculate kernel-vector products using the loss derivatives. These functions are specific to the logistic Falkon algorithm.
 Parameters
name – A descriptive name for the loss function (e.g. “logistic”, “softmax”)
kernel – The kernel function used for training a LogFalkon model
opt – Falkon options container. Will be passed to the kernel when computing kernel-vector products.
See also
LogisticLoss
a concrete implementation of this class for the logistic loss.
falkon.models.LogisticFalkon
the logistic Falkon model which uses GSC losses.

abstract
__call__
(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor¶ Abstract method. Should return the loss for predicting y2 with true labels y1.
 Parameters
y1 (torch.Tensor) – One of the two inputs to the loss. This should be interpreted as the true labels.
y2 (torch.Tensor) – The other loss input. Should be interpreted as the predicted labels.
 Returns
torch.Tensor – The loss calculated for the two inputs.

abstract
ddf
(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor¶ Abstract method. Should return the second derivative of the loss wrt y2.
 Parameters
y1 (torch.Tensor) – One of the two inputs to the loss. This should be interpreted as the true labels.
y2 (torch.Tensor) – The other loss input. Should be interpreted as the predicted labels. The derivative should be computed with respect to this tensor.
 Returns
torch.Tensor – The second derivative of the loss with respect to y2. It will be a tensor of the same shape as the two inputs.

abstract
df
(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor¶ Abstract method. Should return the derivative of the loss wrt y2.
 Parameters
y1 (torch.Tensor) – One of the two inputs to the loss. This should be interpreted as the true labels.
y2 (torch.Tensor) – The other loss input. Should be interpreted as the predicted labels. The derivative should be computed with respect to this tensor.
 Returns
torch.Tensor – The derivative of the loss with respect to y2. It will be a tensor of the same shape as the two inputs.

knmp_grad
(X: torch.Tensor, Xc: torch.Tensor, Y: torch.Tensor, u: torch.Tensor, opt: Optional[falkon.options.FalkonOptions] = None) → Tuple[torch.Tensor, torch.Tensor]¶ Computes a kernel-vector product where the vector is the first derivative of this loss.
Given the kernel function \(K\), the loss represented by this class \(\mathcal{l}\), and the number of samples \(n\), this function computes
\[\dfrac{1}{n} K(X_c, X) @ (\mathcal{l}'(Y, K(X, X_c) @ u))\]
 Parameters
X (torch.Tensor) – Data matrix of shape (n x d) with n samples in d dimensions.
Xc (torch.Tensor) – Center matrix of shape (m x d) with m centers in d dimensions.
Y (torch.Tensor) – Label matrix of shape (n x t) with n samples. Depending on the loss, the labels may or may not have more than one dimension.
u (torch.Tensor) – A vector (or matrix if the labels are multidimensional) of weights of shape (m x t). The product K(X, Xc) @ u, where K is the kernel associated to this loss, should produce label predictions.
opt (FalkonOptions or None) – Options to be passed to the mmv function for the kernel associated to this loss. Options passed as an argument take precedence over the options used to build this class instance.
 Returns
grad_mul (torch.Tensor) – A tensor of shape (m x 1) coming from the multiplication of the kernel matrix K(Xc, X) and the first derivative of the loss evaluated at the predictions obtained with weights u. The formula followed is: (1/n) * K(Xc, X) @ df(Y, K(X, Xc) @ u).
func_val (torch.Tensor) – A tensor of shape (n x t) of predictions obtained with weights u.

knmp_hess
(X: torch.Tensor, Xc: torch.Tensor, Y: torch.Tensor, f: torch.Tensor, u: torch.Tensor, opt: Optional[falkon.options.FalkonOptions] = None) → torch.Tensor¶ Computes a kernel-vector product rescaled by the second derivative of this loss.
Given the kernel function \(K\), the loss represented by this class \(\mathcal{l}\), and the number of samples \(n\), this function computes
\[\dfrac{1}{n} K(X_c, X) @ (\mathcal{l}''(Y, f) * K(X, X_c) @ u)\]
 Parameters
X (torch.Tensor) – Data matrix of shape (n x d) with n samples in d dimensions.
Xc (torch.Tensor) – Center matrix of shape (m x d) with m centers in d dimensions.
Y (torch.Tensor) – Label matrix of shape (n x t) with n samples. Depending on the loss, the labels may or may not have more than one dimension.
f (torch.Tensor) – Tensor of shape (n x t) of predictions. Typically this will be the second output of the knmp_grad() method.
u (torch.Tensor) – A vector (or matrix if the labels are multidimensional) of weights of shape (m x t). The product K(X, Xc) @ u, where K is the kernel associated to this loss, should produce label predictions.
opt (FalkonOptions or None) – Options to be passed to the mmv function for the kernel associated to this loss. Options passed as an argument take precedence over the options used to build this class instance.
 Returns
A tensor of shape (m x t), the output of the computation.
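The two formulas above can be sketched with dense matrices in plain Python. This is only an illustration under stated assumptions: the helper names (gaussian_kernel, dense_knmp_grad, dense_knmp_hess) are hypothetical and not part of falkon, labels are one-dimensional (t = 1), the kernel is a symmetric Gaussian so that K(Xc, X) is the transpose of K(X, Xc), and the logistic loss derivatives are used as the example loss. The library methods pass the work to the kernel's mmv function instead of building the full kernel matrix.

```python
import math

def gaussian_kernel(A, B, sigma=1.0):
    """Dense Gaussian kernel matrix K(A, B) as a list of rows (illustrative helper)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(ra, rb)) / (2.0 * sigma ** 2))
             for rb in B] for ra in A]

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def logistic_df(y, f):
    """First derivative of log(1 + exp(-y * f)) with respect to f."""
    return -y / (1.0 + math.exp(y * f))

def logistic_ddf(y, f):
    """Second derivative of log(1 + exp(-y * f)) with respect to f."""
    return y * y / ((1.0 + math.exp(-y * f)) * (1.0 + math.exp(y * f)))

def dense_knmp_grad(X, Xc, Y, u):
    """(1/n) * K(Xc, X) @ df(Y, K(X, Xc) @ u), plus the predictions K(X, Xc) @ u."""
    n = len(X)
    K_nm = gaussian_kernel(X, Xc)            # shape (n x m)
    func_val = matvec(K_nm, u)               # predictions, length n
    d = [logistic_df(y, f) for y, f in zip(Y, func_val)]
    grad_mul = [g / n for g in matvec(transpose(K_nm), d)]
    return grad_mul, func_val

def dense_knmp_hess(X, Xc, Y, f, u):
    """(1/n) * K(Xc, X) @ (ddf(Y, f) * (K(X, Xc) @ u))."""
    n = len(X)
    K_nm = gaussian_kernel(X, Xc)
    Ku = matvec(K_nm, u)
    scaled = [logistic_ddf(y, fi) * k for y, fi, k in zip(Y, f, Ku)]
    return [h / n for h in matvec(transpose(K_nm), scaled)]

# Tiny example: n = 4 samples, m = 2 centers, d = 2 dimensions.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Xc = [[0.0, 0.0], [1.0, 1.0]]
Y = [-1.0, -1.0, 1.0, 1.0]
u = [0.5, -0.25]

grad_mul, func_val = dense_knmp_grad(X, Xc, Y, u)
hess_mul = dense_knmp_hess(X, Xc, Y, func_val, u)
```

As the description of f suggests, the predictions returned by the gradient step are reused in the Hessian step, so K(X, Xc) @ u for the current weights does not have to be recomputed to evaluate the second derivative.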
Logistic loss¶

class
falkon.gsc_losses.
LogisticLoss
(kernel: falkon.kernels.kernel.Kernel, opt: Optional[falkon.options.FalkonOptions] = None)¶ Wrapper for the logistic loss, to be used in conjunction with the LogisticFalkon estimator.
This loss assumes a binary classification problem with labels -1 and +1. For different choices of labels, see WeightedCrossEntropyLoss.
 Parameters
kernel (falkon.kernels.kernel.Kernel) – The kernel function used for training a LogisticFalkon model
opt (FalkonOptions) – Falkon options container. Will be passed to the kernel when computing kernel-vector products.
Examples
>>> import falkon
>>> from falkon.gsc_losses import LogisticLoss
>>> k = falkon.kernels.GaussianKernel(3)
>>> log_loss = LogisticLoss(k)
>>> estimator = falkon.LogisticFalkon(k, [1e-4, 1e-4, 1e-4], [3, 3, 3], loss=log_loss, M=100)

__call__
(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor¶ Compute the logistic loss between two 1-dimensional tensors
The formula used is \(\log(1 + \exp(-y_1 * y_2))\)
 Parameters
y1 – The first input tensor. Must be 1D
y2 – The second input tensor. Must be 1D
 Returns
loss – The logistic loss between the two input vectors.

ddf
(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor¶ Compute the second derivative of the logistic loss with respect to y2
The formula used is
\[y_1^2 \dfrac{1}{1 + \exp(-y_1 * y_2)} \dfrac{1}{1 + \exp(y_1 * y_2)}\]
 Parameters
y1 – The first input tensor. Must be 1D
y2 – The second input tensor. Must be 1D
 Returns
dd_loss – The second derivative of the logistic loss, calculated between the two input vectors.

df
(y1: torch.Tensor, y2: torch.Tensor) → torch.Tensor¶ Compute the derivative of the logistic loss with respect to y2
The formula used is
\[\dfrac{-y_1}{1 + \exp(y_1 * y_2)}\]
 Parameters
y1 – The first input tensor. Must be 1D
y2 – The second input tensor. Must be 1D
 Returns
d_loss – The derivative of the logistic loss, calculated between the two input vectors.
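The three logistic formulas above can be checked against each other numerically. The following is a minimal sketch in plain Python using scalars instead of torch tensors; the function names and the central_diff helper are illustrative assumptions, not part of falkon. It compares the closed-form df and ddf with central finite differences of the loss and of df.

```python
import math

def logistic_loss(y1, y2):
    """log(1 + exp(-y1 * y2)), with y1 the true label in {-1, +1}."""
    return math.log1p(math.exp(-y1 * y2))

def logistic_df(y1, y2):
    """Closed-form first derivative with respect to y2."""
    return -y1 / (1.0 + math.exp(y1 * y2))

def logistic_ddf(y1, y2):
    """Closed-form second derivative with respect to y2."""
    return y1 ** 2 / ((1.0 + math.exp(-y1 * y2)) * (1.0 + math.exp(y1 * y2)))

def central_diff(fn, x, h=1e-5):
    """Central finite-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

max_err_df = 0.0
max_err_ddf = 0.0
for y1 in (-1.0, 1.0):
    for y2 in (-2.0, -0.3, 0.0, 0.7, 3.0):
        num_df = central_diff(lambda p: logistic_loss(y1, p), y2)
        num_ddf = central_diff(lambda p: logistic_df(y1, p), y2)
        max_err_df = max(max_err_df, abs(num_df - logistic_df(y1, y2)))
        max_err_ddf = max(max_err_ddf, abs(num_ddf - logistic_ddf(y1, y2)))
```

Both maximum errors should be at the level of finite-difference noise, which confirms that the signs in the loss, df and ddf formulas agree.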
Weighted binary cross entropy loss¶

class
falkon.gsc_losses.
WeightedCrossEntropyLoss
(kernel: falkon.kernels.kernel.Kernel, neg_weight: float, opt: Optional[falkon.options.FalkonOptions] = None)¶ Wrapper for the weighted binary cross-entropy loss, to be used with the LogisticFalkon estimator.
This loss assumes a binary classification problem with labels 0 and +1. Additionally, it lets you assign a different weight to samples belonging to one of the two classes (see the neg_weight parameter).
 Parameters
kernel (falkon.kernels.kernel.Kernel) – The kernel function used for training a LogisticFalkon model
neg_weight (float) – The weight to be assigned to samples belonging to the negative (0-labeled) class. By setting neg_weight to 1, the classes are equally weighted and this loss is equivalent to the LogisticLoss loss, but with a different choice of labels.
opt (FalkonOptions) – Falkon options container. Will be passed to the kernel when computing kernel-vector products.
Examples
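>>> import falkon
>>> from falkon.gsc_losses import WeightedCrossEntropyLoss
>>> k = falkon.kernels.GaussianKernel(3)
>>> wce_loss = WeightedCrossEntropyLoss(k, neg_weight=1.0)  # neg_weight is required by the signature
>>> estimator = falkon.LogisticFalkon(k, [1e-4, 1e-4, 1e-4], [3, 3, 3], loss=wce_loss, M=100)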

__call__
(true: torch.Tensor, pred: torch.Tensor) → torch.Tensor¶ Compute the weighted BCE loss between two 1-dimensional tensors
The formula used is
\[\mathrm{true} * \log(1 + e^{-\mathrm{pred}}) + w * (1 - \mathrm{true}) * \log(1 + e^{\mathrm{pred}})\]
 Parameters
true – The label tensor. Must be 1D, with values 0 or 1.
pred – The prediction tensor. Must be 1D. These are “logits”, so they need not be scaled beforehand.
 Returns
loss – The weighted BCE loss between the two input vectors.

ddf
(true: torch.Tensor, pred: torch.Tensor) → torch.Tensor¶ Compute the second derivative of the weighted BCE loss with respect to pred
The formula used is
\[\dfrac{-(\mathrm{true} * (w - 1) - w) * e^{\mathrm{pred}}}{(e^{\mathrm{pred}} + 1)^2}\]
 Parameters
true – The label tensor. Must be 1D
pred – The prediction tensor. Must be 1D
 Returns
dd_loss – The second derivative of the weighted BCE loss between the two input vectors.

df
(true: torch.Tensor, pred: torch.Tensor) → torch.Tensor¶ Compute the derivative of the weighted BCE loss with respect to pred
The formula used is
\[\dfrac{-(w * \mathrm{true} - w) * e^{\mathrm{pred}} - \mathrm{true}}{e^{\mathrm{pred}} + 1}\]
 Parameters
true – The label tensor. Must be 1D
pred – The prediction tensor. Must be 1D
 Returns
d_loss – The derivative of the weighted BCE loss between the two input vectors.
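The weighted BCE formulas, and the stated equivalence with the logistic loss when neg_weight is 1, can be checked numerically. The following is a minimal sketch in plain Python with scalars instead of torch tensors; all function names are illustrative assumptions, not part of falkon, and w stands for neg_weight. The label mapping y = 2 * true - 1, which sends 0 to -1 and 1 to +1, is an assumption made here to compare the two label conventions.

```python
import math

def wce_loss(true, pred, w):
    """Weighted BCE on logits; true is 0 or 1, w weights the 0-labeled class."""
    return (true * math.log1p(math.exp(-pred))
            + w * (1.0 - true) * math.log1p(math.exp(pred)))

def wce_df(true, pred, w):
    """Closed-form first derivative with respect to pred."""
    e = math.exp(pred)
    return (-(w * true - w) * e - true) / (e + 1.0)

def wce_ddf(true, pred, w):
    """Closed-form second derivative with respect to pred."""
    e = math.exp(pred)
    return -(true * (w - 1.0) - w) * e / (e + 1.0) ** 2

def logistic_loss(y1, y2):
    """log(1 + exp(-y1 * y2)), with labels in {-1, +1}."""
    return math.log1p(math.exp(-y1 * y2))

def central_diff(fn, x, h=1e-5):
    """Central finite-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

preds = (-2.0, -0.3, 0.0, 0.7, 3.0)
w = 2.5  # arbitrary example weight for the negative class

max_err_df = 0.0
max_err_ddf = 0.0
for true in (0.0, 1.0):
    for pred in preds:
        num_df = central_diff(lambda p: wce_loss(true, p, w), pred)
        num_ddf = central_diff(lambda p: wce_df(true, p, w), pred)
        max_err_df = max(max_err_df, abs(num_df - wce_df(true, pred, w)))
        max_err_ddf = max(max_err_ddf, abs(num_ddf - wce_ddf(true, pred, w)))

# With w = 1 the loss matches the logistic loss on labels mapped to {-1, +1}.
max_gap = max(abs(wce_loss(t, p, 1.0) - logistic_loss(2.0 * t - 1.0, p))
              for t in (0.0, 1.0) for p in preds)
```

The second derivative stays positive for any positive w, since it reduces to (true + w * (1 - true)) * e^pred / (e^pred + 1)^2.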