cross_entropy

getml.pipeline.scores.cross_entropy = 'cross_entropy'

Cross entropy, also referred to as log loss, is a measure of the likelihood of the classification model.
Used for classification problems.
Mathematically speaking, cross entropy for a binary classification problem is defined as follows:

$$\text{cross entropy} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right],$$

where $p_i$ is the probability of a positive outcome as predicted by the classification algorithm and $y_i$ is the target value, which is 1 for a positive outcome and 0 otherwise.
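As a sketch, the formula can be computed directly with NumPy; the labels and predicted probabilities below are illustrative, not taken from any particular model:

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])             # target values (illustrative)
p = np.array([0.9, 0.2, 0.8, 0.6, 0.1])   # predicted probabilities (illustrative)

# Binary cross entropy: -(1/n) * sum over i of
# y_i * log(p_i) + (1 - y_i) * log(1 - p_i)
cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(cross_entropy)
```

A perfect model (p equal to y everywhere) would give a cross entropy of 0; the worse the probabilities match the observed labels, the larger the value.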
There are several ways to justify using cross entropy to evaluate classification algorithms, but the most intuitive is to think of it as a measure of likelihood: when a classification algorithm outputs probabilities, we would like to know how likely it is that we would observe a particular state of the world given those probabilities.
We can calculate this likelihood as follows:

$$L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}.$$

(Recall that $y_i$ can only be 0 or 1.)

If we take the logarithm of the likelihood as defined above, divide by $n$ and then multiply by $-1$ (because we want lower to mean better and 0 to mean perfect), the outcome is the cross entropy.
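The equivalence described above can be checked numerically: taking $-\log(L)/n$ of the likelihood gives the same value as the cross entropy formula. The data below are illustrative:

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])             # target values (illustrative)
p = np.array([0.9, 0.2, 0.8, 0.6, 0.1])   # predicted probabilities (illustrative)
n = len(y)

# Likelihood of observing these labels given the predicted probabilities:
# product over i of p_i^y_i * (1 - p_i)^(1 - y_i)
likelihood = np.prod(p**y * (1 - p)**(1 - y))

# -log(likelihood) / n ...
from_likelihood = -np.log(likelihood) / n

# ... equals the cross entropy computed directly from the formula
direct = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(from_likelihood, direct)
```

In practice the log-sum form is preferred over the product form, since multiplying many probabilities together quickly underflows floating-point precision.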