cross_entropy

getml.pipeline.metrics.cross_entropy = 'cross_entropy'

Cross entropy, also referred to as log loss, measures how likely the observed outcomes are under the probabilities predicted by the classification model.

Used for classification problems.

Mathematically speaking, cross-entropy for a binary classification problem is defined as follows:

\[cross \; entropy = - \frac{1}{N} \sum_{i}^{N} \left( y_i \log p_i + (1 - y_i) \log(1 - p_i) \right),\]

where \(p_i\) is the probability of a positive outcome as predicted by the classification algorithm and \(y_i\) is the target value, which is 1 for a positive outcome and 0 otherwise.
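The following is a minimal sketch of this formula in NumPy; the arrays `y_true` and `p_pred` and the helper function `cross_entropy` are illustrative placeholders, not part of the getML API.

```python
import numpy as np

def cross_entropy(y_true, p_pred):
    """Binary cross entropy (log loss) for targets y_true in {0, 1}
    and predicted probabilities p_pred in (0, 1)."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return -np.mean(y_true * np.log(p_pred) + (1.0 - y_true) * np.log(1.0 - p_pred))

# Hypothetical targets and predicted probabilities.
y_true = np.array([1, 0, 1, 1])
p_pred = np.array([0.9, 0.2, 0.8, 0.6])

print(cross_entropy(y_true, p_pred))  # lower is better, 0 means perfect
```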

There are several ways to justify using cross entropy to evaluate classification algorithms, but the most intuitive is to think of it as a measure of likelihood: given the probabilities produced by a classification algorithm, how likely is it that we observe a particular state of the world?

We can calculate this likelihood as follows:

\[likelihood = \prod_{i}^{N} p_i^{y_i} \, (1 - p_i)^{1 - y_i}.\]

(Recall that \(y_i\) can only be 0 or 1.)

If we take the logarithm of the likelihood as defined above, divide by \(N\) and then multiply by -1 (because we want lower to mean better and 0 to mean perfect), the outcome will be cross entropy.
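The short sketch below, continuing the hypothetical arrays from the earlier example, illustrates that the negative mean log-likelihood reproduces the cross-entropy value computed above.

```python
import numpy as np

# Hypothetical targets and predicted probabilities (same as above).
y_true = np.array([1, 0, 1, 1], dtype=float)
p_pred = np.array([0.9, 0.2, 0.8, 0.6])

# Likelihood of the observed outcomes given the predicted probabilities.
likelihood = np.prod(p_pred**y_true * (1.0 - p_pred) ** (1.0 - y_true))

# Taking the log, dividing by N, and multiplying by -1 yields cross entropy.
print(-np.log(likelihood) / len(y_true))
```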