- tags
- Complexity metrics
- references
- (Shannon and Weaver 1975)

For a discrete random variable \(X\) with outcomes \(x_i\), \(P(X=x_i) = P_i\), the **entropy** or **uncertainty function** of \(X\) is defined as \[ H(X) = -\sum_{i=1}^{N} P_i \log P_i \]

Entropy is always positive, and is maximized when the uncertainty is maximal, that is when \(P_1 = P_2 = … = P_N = \frac{1}{N}\) entropy in that case is \(\log N\).

Interpretations:

- \(H\) measures uncertainty of a process (or the average number of yes/no answers one needs to specify the value of \(i\)).
- Average information received by an observer when reading the outcomes of \(X\).

If we encode the random variable \(X\) with an erroneous encoding \(P_i'\), the resulting code length is \(\sum P_i \log \frac{1}{P_i'}\). The difference between this encoding and the optimal encoding is the Kullback-leibler divergence.

## Shannon entropy for a sequence

In the case of a sequence of symbols $s_1, …, s_i, …$, following a translation invariant distribution (for any \(n\) and \(k\), \(P(s_1s_2…s_n) = P(s_{1+k}s_{2+k}…s_{n+k})\)).

We can define the block entropy of the $n$-tuple random variable \(S=(S_1…S_n)\) like above \[ H_n = \sum_{s_1…s_n} P(s_1…s_n) \log P(s_1…s_n) \]

Then, we can define the average length of the description per symbol of the sequence as \[ h =\lim_{n \rightarrow \infty} h_n \] \[ h_n =H_{n+1} - H_n \]

\(h\) is called by Shannon the entropy of the source emitting the sequence, or entropy of the sequence.