Claude E. Shannon defined a measure of entropy (H =  Σ p_{i} log p_{i}) that, when applied to an information source, could determine the capacity of the channel required to transmit the source as encoded binary digits. Shannon's measure of entropy came to be taken as a measure of the information contained in a message, as opposed to the portion of the message that is strictly determined (hence predictable) by inherent structures, like for instance redundancy in the structure of languages or the statistical properties of a language relating to the frequencies of occurrence of different letter or word pairs, triplets etc. See Markov chains.
Entropy as defined by Shannon is closely related to entropy as defined by physicists. Boltzmann and Gibbs did considerable work on statistical thermodynamics. This work was the inspiration for adopting the term entropy in information theory. There are deep relationships between entropy in the thermodynamic and informational senses. For instance, Maxwell's demon needs information to reverse thermodynamic entropy and getting that information exactly balances out the thermodynamic gain that the demon would otherwise achieve.
In information theory, entropy is conceptually the actual amount of (information theoretic) infomation in a piece of data. Entirely random byte data has an entropy of about 8, since you never know what the next character will be. A long string of A's has an entropy of 0, since you know that the next character will always be an 'A'. The entropy of English text is about 1.5 bits per character (Try compressing it with the PPM compression algorithm[?]!) The entropy rate of a data source means the average number of bits per symbol needed to encode it.
Entropy is effectively the lowest compression possible, which can be realised in theory by the use of the typical set or in practise using Huffman, LempelZiv or Arithmetic coding. The definition of entropy is based on the Markov model of text. For an order0 source (each character is selected independent of the last characters), the entropy is:
Where <math>p_i</math> is the probability of <math>i</math>. For a higherorder Markov source (one in which probabilities are dependent on the preceding characters), the entropy is:
Where <math>i</math> is a state (certain preceding characters) and <math>p_i(j)</math> is the probability of <math>j</math> given <math>i</math> as the previous character (s).
Search Encyclopedia

Featured Article
