|
|||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||
This article is about the theory of source coding in data compression. For the term in computer programming, see Source code.
In information theory, Shannon's source coding theorem (or noiseless coding theorem) establishes the limits to possible data compression, and the operational meaning of the Shannon entropy. The source coding theorem shows that (in the limit, as the length of a stream of independent and identically-distributed random variable (i.i.d.) data tends to infinity) it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss. The source coding theorem for symbol codes places an upper and a lower bound on the minimal possible expected length of codewords as a function of the entropy of the input word (which is viewed as a random variable) and of the size of the target alphabet.
StatementsSource coding is a mapping from (a sequence of) symbols from an information source to a sequence of alphabet symbols (usually bits) such that the source symbols can be exactly recovered from the binary bits (lossless source coding) or recovered within some distortion (lossy source coding). This is the concept behind data compression. Source coding theoremIn information theory, the source coding theorem (Shannon 1948) informally states that:
Source coding theorem for symbol codesLet Σ1, Σ2 denote two finite alphabets and let Suppose that X is a random variable taking values in Σ1 and let f be a decipherable code from If f is optimal in the sense that it has the minimal expected wordlength for X, then (Shannon 1948) Proof: Source coding theoremGiven X is an i.i.d. source, its time series X1, ..., Xn is i.i.d. with entropy H(X) in the discrete-valued case and differential entropy in the continuous-valued case. The Source coding theorem states that for any ε > 0 for any rate larger than the entropy of the source, there is large enough n and an encoder that takes n i.i.d. repetition of the source, X1:n, and maps it to n.(H(X) + ε) binary bits such that the source symbols X1:n are recoverable from the binary bits with probability at least 1 − ε. Proof for achievability Fix some for any ε > 0. The typical set,
AEP shows that for large enough n, the probability that a sequence generated by the source lies in the typical set, The definition of typical sets implies that those sequences that lie in the typical set satisfy: Note that:
Since The encoding algorithm: The encoder checks if the input sequence lies within the typical set; if yes, it outputs the index of the input sequence within the typical set; if not, the encoder outputs an arbitrary n.(H(X) + ε) digit number. As long as the input sequence lies within the typical set (with probability at least 1 − ε), the encoder doesn't make any error. So, the probability of error of the encoder is bounded above by ε Proof for converse The converse is proved by showing that any set of size smaller than Proof: Source coding theorem for symbol codesLet si denote the wordlength of each possible xi ( Then where the second line follows from Gibbs' inequality and the fifth line follows from Kraft's inequality: For the second inequality we may set so that and so and and so by Kraft's inequality there exists a prefix-free code having those wordlengths. Thus the minimal S satisfies Extension to non-stationary independent sourcesFixed Rate lossless source coding for discrete time non-stationary independent sourcesDefine typical set Then, for given δ > 0, for n large enough, See also
References
|
| All Right Reserved © 2007, Designed by Stylish Blog. |