# Entropy

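An information-theoretical measure of the degree of indeterminacy of a random variable. If $\xi$ is a discrete random variable defined on a probability space $(\Omega, \mathfrak{A}, \mathsf{P})$ and assuming values $x_1, x_2, \dots$ with probability distribution $\{p_k : k = 1, 2, \dots\}$, $p_k = \mathsf{P}\{\xi = x_k\}$, then the entropy $H(\xi)$ is defined by the formula

$$H(\xi) = -\sum_{k=1}^{\infty} p_k \log p_k \tag{1}$$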

(here it is assumed that $0 \log 0 = 0$). The base of the logarithm can be any positive number other than 1, but as a rule one takes logarithms to the base 2 or $e$, which corresponds to the choice of a bit or a nat (natural unit) as the unit of measurement.
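As an illustration of formula (1), the following minimal Python sketch computes the entropy of a finite distribution. The function name `entropy` and its `base` parameter are hypothetical names chosen for this example; the distribution is assumed to be given as a finite list of probabilities.

```python
import math

def entropy(probs, base=2.0):
    """Entropy of a finite discrete distribution, following formula (1).

    `probs` is assumed to be a list of probabilities summing to 1;
    `base` selects the unit (2 for bits, math.e for nats).
    """
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a probability distribution")
    # Zero-probability terms are skipped: this is the convention 0 log 0 = 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin_bits = entropy([0.5, 0.5])               # one bit
fair_coin_nats = entropy([0.5, 0.5], base=math.e)  # ln 2 nats
degenerate = entropy([1.0, 0.0])                   # no indeterminacy
uniform_four = entropy([0.25] * 4)                 # two bits
```

Changing the base only rescales the result: the value in nats equals the value in bits multiplied by $\ln 2$.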

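If $\xi$ and $\eta$ are two discrete random variables taking values $x_1, x_2, \dots$ and $y_1, y_2, \dots$ with probability distributions $\{p_k : k = 1, 2, \dots\}$ and $\{q_j : j = 1, 2, \dots\}$, and if $\{p_{k \mid j} : k = 1, 2, \dots\}$ is the conditional distribution of $\xi$ assuming that $\eta = y_j$, $j = 1, 2, \dots$, then the (mean) conditional entropy $H(\xi \mid \eta)$ of $\xi$ given $\eta$ is defined as

$$H(\xi \mid \eta) = -\sum_{j=1}^{\infty} q_j \sum_{k=1}^{\infty} p_{k \mid j} \log p_{k \mid j}. \tag{2}$$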
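The conditional entropy (2) can be sketched numerically for finite distributions. The example below assumes the joint distribution is supplied as a nested list `joint[k][j]`, the probability that the first variable takes its $k$-th value and the second its $j$-th value; the function name `conditional_entropy` is a hypothetical name for this illustration.

```python
import math

def conditional_entropy(joint, base=2.0):
    """Mean conditional entropy of the first variable given the second,
    computed from joint[k][j] as in formula (2)."""
    n_cols = len(joint[0])
    h = 0.0
    for j in range(n_cols):
        q_j = sum(row[j] for row in joint)   # marginal probability of y_j
        if q_j == 0:
            continue                         # y_j never occurs, no contribution
        for row in joint:
            p_kj = row[j] / q_j              # conditional probability p_{k|j}
            if p_kj > 0:                     # convention 0 log 0 = 0
                h -= q_j * p_kj * math.log(p_kj, base)
    return h

# Two independent fair bits: conditioning does not reduce the indeterminacy.
h_independent = conditional_entropy([[0.25, 0.25], [0.25, 0.25]])
# Two identical fair bits: the first is fully determined by the second.
h_identical = conditional_entropy([[0.5, 0.0], [0.0, 0.5]])
```

The two test distributions are the extreme cases: for independent variables the conditional entropy equals the unconditional entropy (here one bit), and for identical variables it vanishes.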

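Let $\xi = \{\xi_k : k = \dots, -1, 0, 1, \dots\}$ be a stationary process with discrete time and discrete space of values such that $H(\xi_1) < \infty$. Then the entropy (more accurately, the mean entropy) $\bar{H}(\xi)$ of this stationary process is defined as the limit

$$\bar{H}(\xi) = \lim_{n \to \infty} \frac{1}{n} H(\xi^n), \tag{3}$$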

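where $H(\xi^n)$ is the entropy of the random variable $\xi^n = (\xi_1, \dots, \xi_n)$. It is known that the limit on the right-hand side of (3) always exists and that

$$\bar{H}(\xi) = \lim_{n \to \infty} H(\xi_n \mid \xi_1, \dots, \xi_{n-1}), \tag{4}$$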

where $H(\xi_n \mid \xi_1, \dots, \xi_{n-1})$ is the conditional entropy of $\xi_n$ given $\xi^{n-1} = (\xi_1, \dots, \xi_{n-1})$. The entropy of stationary processes has important applications in the theory of dynamical systems.
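The agreement between the limits (3) and (4) can be checked numerically on a simple case. The sketch below assumes a stationary two-state Markov chain with a hypothetical transition matrix `P` (chosen only for illustration); for such a chain the conditional entropy in (4) does not depend on $n$ for $n \ge 2$, so it can be compared directly with the per-symbol block entropies in (3).

```python
import itertools
import math

# Hypothetical transition matrix: P[a][b] is the probability of moving a -> b.
P = [[0.9, 0.1],
     [0.4, 0.6]]
# Stationary distribution of a two-state chain (solves pi = pi P).
pi = [P[1][0] / (P[0][1] + P[1][0]), P[0][1] / (P[0][1] + P[1][0])]

def block_entropy(n):
    """Entropy in bits of the first n values, by enumerating all 2**n paths."""
    h = 0.0
    for path in itertools.product(range(2), repeat=n):
        p = pi[path[0]]
        for a, b in zip(path, path[1:]):
            p *= P[a][b]
        if p > 0:
            h -= p * math.log2(p)
    return h

# Right-hand side of (4) for a Markov chain: conditional entropy of the next
# value given the current one, averaged over the stationary distribution.
rate = -sum(pi[a] * P[a][b] * math.log2(P[a][b])
            for a in range(2) for b in range(2) if P[a][b] > 0)

# Left-hand side of (3) for increasing n: approaches `rate` from above.
per_symbol = [block_entropy(n) / n for n in (1, 4, 12)]
```

For this chain the block entropy grows by exactly `rate` with each additional symbol, so `block_entropy(n) / n` differs from `rate` by a term of order $1/n$.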

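If $\mu$ and $\nu$ are two measures on a measurable space $(\Omega, \mathfrak{A})$ and if $\mu$ is absolutely continuous relative to $\nu$ and $d\mu / d\nu$ is the corresponding Radon–Nikodým derivative, then the entropy $H(d\mu / d\nu)$ of $\mu$ relative to $\nu$ is defined as the integral

$$H\left(\frac{d\mu}{d\nu}\right) = \int_{\Omega} \log \frac{d\mu}{d\nu} \, \mu(d\omega). \tag{5}$$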

A special case of the entropy of one measure with respect to another is the differential entropy.
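For measures on a finite set, the entropy of one measure relative to another reduces to a finite sum. The sketch below is an illustration under a stated assumption: it uses the common convention in which the logarithm of the Radon–Nikodým derivative is averaged with respect to the first measure (the Kullback–Leibler form), and it takes both measures to be probability vectors. The function name `relative_entropy` is hypothetical.

```python
import math

def relative_entropy(mu, nu, base=2.0):
    """Entropy of mu relative to nu on a finite set:
    sum of mu_i * log(mu_i / nu_i), where mu_i / nu_i plays the role
    of the Radon-Nikodym derivative."""
    if len(mu) != len(nu):
        raise ValueError("mu and nu must be defined on the same finite set")
    h = 0.0
    for m, n in zip(mu, nu):
        if m == 0:
            continue   # convention 0 log 0 = 0
        if n == 0:
            raise ValueError("mu is not absolutely continuous relative to nu")
        h += m * math.log(m / n, base)
    return h

same = relative_entropy([0.5, 0.5], [0.5, 0.5])        # identical measures
skewed = relative_entropy([0.75, 0.25], [0.5, 0.5])    # against the uniform measure
```

Under this convention, when the second measure is uniform on $m$ points the result equals $\log m$ minus the ordinary entropy of the first measure, which is one way to see the sign relationship between relative entropy and entropy.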

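Of the many possible generalizations of the concept of entropy in information theory one of the most important is the following. Let $\xi$ and $\tilde{\xi}$ be two random variables taking values in certain measurable spaces $(\mathfrak{X}, S_{\mathfrak{X}})$ and $(\tilde{\mathfrak{X}}, S_{\tilde{\mathfrak{X}}})$. Suppose that the distribution $p(\cdot)$ of $\xi$ is given and let $W$ be a class of admissible joint distributions of the pair $(\xi, \tilde{\xi})$ in the set of all probability measures in the product $(\mathfrak{X} \times \tilde{\mathfrak{X}}, S_{\mathfrak{X}} \times S_{\tilde{\mathfrak{X}}})$. Then the $W$-entropy (or the entropy for a given condition of exactness of reproduction of information (cf. Information, exactness of reproducibility of)) is defined as the quantity

$$H_W(\xi) = \inf I(\xi, \tilde{\xi}), \tag{6}$$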

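where $I(\xi, \tilde{\xi})$ is the amount of information (cf. Information, amount of) in $\tilde{\xi}$ given $\xi$ and the infimum is taken over all pairs of random variables $(\xi, \tilde{\xi})$ such that the joint distribution $p(\cdot, \cdot)$ of the pair $(\xi, \tilde{\xi})$ belongs to $W$ and $\xi$ has the distribution $p(\cdot)$. The class $W$ of joint distributions $p(\cdot, \cdot)$ is often given by means of a certain non-negative measurable real-valued function $\rho(x, \tilde{x})$, $x \in \mathfrak{X}$, $\tilde{x} \in \tilde{\mathfrak{X}}$, a measure of distortion, in the following manner:

$$W = \{ p(\cdot, \cdot) : \mathsf{E} \rho(\xi, \tilde{\xi}) \le \epsilon \}, \tag{7}$$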

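where $\epsilon > 0$ is fixed. In this case the quantity defined by (6), where $W$ is given by (7), is called the $\epsilon$-entropy (or the rate as a function of the distortion) and is denoted by $H_\epsilon(\xi)$.

For example, if $\xi = (\xi_1, \dots, \xi_n)$ is a Gaussian random vector with independent components, if $\mathsf{E} \xi_k = 0$, $k = 1, \dots, n$, and if the function $\rho(x, \tilde{x})$, $x = (x_1, \dots, x_n)$, $\tilde{x} = (\tilde{x}_1, \dots, \tilde{x}_n)$, has the form

$$\rho(x, \tilde{x}) = \sum_{k=1}^{n} (x_k - \tilde{x}_k)^2,$$

then $H_\epsilon(\xi)$ can be found by the formula

$$H_\epsilon(\xi) = \frac{1}{2} \sum_{k=1}^{n} \log \max \left( \frac{\mathsf{E} \xi_k^2}{\lambda}, 1 \right),$$

where $\lambda$ is defined by

$$\sum_{k=1}^{n} \min(\lambda, \mathsf{E} \xi_k^2) = \epsilon.$$

If $\xi$ is a discrete random variable, if $\mathfrak{X}$ and $\tilde{\mathfrak{X}}$ are the same, and if $\rho(x, \tilde{x})$ has the form

$$\rho(x, \tilde{x}) = \begin{cases} 0 & \text{if } x = \tilde{x}, \\ 1 & \text{if } x \ne \tilde{x}, \end{cases}$$

then the $\epsilon$-entropy for $\epsilon = 0$ is equal to the ordinary entropy defined in (1), that is, $H_0(\xi) = H(\xi)$.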
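The Gaussian example above can be evaluated numerically. The following sketch assumes the standard form of that result for squared-error distortion: a level $\lambda$ is chosen so that the sum of $\min(\lambda, \sigma_k^2)$ over the component variances $\sigma_k^2$ equals the distortion bound, and each component with variance above $\lambda$ contributes half the logarithm of $\sigma_k^2 / \lambda$. The function name `gaussian_epsilon_entropy` and the use of bisection to find $\lambda$ are choices made for this illustration.

```python
import math

def gaussian_epsilon_entropy(variances, eps, base=2.0):
    """Epsilon-entropy of a zero-mean Gaussian vector with independent
    components under total squared-error distortion at most eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    if eps >= sum(variances):
        return 0.0   # the distortion bound already covers all the variance

    # sum(min(lam, v)) is continuous and increasing in lam on [0, max(v)],
    # so the level lam can be located by bisection.
    lo, hi = 0.0, max(variances)
    for _ in range(200):
        lam = (lo + hi) / 2
        if sum(min(lam, v) for v in variances) < eps:
            lo = lam
        else:
            hi = lam
    lam = (lo + hi) / 2

    # Components with variance at most lam contribute log(1) = 0.
    return 0.5 * sum(math.log(max(v / lam, 1.0), base) for v in variances)

one_component = gaussian_epsilon_entropy([4.0], 1.0)        # lam = 1
two_components = gaussian_epsilon_entropy([4.0, 0.25], 1.0) # lam = 0.75
```

In the second call the component with variance 0.25 lies below the level $\lambda = 0.75$ and contributes nothing, so the whole rate is spent on the component with variance 4.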