Information

A basic concept in cybernetics. In cybernetics one studies machines and living organisms only from the point of view of their ability to absorb information given to them, to store information in a "memory", to transmit it over a communication channel, and to transform it into "signals". The intuitive picture of the information, relative to certain quantities or phenomena, contained in certain data is made precise and developed in cybernetics.

In certain situations it is just as natural to be able to compare various groups of data by the information contained in them as it is to compare plane figures by their "areas": independent of the manner of measuring areas, one can prove that a figure $F_1$ does not have a larger area than a figure $F_2$ if $F_1$ can be completely included in $F_2$ (cf. Examples 1–3 below). The deeper fact that it is possible to express area by a number and thereby compare figures of arbitrary shape is a result of an extensive mathematical theory. The analogue of this fundamental result in information theory is the statement that, under definite and very wide assumptions, one may disregard the qualitative peculiarities of information and express its amount by a number. This number alone describes the possibility of transmitting information over a communication channel and of storing it in machines with a memory.

Example 1. Specifying the position and velocity of a particle moving in a force field provides information on its position at any future moment of time; this information is, moreover, complete: its position can be exactly predicted. Specifying the energy of a particle also provides information, but this information is obviously incomplete.

Example 2. The equality

$$a = 2b \tag{1}$$

provides information about the relation between the variables $a$ and $b$. The equality

$$a^2 = 4b^2 \tag{2}$$

provides less information (since (1) implies (2), but they are not equivalent). Finally, the equality (for real numbers)

$$a^3 = 8b^3 \tag{3}$$

is equivalent to (1) and provides the same information, i.e. (1) and (3) are different forms of specifying the same information.
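The logical relations described above (one equality implying but not being implied by another, and a third being equivalent to the first over the reals) can be checked numerically. The sketch below uses $a = 2b$, $a^2 = 4b^2$, $a^3 = 8b^3$ as an assumed concrete instance of such a triple:

```python
# Numerical check of the logical relations between three equalities
# (an assumed concrete instance: a = 2b, a^2 = 4b^2, a^3 = 8b^3).

def eq1(a, b):  # the strongest equality
    return a == 2 * b

def eq2(a, b):  # implied by eq1, but not equivalent to it
    return a ** 2 == 4 * b ** 2

def eq3(a, b):  # equivalent to eq1 over the real numbers
    return a ** 3 == 8 * b ** 3

pairs = [(2, 1), (-2, 1), (0, 0), (4, 2), (3, 1), (-4, 2)]

# eq1 implies eq2 and eq3 on every sample pair
assert all(eq2(a, b) and eq3(a, b) for a, b in pairs if eq1(a, b))
# eq2 does not imply eq1: a = -2b satisfies (2) but not (1)
assert eq2(-2, 1) and not eq1(-2, 1)
# eq3 agrees with eq1 on all sample pairs (equivalence over the reals)
assert all(eq1(a, b) == eq3(a, b) for a, b in pairs)
```

The pair $(-2, 1)$ is the witness for non-equivalence: squaring destroys the sign, while cubing preserves it.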

Example 3. Results of measurements of some physical quantity, performed within certain errors, provide information on its exact value. By increasing the number of observations one changes this information.

Example 3a. The arithmetical average of the results of observations also contains certain information about the quantity being measured. As is shown in mathematical statistics, if the errors have a normal probability distribution with known variance, then the arithmetical average contains all the information.

Example 4. Suppose that the result of a measurement is a random variable $\xi$. By transmitting $\xi$ over a communication channel, $\xi$ is distorted, so that at the receiving end of the channel one obtains the variable

$$\eta = \xi + \theta,$$

where $\theta$ is independent of $\xi$ (in the sense of probability theory). The "output" $\eta$ provides information on the "input" $\xi$, and it is natural to expect this information to be smaller the more "scattered" the values of $\theta$ are.

In each of the examples given, data are compared with respect to the more or less complete information they provide. In Examples 1–3 the meaning of this comparison is clear and leads to the analysis of the equivalence or non-equivalence of certain relations. In Examples 3a and 4 this meaning needs to be made more precise. It is provided by mathematical statistics and information theory (for which these examples are typical).

At the basis of information theory is a definition, suggested in 1948 by C.E. Shannon, of measuring the amount of information contained in one random object (event, variable, function, etc.) with respect to another. It consists in expressing the amount of information by a number. It can best be explained in the simplest case, when the random objects considered are random variables taking only a finite number of values. Let $\xi$ be a random variable taking values $x_1, \dots, x_n$ with probabilities $p_1, \dots, p_n$, and let $\eta$ be a random variable taking values $y_1, \dots, y_m$ with probabilities $q_1, \dots, q_m$. Then the information $I(\xi, \eta)$ contained in $\eta$ with respect to $\xi$ is defined by the formula

$$I(\xi, \eta) = \sum_{i, j} p_{ij} \log_2 \frac{p_{ij}}{p_i q_j}, \tag{4}$$

where $p_{ij}$ is the probability of the joint occurrence of $\xi = x_i$ and $\eta = y_j$, and the logarithm is to base 2. The information $I(\xi, \eta)$ has a number of properties that are naturally required of a measure of the amount of information. Thus, always $I(\xi, \eta) \geq 0$, and equality holds if and only if $p_{ij} = p_i q_j$ for all $i$ and $j$, i.e. if and only if $\xi$ and $\eta$ are independent random variables. Further, $I(\xi, \eta) \leq I(\eta, \eta)$, and equality holds only if $\eta$ is a function of $\xi$. More surprising is the fact that $I(\xi, \eta) = I(\eta, \xi)$.
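The amount of information just defined, $\sum_{i,j} p_{ij} \log_2 (p_{ij} / p_i q_j)$, can be computed directly from a joint distribution table. A minimal sketch (the joint probabilities below are an arbitrary illustrative choice, not taken from the article):

```python
from math import log2

# Joint distribution p_ij of a pair (xi, eta), each variable taking
# two values; an arbitrary illustrative choice.
p = [[0.4, 0.1],
     [0.1, 0.4]]

# Marginal probabilities p_i = P(xi = x_i) and q_j = P(eta = y_j).
p_marg = [sum(row) for row in p]
q_marg = [p[0][j] + p[1][j] for j in range(2)]

# Amount of information: I = sum_ij p_ij * log2(p_ij / (p_i * q_j)).
I = sum(p[i][j] * log2(p[i][j] / (p_marg[i] * q_marg[j]))
        for i in range(2) for j in range(2))

# I >= 0 always; here it is strictly positive since xi, eta are dependent.
assert I > 0

# Symmetry I(xi, eta) = I(eta, xi): swap the roles of rows and columns.
I_sym = sum(p[i][j] * log2(p[i][j] / (q_marg[j] * p_marg[i]))
            for j in range(2) for i in range(2))
assert abs(I - I_sym) < 1e-12

print(round(I, 4))  # -> 0.2781
```

Replacing `p` by a product distribution (e.g. all four entries equal to 0.25) drives `I` to zero, matching the independence criterion.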

The quantity

$$H(\xi) = I(\xi, \xi) = -\sum_i p_i \log_2 p_i$$

is called the entropy of $\xi$. The concept of entropy is basic in information theory. The amount of information and the entropy are related by

$$I(\xi, \eta) = H(\xi) + H(\eta) - H(\xi, \eta), \tag{5}$$

where $H(\xi, \eta)$ is the entropy of the pair $(\xi, \eta)$, i.e.

$$H(\xi, \eta) = -\sum_{i, j} p_{ij} \log_2 p_{ij}.$$

The entropy turns out to be the average number of binary symbols necessary for the differentiation (or description) of the possible values of a random variable. This makes it possible to understand the role of the amount of information (4) in "storing" information in machines with a memory. If $\xi$ and $\eta$ are independent random variables, then one needs on the average $H(\xi)$ binary symbols to write down the values of $\xi$, $H(\eta)$ binary symbols for those of $\eta$, and $H(\xi) + H(\eta)$ binary symbols for those of the pair $(\xi, \eta)$. If $\xi$ and $\eta$ are dependent, then the average number of binary symbols necessary for writing down the pair $(\xi, \eta)$ is less than $H(\xi) + H(\eta)$, since

$$H(\xi, \eta) = H(\xi) + H(\eta) - I(\xi, \eta).$$
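The relation between information and entropy, and the resulting saving in storage for dependent variables, can be verified numerically. A sketch using an illustrative two-by-two joint distribution:

```python
from math import log2

# Joint probabilities p_ij for a dependent pair (xi, eta);
# an arbitrary illustrative choice.
p = [[0.4, 0.1],
     [0.1, 0.4]]
p_marg = [sum(row) for row in p]                # distribution of xi
q_marg = [p[0][j] + p[1][j] for j in range(2)]  # distribution of eta

def entropy(dist):
    # H = -sum_k p_k log2 p_k : average number of binary symbols
    # needed to write down a value of the variable.
    return -sum(pk * log2(pk) for pk in dist if pk > 0)

H_xi = entropy(p_marg)
H_eta = entropy(q_marg)
H_pair = entropy([p[i][j] for i in range(2) for j in range(2)])

# Information as defined by identity (5) in the text:
# I(xi, eta) = H(xi) + H(eta) - H(xi, eta).
I = H_xi + H_eta - H_pair

# Dependence makes the pair cheaper to store than xi and eta separately.
assert I > 0
assert H_pair < H_xi + H_eta

print(round(I, 4), round(H_pair, 4))  # -> 0.2781 1.7219
```

Here each marginal costs one binary symbol on average, but the pair costs about 1.72 rather than 2 symbols; the saving equals the amount of information one variable carries about the other.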

Using deeper theorems, the role of the amount of information (4) in problems of information transmission over communication channels can be explained. The basic information-theoretic characteristic of channels, their so-called capacity (cf. Transmission rate of a channel), is defined in terms of the concept of "information".

If $\xi$ and $\eta$ may take an infinite set of values, then by a limit transition one obtains from (4):

$$I(\xi, \eta) = \int \int p(x, y) \log_2 \frac{p(x, y)}{p(x) q(y)} \, dx \, dy, \tag{6}$$

where $p(x, y)$, $p(x)$ and $q(y)$ denote the corresponding probability densities. The entropies $H(\xi)$ and $H(\eta)$ do not exist in this case, but there is the formula, analogous to (5),

$$I(\xi, \eta) = h(\xi) + h(\eta) - h(\xi, \eta), \tag{7}$$

where

$$h(\xi) = -\int p(x) \log_2 p(x) \, dx$$

is the differential entropy of $\xi$ ($h(\eta)$ and $h(\xi, \eta)$ are defined likewise).

Example 5. Suppose that under the conditions of Example 4 the random variables $\xi$ and $\theta$ have normal probability distributions with mean zero and with variances equal to, respectively, $\sigma^2$ and $\sigma_1^2$. Then, as may be inferred from (6) or (7):

$$I(\eta, \xi) = \frac{1}{2} \log_2 \left( 1 + \frac{\sigma^2}{\sigma_1^2} \right).$$

Thus, the amount of information in the "received signal" $\eta$ with respect to the "transmitted signal" $\xi$ tends to zero as the level of "noise" grows (i.e. as $\sigma_1 \to \infty$), and grows without bound when the "noise" vanishes (i.e. as $\sigma_1 \to 0$).
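For a Gaussian signal of variance $\sigma^2$ distorted by independent Gaussian noise of variance $\sigma_1^2$, the amount of information equals $\frac{1}{2}\log_2(1 + \sigma^2/\sigma_1^2)$, and the limiting behaviour described in Example 5 can be checked numerically:

```python
from math import log2

def info_gaussian(sigma2_signal, sigma2_noise):
    # I(eta, xi) = (1/2) * log2(1 + sigma^2 / sigma_1^2) for the channel
    # eta = xi + theta with Gaussian signal xi (variance sigma^2) and
    # independent Gaussian noise theta (variance sigma_1^2).
    return 0.5 * log2(1 + sigma2_signal / sigma2_noise)

# Information decreases monotonically as the noise level grows ...
noise_levels = [0.01, 0.1, 1.0, 10.0, 100.0]
values = [info_gaussian(1.0, n) for n in noise_levels]
assert all(a > b for a, b in zip(values, values[1:]))

# ... tends to zero as sigma_1 -> infinity ...
assert info_gaussian(1.0, 1e9) < 1e-8

# ... and grows without bound as sigma_1 -> 0.
assert info_gaussian(1.0, 1e-9) > 14

print(info_gaussian(1.0, 1.0))  # -> 0.5
```

At equal signal and noise variance the received signal carries exactly half a binary symbol of information about the transmitted one per observation.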

The case when the random variables $\xi$ and $\eta$ in Example 4 or 5 are stochastic functions (or, as one says, stochastic processes) $\xi(t)$ and $\eta(t)$, describing the variation of a quantity at the input, respectively the output, of the channel, is of special interest. The amount of information in $\eta(t)$ with respect to $\xi(t)$ for a given level of noise (in acoustic terminology) may serve as a criterion of the quality of the channel itself.

In problems of mathematical statistics one also uses the concept of information (cf. Examples 3 and 3a). However, both by its formal definition and by the name it has been given, it differs from the concept defined above (in information theory). Statistics deals with a large number of results of observations and usually replaces the complete listing of them by certain combined characteristics. In this replacement information is sometimes lost, but under certain conditions the combined characteristics contain all the information contained in the complete data (this statement is explained at the end of Example 6 below). The concept of information was introduced into statistics by R.A. Fisher in 1921.

Example 6. Let $x_1, \dots, x_n$ be the results of $n$ independent observations of some quantity, normally distributed with probability density

$$p(x; a, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - a)^2}{2\sigma^2} \right),$$

where the parameters $a$ and $\sigma^2$ (the mean and the variance) are unknown and must be estimated from the results of the observations. Sufficient statistics (i.e. functions of the results of observations containing complete information on the unknown parameters) for this case are provided by the arithmetical average

$$\bar{x} = \frac{x_1 + \dots + x_n}{n}$$

and the so-called empirical variance

$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

If $\sigma^2$ is known, then $\bar{x}$ by itself is a sufficient statistic (cf. Example 3a).
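The two sufficient statistics, the arithmetical average and the empirical variance, take a few lines to compute (the sample values below are an illustrative choice):

```python
def sufficient_statistics(xs):
    # Arithmetical average and empirical variance: for a normal sample
    # with unknown mean a and variance sigma^2, these two numbers carry
    # all the information about (a, sigma^2) contained in the sample.
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n  # divisor n, not n - 1
    return mean, s2

sample = [4.8, 5.1, 5.3, 4.9, 5.4, 4.5]  # illustrative observations
mean, s2 = sufficient_statistics(sample)
print(round(mean, 4), round(s2, 4))  # -> 5.0 0.0933
```

Note the divisor $n$ (not $n - 1$): the empirical variance as defined in the article is the maximum-likelihood form, not the unbiased one.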

The meaning of the term "complete information" can be clarified in the following way. Suppose one has a function $f = f(a, \sigma^2)$ of the unknown parameters, and let $\varphi = \varphi(x_1, \dots, x_n)$ be an estimator for it that is free of systematic errors. Suppose that the quality of the estimator (its exactness) is measured (as is usual in problems of mathematical statistics) by the variance of the difference $\varphi - f$. Then there exists another estimator $\varphi^*$, not depending on the individual $x_i$ but only on $\bar{x}$ and $s^2$, that is not worse (in the sense of the criterion mentioned above) than $\varphi$. Fisher has also proposed a measure of the (average) amount of information with respect to an unknown parameter contained in one observation. The meaning of this concept is revealed in the theory of statistical estimation.
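The guarantee that an estimator built from the sufficient statistics is never worse can be illustrated (not proved) by simulation. In the sketch below, with assumed parameters, the mean $a$ of a normal sample is estimated both by the sample mean (a function of the sufficient statistic) and by the sample median (which is not); empirically the former has the smaller variance:

```python
import random
import statistics

random.seed(0)  # fixed seed for reproducibility

# Monte Carlo comparison of two unbiased estimators of the mean a of a
# normal distribution. Parameters are an illustrative choice.
a, sigma, n, trials = 0.0, 1.0, 25, 2000

means, medians = [], []
for _ in range(trials):
    xs = [random.gauss(a, sigma) for _ in range(n)]
    means.append(sum(xs) / n)            # depends only on the sufficient statistic
    medians.append(statistics.median(xs))  # does not

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

# The estimator based on the sufficient statistic is no worse; for normal
# samples the asymptotic variance ratio is 2/pi, roughly 0.64.
assert var_mean < var_median
print(var_mean < var_median)
```

This is exactly the phenomenon the paragraph above describes: averaging an arbitrary unbiased estimator conditionally on the sufficient statistics can only reduce its variance.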

For references, see Information, transmission of.