Information distance

From Encyclopedia of Mathematics

A metric or pseudo-metric on the set of probability distributions, characterizing the "non-similarity" of the random phenomena described by these distributions. Most interesting is the information distance related to the measure of informativity of an experiment in the problem of differentiating between two distributions $P$ and $Q$ by observations.

In any concrete statistical problem it is necessary to make inferences on the observed phenomenon. These inferences are, as a rule, not exact, since the outcomes of observations are random. It is intuitively clear that any sample carries some amount of useful information. Moreover: A) information may only get lost in transmission; and B) information presented by different independent sources, e.g. independent samples, can be summed. Thus, if one introduces the informativity of an experiment as the average amount of information (cf. also Information, amount of) in an observation, then for it the axioms A) and B) are fulfilled. Although the concept of information remains intuitive, one can sometimes find a quantity satisfying A) and B) that describes asymptotically the average exactness of inferences in a problem with a growing number of observations, and that therefore can naturally be taken as the informativity. The informativity is either a numerical or a matrix quantity. An important example is the information matrix in the problem of estimating the parameter of a distribution law.

According to axiom B), informativities behave like squares of length, i.e. the square of a reasonable information distance must have the property of additivity. The simplest information distances are the distance in variation:

$$\rho_V(P,Q) = \int |P(d\omega) - Q(d\omega)|$$

and the Fisher distance in an invariant Riemannian metric:

$$\rho_F(P,Q) = 2 \arccos \int \sqrt{P(d\omega)\,Q(d\omega)}.$$

The latter does not have the property of additivity, and has no proper statistical meaning.
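As a rough numerical illustration, the sketch below evaluates both distances for two distributions on a three-point outcome space and checks whether the squared Fisher distance doubles for two independent observations. The normalizations used (sum of absolute differences without a factor 1/2, and twice the arc cosine of the affinity) are assumptions, as are the sample distributions `P` and `Q` and all function names.

```python
import math

def variation_distance(p, q):
    """Distance in variation: sum of |p_i - q_i| (assumed normalization)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def fisher_distance(p, q):
    """Fisher distance: 2 * arccos(sum of sqrt(p_i * q_i)) (assumed normalization)."""
    affinity = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, affinity))  # clamp rounding slightly above 1

def product(p, q):
    """Joint distribution of two independent outcomes, flattened to a list."""
    return [pi * qj for pi in p for qj in q]

# Illustrative distributions (not taken from the article).
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]

d_v = variation_distance(P, Q)
d_f = fisher_distance(P, Q)

# Two independent observations: compare P x P with Q x Q.
d_f2 = fisher_distance(product(P, P), product(Q, Q))

# Additivity of the square would require d_f2**2 == 2 * d_f**2.
fisher_sq_additive = math.isclose(d_f2 ** 2, 2.0 * d_f ** 2, rel_tol=1e-6)

print(d_v, d_f, d_f2 ** 2, 2.0 * d_f ** 2, fisher_sq_additive)
```

For this pair the squared Fisher distance of the product distributions differs from twice the squared single-observation distance, which is consistent with the remark on additivity above.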

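According to the Neyman–Pearson theory, all the useful information about differentiating between probability distributions $P(d\omega)$ and $Q(d\omega)$ on a common space $\Omega$ of outcomes $\omega$ is contained in the likelihood ratio or its logarithm:

$$\ln \frac{P(d\omega)}{Q(d\omega)} = \ln \frac{p(\omega)}{q(\omega)},$$

determined up to values on a set of outcomes of probability zero; here $p$ and $q$ denote the densities of $P$ and $Q$ with respect to a common dominating measure $\mu$. The mathematical expectation

$$I(P:Q) = \int_\Omega \ln \frac{p(\omega)}{q(\omega)}\, P(d\omega) = \int_\Omega p(\omega) \ln \frac{p(\omega)}{q(\omega)}\, \mu(d\omega)$$

is called the (average) information for differentiating (according to Kullback) in favour of $P$ against $Q$, and also the relative entropy, or information deviation. The non-negative (perhaps infinite) quantity $I(P:Q)$ satisfies axioms A) and B). It characterizes the exactness of one-sided differentiation of $P$ against $Q$, since it determines the maximal order of decrease of the probability $\beta_N$ of an error of the second kind (i.e. falsely accepting hypothesis $P$ when it is not true). As the number $N$ of independent observations grows one has:

$$\beta_N \sim \exp[-N I(P:Q)]$$

for a fixed significance level, i.e. a fixed probability $\alpha_N$ of an error of the first kind, $0 < \alpha_N < 1$.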
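The behaviour of the information deviation under axioms A) and B) can be checked on a small discrete example. The sketch below assumes a finite outcome space, so that the integral reduces to the sum of $p_i \ln(p_i/q_i)$; the distributions `P` and `Q` and the helper names are illustrative choices.

```python
import math

def information_deviation(p, q):
    """I(P:Q) = sum of p_i * ln(p_i / q_i); terms with p_i == 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # P puts mass where Q has none
            total += pi * math.log(pi / qi)
    return total

def product(p, q):
    """Joint distribution of two independent outcomes, flattened to a list."""
    return [pi * qj for pi in p for qj in q]

def merge_last_two(p):
    """Coarsen the outcome space by merging the last two outcomes."""
    return p[:-2] + [p[-2] + p[-1]]

# Illustrative distributions (not taken from the article).
P = [0.5, 0.3, 0.2]
Q = [0.25, 0.25, 0.5]

i_pq = information_deviation(P, Q)
i_qp = information_deviation(Q, P)

# Axiom B): two independent observations carry twice the information.
i_two = information_deviation(product(P, P), product(Q, Q))

# Axiom A): coarsening the observation can only lose information.
i_coarse = information_deviation(merge_last_two(P), merge_last_two(Q))

print(i_pq, i_qp, i_two, i_coarse)
```

In this example `i_two` equals `2 * i_pq` up to rounding, `i_coarse` does not exceed `i_pq`, and `i_pq` differs from `i_qp`.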

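The analogous quantity $I(Q:P)$ determines the maximal order of decrease of $\alpha_N$ for a fixed $\beta_N$, $0 < \beta_N < 1$. The relation of "similarity", in particular that of "similarity" of random phenomena, is not symmetric and, as a rule, $I(P:Q) \neq I(Q:P)$. The geometric interpretation of $I(P:Q)$ as half the square of the non-symmetric distance from $Q$ to $P$ proved to be natural in a number of problems in statistics. For such information distances the triangle inequality is not true, but a non-symmetric analogue of the Pythagorean theorem holds:

$$I(R:P) = I(R:Q) + I(Q:P) \quad \text{if} \quad \int \ln \frac{q(\omega)}{p(\omega)}\, Q(d\omega) = \int \ln \frac{q(\omega)}{p(\omega)}\, R(d\omega).$$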
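A small numerical check of the Pythagorean-type identity $I(R:P) = I(R:Q) + I(Q:P)$ is sketched below. It assumes a three-point outcome space and builds $R$ by moving away from $Q$ in a direction that keeps both the total mass and the expectation of $\ln(q/p)$ unchanged; the distributions, the step size `t` and the construction of `v` are illustrative assumptions.

```python
import math

def information_deviation(r, p):
    """I(R:P) = sum of r_i * ln(r_i / p_i) for strictly positive p_i."""
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p) if ri > 0.0)

# Illustrative distributions (not taken from the article).
P = [0.5, 0.3, 0.2]
Q = [0.25, 0.25, 0.5]

# g_i = ln(q_i / p_i), the function whose expectation must agree under Q and R.
g = [math.log(qi / pi) for qi, pi in zip(Q, P)]

# Cross product of (1, 1, 1) and g: orthogonal to both,
# so sum(v) == 0 and sum(v_i * g_i) == 0.
v = [g[2] - g[1], g[0] - g[2], g[1] - g[0]]

t = 0.1  # step small enough to keep every r_i positive
R = [qi + t * vi for qi, vi in zip(Q, v)]

lhs = information_deviation(R, P)
rhs = information_deviation(R, Q) + information_deviation(Q, P)

print(R, lhs, rhs)
```

The two sides agree up to rounding, because the cross term, the sum of $(r_i - q_i)\ln(q_i/p_i)$, vanishes by construction.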

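A symmetric characteristic of similarity of $P$ and $Q$ arises when testing them by a minimax procedure. For an optimal test

$$\alpha_N = \beta_N \sim \exp[-N I(R:P)] = \exp[-N I(R:Q)],$$

where $R$ is the distribution with density $r(\omega) \propto p(\omega)^{\lambda} q(\omega)^{1-\lambda}$, $0 < \lambda < 1$, for which $I(R:P) = I(R:Q)$.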
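The symmetric exponent of the minimax test can be computed numerically on a finite outcome space. The sketch below assumes the standard construction in which $R$ has density proportional to $p^{\lambda} q^{1-\lambda}$ and $\lambda$ is chosen so that $I(R:P) = I(R:Q)$; the bisection loop, the sample distributions and the function names are illustrative assumptions.

```python
import math

def information_deviation(r, p):
    """I(R:P) = sum of r_i * ln(r_i / p_i) for strictly positive p_i."""
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p) if ri > 0.0)

def tilted(p, q, lam):
    """Distribution with density proportional to p^lam * q^(1 - lam)."""
    w = [pi ** lam * qi ** (1.0 - lam) for pi, qi in zip(p, q)]
    z = sum(w)
    return [wi / z for wi in w]

def balance(p, q, lam):
    """I(R:P) - I(R:Q); positive at lam = 0, negative at lam = 1."""
    r = tilted(p, q, lam)
    return information_deviation(r, p) - information_deviation(r, q)

# Illustrative distributions (not taken from the article).
P = [0.5, 0.3, 0.2]
Q = [0.25, 0.25, 0.5]

# Bisection on lam: balance() is non-increasing in lam.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if balance(P, Q, mid) > 0.0:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)

R = tilted(P, Q, lam)
exponent = information_deviation(R, P)

# Cross-check: at the balance point the exponent equals -ln sum p^lam q^(1-lam).
check = -math.log(sum(pi ** lam * qi ** (1.0 - lam) for pi, qi in zip(P, Q)))

print(lam, exponent, information_deviation(R, Q), check)
```

The common value of $I(R:P)$ and $I(R:Q)$ found this way is symmetric in $P$ and $Q$, unlike $I(P:Q)$ itself.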
Certain other information distances are related to the information deviation (cf. [1], [2]). For infinitesimally close $P$ and $Q$ the principal part of the information deviation, as well as of the square of any reasonable information distance, is given, up to a constant multiple $c(I)$, by the Fisher quadratic form. For the information deviation

$$c(I) = \frac{1}{2}.$$

On the other hand, any information deviation satisfying only axiom A) induces a topology which majorizes the topology induced by the distance in variation (defined above); see [3], [4].
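The local relation between the information deviation and the Fisher quadratic form can be illustrated numerically. The sketch below assumes that the Fisher form on a finite outcome space is normalized as the sum of $(dp_i)^2 / p_i$, in which case the ratio of the information deviation to the form should approach 1/2 as the perturbation shrinks; the base distribution, the perturbation direction and the step sizes are illustrative assumptions.

```python
import math

def information_deviation(p, q):
    """I(P:Q) = sum of p_i * ln(p_i / q_i) for strictly positive q_i."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def fisher_form(p, dp):
    """Fisher quadratic form sum(dp_i^2 / p_i) for a displacement with sum(dp) == 0."""
    return sum(d * d / pi for d, pi in zip(dp, p))

# Illustrative base point and zero-sum perturbation direction.
P = [0.5, 0.3, 0.2]
direction = [1.0, -0.5, -0.5]

ratios = []
for eps in (1e-1, 1e-2, 1e-3):
    dp = [eps * d for d in direction]
    Q = [pi + di for pi, di in zip(P, dp)]
    ratios.append(information_deviation(Q, P) / fisher_form(P, dp))

print(ratios)
```

The printed ratios move towards 0.5 as `eps` decreases, which matches the constant multiple for the information deviation under the assumed normalization.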


[1] S. Kullback, "Information theory and statistics", Wiley (1959)
[2] N.N. Chentsov [N.N. Čentsov], "Statistical decision rules and optimal inference", Amer. Math. Soc. (1982) (Translated from Russian)
[3] I. Csiszár, "On topological properties of $f$-divergences", Studia Sci. Math. Hungar., 2 (1967) pp. 329–339
[4] E.A. Morozova, N.N. Chentsov [N.N. Čencov], "Markov maps in noncommutative probability theory and mathematical statistics", in: Yu.V. Prokhorov et al. (eds.), Probability theory and mathematical statistics, VNU (1987) pp. 287–310
How to Cite This Entry:
Information distance. N.N. Chentsov (originator), Encyclopedia of Mathematics. URL:
This text originally appeared in Encyclopedia of Mathematics - ISBN 1402006098