Kullback-Leibler-type distance measures

From Encyclopedia of Mathematics


In mathematical statistics one usually considers, among others, the problems of estimation, hypothesis testing, and discrimination. In connection with the problem of discrimination, S. Kullback and R.A. Leibler [a13] introduced a measure of the "distance" or "divergence" between statistical populations, known variously as information for discrimination, $ I $-divergence, the error, or the directed divergence. While the Shannon entropy is fundamental in information theory, several generalizations of Shannon's entropy have also been proposed. In statistical estimation problems, measures between probability distributions play a significant role. The Chernoff coefficient, the Hellinger–Bhattacharyya coefficient, the Jeffreys distance, the directed divergence and its symmetrization, the $ J $-divergence, and the $ f $-divergence are examples of such measures. These measures have many applications in statistics, pattern recognition, numerical taxonomy, etc.

Let

$$ \Gamma _ {n} = \left \{ P = ( p _ {1} \dots p _ {n} ) : p _ {i} > 0 \textrm{ and } \sum _ {i = 1 } ^ { n } p _ {i} = 1 \right \} $$

be the set of all complete discrete probability distributions of length $ n \geq 2 $ (cf. Density of a probability distribution). Let $ I = ( 0,1 ) $ and let $ \mathbf R $ be the set of real numbers. For $ P, Q $ in $ \Gamma _ {n} $, Kullback and Leibler [a13] defined the directed divergence as

$$ \tag{a1 } D _ {n} ( P \| Q ) = \sum _ {i = 1 } ^ { n } p _ {i} { \mathop{\rm log} } { \frac{p _ {i} }{q _ {i} } } = \sum p _ {i} ( { \mathop{\rm log} } p _ {i} - { \mathop{\rm log} } q _ {i} ) . $$
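
As an illustration (not part of the original text), a minimal numerical sketch of (a1) follows; the function name and the choice of the natural logarithm are illustrative assumptions.

```python
import math

def directed_divergence(p, q):
    """Directed divergence D_n(P || Q) of (a1) for strictly positive
    probability vectors p, q of equal length (natural logarithm)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Example: D_n is non-negative, and in general not symmetric.
P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]
print(directed_divergence(P, Q))
print(directed_divergence(Q, P))
```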

Usually, measures are characterized by means of the algebraic properties they possess; see, for example, [a8] for (a1). A sequence of measures $ {\mu _ {n} } : {\Gamma _ {n} \times \Gamma _ {n} } \rightarrow \mathbf R $ is said to have the sum property if there exists a function $ f : {I ^ {2} } \rightarrow \mathbf R $ such that $ \mu _ {n} ( P \| Q ) = \sum _ {i = 1 } ^ {n} f ( p _ {i} , q _ {i} ) $ for $ P, Q \in \Gamma _ {n} $. In this case $ f $ is said to be a generating function of $ \{ \mu _ {n} \} $. A stronger version of the sum property is $ f $-divergence [a6]. The measure $ \mu _ {n} $ is an $ f $-divergence if and only if it has a representation

$$ \mu _ {n} ( P \| Q ) = \sum p _ {i} f \left ( { \frac{p _ {i} }{q _ {i} } } \right ) $$

for some $ f : {( 0, \infty ) } \rightarrow \mathbf R $. The measures $ \mu _ {n} $ are said to be $ ( m,n ) $-additive if $ \mu _ {mn } ( P \star R \| Q \star S ) = \mu _ {m} ( R \| S ) + \mu _ {n} ( P \| Q ) $, where $ P \star R = ( p _ {1} r _ {1} \dots p _ {1} r _ {m} , p _ {2} r _ {1} \dots p _ {2} r _ {m} \dots p _ {n} r _ {m} ) $ denotes the product distribution.
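
In the convention used here (weights $ p _ {i} $, argument $ p _ {i} / q _ {i} $), taking $ f = { \mathop{\rm log} } $ recovers (a1). The following sketch, with illustrative names (`f_divergence`, `star`), checks the $ ( 2,2 ) $-additivity of the directed divergence numerically.

```python
import math

def f_divergence(p, q, f):
    """Sum-form measure sum_i p_i f(p_i / q_i), in the convention used above
    (note: some authors weight by q_i instead)."""
    return sum(pi * f(pi / qi) for pi, qi in zip(p, q))

def star(p, r):
    """Product distribution P * R = (p_1 r_1, ..., p_1 r_m, p_2 r_1, ..., p_n r_m)."""
    return [pi * rj for pi in p for rj in r]

P, Q = [0.5, 0.5], [0.3, 0.7]
R, S = [0.2, 0.8], [0.6, 0.4]

# With f = log, the f-divergence is the directed divergence (a1);
# check its (2,2)-additivity numerically.
D = lambda p, q: f_divergence(p, q, math.log)
lhs = D(star(P, R), star(Q, S))
rhs = D(R, S) + D(P, Q)
print(math.isclose(lhs, rhs))   # True
```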

Measures $ \mu _ {n} $ having the sum property with a Lebesgue-measurable generating function $ f $ are $ ( 2, 2 ) $-additive if and only if they are given by

$$ \mu _ {n} ( P \| Q ) = 4 a H _ {n} ^ {3} ( P ) + 4 a ^ \prime H _ {n} ^ {3} ( Q ) - 9 a H _ {n} ^ {2} ( P ) - 9 a ^ \prime H _ {n} ^ {2} ( Q ) + $$

$$ + b H _ {n} ( P ) + b ^ \prime H _ {n} ( Q ) + c I _ {n} ( P \| Q ) + c ^ \prime I _ {n} ( Q \| P ) + dn, $$

where $ a $, $ a ^ \prime $, $ b $, $ b ^ \prime $, $ c $, $ c ^ \prime $, $ d $ are constants, $ H _ {n} ( P ) = - \sum p _ {i} { \mathop{\rm log} } p _ {i} $ (the Shannon entropy), $ H _ {n} ^ \beta ( P ) = ( 2 ^ {1 - \beta } - 1 ) ^ {- 1 } ( \sum p _ {i} ^ \beta - 1 ) $ (the entropy of degree $ \beta \neq 1 $) and $ I _ {n} ( P \| Q ) = - \sum p _ {i} { \mathop{\rm log} } q _ {i} $ (the inaccuracy). However, (a1) is neither symmetric nor does it satisfy the triangle inequality, so its use as a metric is limited. In [a7], the symmetric divergence or $ J $-divergence $ J _ {n} ( P \| Q ) = D _ {n} ( P \| Q ) + D _ {n} ( Q \| P ) $ was introduced to restore symmetry.
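
The quantities entering this characterization, and the $ J $-divergence, are straightforward to compute. The sketch below (illustrative function names, natural logarithm) uses the identity $ D _ {n} ( P \| Q ) = I _ {n} ( P \| Q ) - H _ {n} ( P ) $.

```python
import math

def shannon_entropy(p):
    """H_n(P) = -sum p_i log p_i."""
    return -sum(pi * math.log(pi) for pi in p)

def entropy_of_degree(p, beta):
    """H_n^beta(P) = (2**(1 - beta) - 1)**(-1) * (sum p_i**beta - 1), beta != 1."""
    return (sum(pi ** beta for pi in p) - 1) / (2 ** (1 - beta) - 1)

def inaccuracy(p, q):
    """I_n(P || Q) = -sum p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def j_divergence(p, q):
    """J_n(P || Q) = D_n(P || Q) + D_n(Q || P), using D_n = I_n - H_n."""
    return (inaccuracy(p, q) - shannon_entropy(p)
            + inaccuracy(q, p) - shannon_entropy(q))

P, Q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
print(math.isclose(j_divergence(P, Q), j_divergence(Q, P)))   # True: symmetric
```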

A sequence of measures $ \{ \mu _ {n} \} $ is said to be symmetrically additive if

$$ \mu _ {nm } ( P \star R \| Q \star S ) + \mu _ {nm } ( P \star S \| Q \star R ) = 2 \mu _ {n} ( P \| Q ) + 2 \mu _ {m} ( R \| S ) $$

for all $ P, Q \in \Gamma _ {n} $, $ R, S \in \Gamma _ {m} $.

Sum-form measures $ \{ \mu _ {n} \} $ with a measurable symmetric generating function $ f : {I ^ {2} } \rightarrow \mathbf R $ which are symmetrically additive for all pairs of integers $ m, n \geq 2 $ have the form [a5]

$$ \mu _ {n} ( P \| Q ) = \sum _ {i = 1 } ^ { n } [ p _ {i} ( a { \mathop{\rm log} } p _ {i} + b { \mathop{\rm log} } q _ {i} ) + q _ {i} ( a { \mathop{\rm log} } q _ {i} + b { \mathop{\rm log} } p _ {i} ) ] . $$
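
For instance, taking $ a = 1 $, $ b = - 1 $ in this form gives the $ J $-divergence. The following sketch (illustrative names, natural logarithm) checks symmetric additivity numerically for that choice.

```python
import math

def sum_form(p, q, a, b):
    """Sum-form measure with generating terms
    p_i (a log p_i + b log q_i) + q_i (a log q_i + b log p_i)."""
    return sum(pi * (a * math.log(pi) + b * math.log(qi))
               + qi * (a * math.log(qi) + b * math.log(pi))
               for pi, qi in zip(p, q))

def star(p, r):
    """Product distribution P * R as defined above."""
    return [pi * rj for pi in p for rj in r]

P, Q = [0.5, 0.5], [0.3, 0.7]
R, S = [0.2, 0.8], [0.6, 0.4]

# a = 1, b = -1 gives the J-divergence; check symmetric additivity.
mu = lambda p, q: sum_form(p, q, 1.0, -1.0)
lhs = mu(star(P, R), star(Q, S)) + mu(star(P, S), star(Q, R))
rhs = 2 * mu(P, Q) + 2 * mu(R, S)
print(math.isclose(lhs, rhs))   # True
```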

It is well known that $ H _ {n} ( P ) \leq I _ {n} ( P \| Q ) $, that is,

$$ - \sum p _ {i} { \mathop{\rm log} } p _ {i} \leq - \sum p _ {i} { \mathop{\rm log} } q _ {i} , $$

which is known as the Shannon inequality. This inequality yields the non-negativity $ D _ {n} ( P \| Q ) \geq 0 $ of the measure (a1). A function $ {\mu _ {n} } : {\Gamma _ {n} ^ {2} } \rightarrow \mathbf R $ is called a separability measure if and only if $ \mu _ {n} ( P \| Q ) \geq 0 $ and $ \mu _ {n} ( P \| Q ) $ attains a minimum if $ P = Q $ for all $ P, Q \in \Gamma _ {n} $ with $ n \geq 2 $. A separability measure $ \mu _ {n} $ is a distance measure of Kullback–Leibler type if there exists an $ f : I \rightarrow \mathbf R $ such that $ \mu _ {n} ( P \| Q ) = \sum p _ {i} ( f ( p _ {i} ) - f ( q _ {i} ) ) $. Any Kullback–Leibler-type distance measure with generating function $ f $ satisfies the inequality $ \sum p _ {k} f ( q _ {k} ) \leq \sum p _ {k} f ( p _ {k} ) $ (see [a10], [a2]).
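
A small numerical sketch (illustrative names; $ f = { \mathop{\rm log} } $, so the measure is exactly (a1)) shows the defining inequality in action; with this choice it is precisely the Shannon inequality.

```python
import math
import random

def kl_type_measure(p, q, f):
    """Kullback-Leibler-type distance measure sum_i p_i (f(p_i) - f(q_i))."""
    return sum(pi * (f(pi) - f(qi)) for pi, qi in zip(p, q))

def random_distribution(n):
    x = [random.random() for _ in range(n)]
    s = sum(x)
    return [xi / s for xi in x]

random.seed(0)
f = math.log   # with f = log the measure is exactly D_n of (a1)
for _ in range(1000):
    P, Q = random_distribution(4), random_distribution(4)
    # Defining inequality: sum p_k f(q_k) <= sum p_k f(p_k),
    # i.e. the measure is non-negative (here: the Shannon inequality).
    assert kl_type_measure(P, Q, f) >= -1e-12
print("inequality held in all trials")
```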

References

[a1] J. Aczél, Z. Daróczy, "On measures of information and their characterizations" , Acad. Press (1975) Zbl 0345.94022
[a2] J. Aczél, A.M. Ostrowski, "On the characterization of Shannon's entropy by Shannon's inequality" J. Austral. Math. Soc. , 16 (1973) pp. 368–374
[a3] A. Bhattacharyya, "On a measure of divergence between two statistical populations defined by their probability distributions" Bull. Calcutta Math. Soc. , 35 (1943) pp. 99–109
[a4] A. Bhattacharyya, "On a measure of divergence between two multinomial populations" Sankhya , 7 (1946) pp. 401–406
[a5] J.K. Chung, P.L. Kannappan, C.T. Ng, P.K. Sahoo, "Measures of distance between probability distributions" J. Math. Anal. Appl. , 139 (1989) pp. 280–292 DOI 10.1016/0022-247X(89)90335-1 Zbl 0669.60025
[a6] I. Csiszár, "Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten" Magyar Tud. Kutato Int. Közl. , 8 (1963) pp. 85–108
[a7] H. Jeffreys, "An invariant form for the prior probability in estimation problems" Proc. Roy. Soc. London A , 186 (1946) pp. 453–461 DOI 10.1098/rspa.1946.0056 Zbl 0063.03050
[a8] Pl. Kannappan, P.N. Rathie, "On various characterizations of directed divergence" , Proc. Sixth Prague Conf. on Information Theory, Statistical Decision Functions and Random Process (1971)
[a9] Pl. Kannappan, C.T. Ng, "Representation of measures of information" , Trans. Eighth Prague Conf. , C , Prague (1979) pp. 203–206
[a10] Pl. Kannappan, P.K. Sahoo, "Kullback–Leibler type distance measures between probability distributions" J. Math. Phys. Sci. , 26 (1993) pp. 443–454
[a11] Pl. Kannappan, P.K. Sahoo, J.K. Chung, "On a functional equation associated with the symmetric divergence measures" Utilitas Math. , 44 (1993) pp. 75–83
[a12] S. Kullback, "Information theory and statistics" , Peter Smith, reprint , Gloucester MA (1978)
[a13] S. Kullback, R.A. Leibler, "On information and sufficiency" Ann. Math. Stat. , 22 (1951) pp. 79–86 DOI 10.1214/aoms/1177729694 Zbl 0042.38403
[a14] C.E. Shannon, "A mathematical theory of communication" Bell System Techn. J. , 27 (1948) pp. 379–423; 623–656