Glivenko-Cantelli Theorem
Get Glivenko%E2%80%93Cantelli Theorem essential facts below. View Videos or join the Glivenko%E2%80%93Cantelli Theorem discussion. Add Glivenko%E2%80%93Cantelli Theorem to your Like2do.com topic list for future reference or share this resource on social media.
Glivenko%E2%80%93Cantelli Theorem

In the theory of probability, the Glivenko-Cantelli theorem, named after Valery Ivanovich Glivenko and Francesco Paolo Cantelli, determines the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows.[1] The uniform convergence of more general empirical measures becomes an important property of the Glivenko-Cantelli classes of functions or sets.[2] The Glivenko-Cantelli classes arise in Vapnik-Chervonenkis theory, with applications to machine learning. Applications can be found in econometrics making use of M-estimators.

Assume that ${\displaystyle X_{1},X_{2},\dots }$ are independent and identically-distributed random variables in ${\displaystyle \mathbb {R} }$ with common cumulative distribution function ${\displaystyle F(x)}$. The empirical distribution function for ${\displaystyle X_{1},\dots ,X_{n}}$ is defined by

${\displaystyle F_{n}(x)={\frac {1}{n}}\sum _{i=1}^{n}I_{[X_{i},\infty )}(x)}$

where ${\displaystyle I_{C}}$ is the indicator function of the set ${\displaystyle C}$. For every (fixed) ${\displaystyle x}$, ${\displaystyle F_{n}(x)}$ is a sequence of random variables which converge to ${\displaystyle F(x)}$ almost surely by the strong law of large numbers, that is, ${\displaystyle F_{n}}$ converges to ${\displaystyle F}$ pointwise. Glivenko and Cantelli strengthened this result by proving uniform convergence of ${\displaystyle F_{n}}$ to ${\displaystyle F}$.

Theorem

${\displaystyle \|F_{n}-F\|_{\infty }=\sup _{x\in \mathbb {R} }|F_{n}(x)-F(x)|\longrightarrow 0}$ almost surely.[3]

This theorem originates with Valery Glivenko,[4] and Francesco Cantelli,[5] in 1933.

Remarks

• If ${\displaystyle X_{n}}$ is a stationary ergodic process, then ${\displaystyle F_{n}(x)}$ converges almost surely to ${\displaystyle F(x)=E(1_{X_{1}\leq x})}$. The Glivenko-Cantelli theorem gives a stronger mode of convergence than this in the iid case.
• An even stronger uniform convergence result for the empirical distribution function is available in the form of an extended type of law of the iterated logarithm.[6] See asymptotic properties of the empirical distribution function for this and related results.

## Proof

For simplicity, consider a case of continuous random variable ${\displaystyle X}$. Fix ${\displaystyle -\infty =x_{0} such that ${\displaystyle F(x_{j})-F(x_{j-1})={\frac {1}{m}}}$ for ${\displaystyle j=1,\dots ,m}$. Now for all ${\displaystyle x\in \mathbb {R} }$ there exists ${\displaystyle j\in \{1,\dots ,m\}}$ such that ${\displaystyle x\in [x_{j-1},x_{j}]}$. Note that

{\displaystyle {\begin{aligned}F_{n}(x)-F(x)&\leq F_{n}(x_{j})-F(x_{j-1})=F_{n}(x_{j})-F(x_{j})+1/m,\\F_{n}(x)-F(x)&\geq F_{n}(x_{j-1})-F(x_{j})=F_{n}(x_{j-1})-F(x_{j-1})-1/m.\end{aligned}}}

Therefore, almost surely

${\displaystyle ||F_{n}-F||_{\infty }=\sup _{x\in \mathbb {R} }|F_{n}(x)-F(x)|\leq \max _{j\in \{1,\dots ,m\}}|F_{n}(x_{j})-F(x_{j})|+1/m.}$

Since ${\textstyle \max _{j\in \{1,\dots ,m\}}|F_{n}(x_{j})-F(x_{j})|\to 0{\text{ a.s.}}}$ by strong law of large numbers, we can guarantee that for any integer ${\textstyle m}$ we can find ${\textstyle N}$ such that for all ${\displaystyle n\geq N}$

${\displaystyle ||F_{n}-F||_{\infty }\leq 1/m{\text{ a.s.}}}$,

which is the definition of almost sure convergence.

## Empirical measures

One can generalize the empirical distribution function by replacing the set ${\displaystyle (-\infty ,x]}$ by an arbitrary set C from a class of sets ${\displaystyle {\mathcal {C}}}$ to obtain an empirical measure indexed by sets ${\displaystyle C\in {\mathcal {C}}.}$

${\displaystyle P_{n}(C)={\frac {1}{n}}\sum _{i=1}^{n}I_{C}(X_{i}),C\in {\mathcal {C}}}$

Where ${\displaystyle I_{C}(x)}$ is the indicator function of each set ${\displaystyle C}$.

Further generalization is the map induced by ${\displaystyle P_{n}}$ on measurable real-valued functions f, which is given by

${\displaystyle f\mapsto P_{n}f=\int _{S}f\,dP_{n}={\frac {1}{n}}\sum _{i=1}^{n}f(X_{i}),f\in {\mathcal {F}}.}$

Then it becomes an important property of these classes that the strong law of large numbers holds uniformly on ${\displaystyle {\mathcal {F}}}$ or ${\displaystyle {\mathcal {C}}}$.

## Glivenko-Cantelli class

Consider a set ${\displaystyle {\mathcal {S}}}$ with a sigma algebra of Borel subsets A and a probability measure P. For a class of subsets,

${\displaystyle {\mathcal {C}}\subset \{C:C{\mbox{ is measurable subset of }}{\mathcal {S}}\}}$

and a class of functions

${\displaystyle {\mathcal {F}}\subset \{f:{\mathcal {S}}\to \mathbb {R} ,f{\mbox{ is measurable}}\,\}}$

define random variables

${\displaystyle \|P_{n}-P\|_{\mathcal {C}}=\sup _{C\in {\mathcal {C}}}|P_{n}(C)-P(C)|}$
${\displaystyle \|P_{n}-P\|_{\mathcal {F}}=\sup _{f\in {\mathcal {F}}}|P_{n}f-Pf|}$

where ${\displaystyle P_{n}(C)}$ is the empirical measure, ${\displaystyle P_{n}f}$ is the corresponding map, and

${\displaystyle \mathbb {E} f=\int _{\mathcal {S}}f\,dP=Pf}$, assuming that it exists.

Definitions

• A class ${\displaystyle {\mathcal {C}}}$ is called a Glivenko-Cantelli class (or GC class) with respect to a probability measure P if any of the following equivalent statements is true.
1. ${\displaystyle \|P_{n}-P\|_{\mathcal {C}}\to 0}$ almost surely as ${\displaystyle n\to \infty }$.
2. ${\displaystyle \|P_{n}-P\|_{\mathcal {C}}\to 0}$ in probability as ${\displaystyle n\to \infty }$.
3. ${\displaystyle \mathbb {E} \|P_{n}-P\|_{\mathcal {C}}\to 0}$, as ${\displaystyle n\to \infty }$ (convergence in mean).
The Glivenko-Cantelli classes of functions are defined similarly.
• A class is called a universal Glivenko-Cantelli class if it is a GC class with respect to any probability measure P on (S,A).
• A class is called uniformly Glivenko-Cantelli if the convergence occurs uniformly over all probability measures P on (S,A):
${\displaystyle \sup _{P\in {\mathcal {P}}(S,A)}\mathbb {E} \|P_{n}-P\|_{\mathcal {C}}\to 0;}$
${\displaystyle \sup _{P\in {\mathcal {P}}(S,A)}\mathbb {E} \|P_{n}-P\|_{\mathcal {F}}\to 0.}$

Theorem (Vapnik and Chervonenkis, 1968)[7]

A class of sets ${\displaystyle {\mathcal {C}}}$ is uniformly GC if and only if it is a Vapnik-Chervonenkis class.

## Examples

• Let ${\displaystyle S=\mathbb {R} }$ and ${\displaystyle {\mathcal {C}}=\{(-\infty ,t]:t\in {\mathbb {R} }\}}$. The classical Glivenko-Cantelli theorem implies that this class is a universal GC class. Furthermore, by Kolmogorov's theorem,
${\displaystyle \sup _{P\in {\mathcal {P}}(S,A)}\|P_{n}-P\|_{\mathcal {C}}\sim n^{-1/2}}$, that is ${\displaystyle {\mathcal {C}}}$ is uniformly Glivenko-Cantelli class.
• Let P be a nonatomic probability measure on S and ${\displaystyle {\mathcal {C}}}$ be a class of all finite subsets in S. Because ${\displaystyle A_{n}=\{X_{1},\ldots ,X_{n}\}\in {\mathcal {C}}}$, ${\displaystyle P(A_{n})=0}$, ${\displaystyle P_{n}(A_{n})=1}$, we have that ${\displaystyle \|P_{n}-P\|_{\mathcal {C}}=1}$ and so ${\displaystyle {\mathcal {C}}}$ is not a GC class with respect to P.

## References

1. ^ Howard G.Tucker (1959). "A Generalization of the Glivenko-Cantelli Theorem". The Annals of Mathematical Statistics. 30: 828-830. doi:10.1214/aoms/1177706212.
2. ^ van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. p. 279. ISBN 0-521-78450-6.
3. ^ van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. p. 265. ISBN 0-521-78450-6.
4. ^ Glivenko, V. (1933). Sulla determinazione empirica delle leggi di probabilità. Giorn. Ist. Ital. Attuari 4, 92-99.
5. ^ Cantelli, F. P. (1933). Sulla determinazione empirica delle leggi di probabilità. Giorn. Ist. Ital. Attuari 4, 421-424.
6. ^ van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. p. 268. ISBN 0-521-78450-6.
7. ^ Vapnik, V. N.; Chervonenkis, A. Ya (1971). "On uniform convergence of the frequencies of events to their probabilities". Theor. Prob. Appl. 16 (2): 264-280. doi:10.1137/1116025.