Welford Algorithm

The variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers is spread out from its average value.
The variance is the square of the standard deviation; it is the second central moment of a distribution and equals the covariance of the random variable with itself. It is commonly denoted Var(X), σ², or s².
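
In symbols, for a random variable X with mean μ = E[X], the definition and its equivalent expanded form read:

$$\operatorname{Var}(X) = \mathbb{E}\left[(X - \mu)^2\right] = \mathbb{E}[X^2] - \mu^2$$

The second form, obtained by expanding the square, is exactly the identity that the naive one-pass algorithm discussed below relies on.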

One of the most widely known formulas for computing the (sample) variance is:

$$s^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2$$

where x̄ is the mean of the sample.


The definition given above can be converted into an algorithm that computes the variance (and hence the standard deviation) in two passes over the data, as sketched in the code below:

1. Compute the mean of the sample (O(N))
2. Compute the sum of the squared differences from the mean (O(N))
3. Output the variance
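
As a concrete sketch, the two-pass algorithm can be written in Python as follows (the function name and the use of a plain list are my own illustrative choices):

def two_pass_variance(sample):
    n = len(sample)
    # First pass: compute the mean.
    mean = sum(sample) / n
    # Second pass: accumulate the squared differences from the mean.
    sq_diff = sum((x - mean) ** 2 for x in sample)
    return sq_diff / (n - 1)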


Even though this algorithm works properly, it may become too expensive on some input instances. Just consider a sampling procedure where a huge amount of values (e.g. several GBs) must be stored in memory in order to complete the first pass before the second one can start.

One can instead think about the following single-pass algorithm for computing the variance, which simply accumulates the sums of the xᵢ and of the xᵢ²:

$$s^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} x_i^2 - N\bar{x}^2\right)$$
In Python, the above one-pass algorithm for computing the variance can be written as:


def naive_variance(sample):
    n = len(sample)
    total = 0.0     # running sum of the x_i
    total_sq = 0.0  # running sum of the squared x_i
    for x in sample:
        total += x
        total_sq += x ** 2
    mean = total / n
    return (total_sq - n * mean ** 2) / (n - 1)


However, this algorithm is numerically unstable and should not be used.
If the variance is small compared to the square of the mean, the final subtraction suffers catastrophic cancellation: nearly all significant digits are lost, and the algorithm may even produce negative variances, which is mathematically impossible.
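
A quick experiment makes the failure concrete. The sample below is my own illustrative choice: four values whose true variance is exactly 30, shifted by a huge mean. With IEEE 754 doubles, the one-pass function above returns a negative number:

# Four values with true variance 30 but a huge mean (illustrative data).
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(naive_variance(data))  # negative (about -170.67) instead of 30.0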


The Welford algorithm is a numerically stable single-pass algorithm for computing the variance. It can be derived by looking at the difference between the sums of squared differences for N and N−1 samples. Writing Mₖ for the mean of the first k samples and Sₖ for the sum of the squared differences from Mₖ, the derivation yields the recurrences

$$M_k = M_{k-1} + \frac{x_k - M_{k-1}}{k}, \qquad S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k)$$

with M₀ = S₀ = 0. After the last sample, the variance is obtained as S_N/(N−1).
This means we can compute the variance in a single pass using the following algorithm (a properly-explained derivation can be found here):
def welford_variance(sample):
    m = 0.0  # running mean of the first k samples (M_k)
    s = 0.0  # running sum of squared differences (S_k)
    for k, x in enumerate(sample, start=1):
        old_m = m
        m += (x - m) / k            # M_k = M_{k-1} + (x_k - M_{k-1}) / k
        s += (x - m) * (x - old_m)  # S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k)
    return s / (len(sample) - 1)
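
On the same pathological sample used above, this single-pass implementation recovers the exact result:

data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(welford_variance(data))  # 30.0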



