Application 4

Using a computer, generate all possible samples of size N (user input) drawn from the following population: {27, 28, 25, 25, 25, 27, 29, 30, 30, 28}.
Represent, through histograms, the sampling distribution of the mean (also known as the empirical mean).


[Figure: histogram of the sample mean, N = 3]

[Figure: histogram of the sample mean, N = 6]

[Figure: histogram of the sample mean, N = 8]

As the sample size increases and gets closer to the population size, the sample means fall closer to the arithmetic mean computed on the population itself (27.4 for this population). As the reader may have noticed, the resulting histograms are not perfectly bell-shaped; this is due to the distribution of the population itself.
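The narrowing described above can be checked numerically without plotting anything. The following is a minimal sketch, not the script used for the figures: it hard-codes the population from the exercise, relies only on the standard library, and the helper names `sample_means` and `summarize` are made up for illustration.

```python
import itertools
import statistics

# Population from the exercise statement (hard-coded here as an assumption,
# instead of being read from a file).
POPULATION = [27, 28, 25, 25, 25, 27, 29, 30, 30, 28]


def sample_means(population, n):
    """Return the mean of every size-n sample drawn without replacement."""
    return [statistics.mean(c) for c in itertools.combinations(population, n)]


def summarize(population, n):
    """Return (number of samples, mean of the sample means, their std deviation)."""
    means = sample_means(population, n)
    return len(means), statistics.mean(means), statistics.pstdev(means)


if __name__ == "__main__":
    print("population mean:", statistics.mean(POPULATION))
    for n in (3, 6, 8):
        count, centre, spread = summarize(POPULATION, n)
        print(f"N={n}: {count} samples, mean of means={centre:.4f}, std={spread:.4f}")
```

For each N the mean of the sample means should coincide with the population mean, while the standard deviation of the sample means should shrink as N grows, which is what the histograms show visually.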


The script has been developed using Python:

import itertools
import sys

import matplotlib.pyplot as plt
import numpy as np

# the sample size N is the only command-line argument
if len(sys.argv) != 2:
    print("you have to provide the sample size")
    sys.exit(1)
N = int(sys.argv[1])

# parse the population dataset (one integer per line) into a list
with open("pop.txt", "r") as f:
    pop = [int(line.strip()) for line in f if line.strip()]

# compute and print the population size and arithmetic mean
popsize = len(pop)
popmean = np.mean(pop)
print("population: " + str(pop))
print("population size: " + str(popsize))
print("population's arithmetic mean: " + str(popmean) + "\n")

# pick all the samples (of size N) out of the population dataset
# and compute the mean of each one
combs = itertools.combinations(pop, N)
sample_mean = [np.mean(c) for c in combs]

# plot the sampling distribution of the mean
legend = ["N=" + str(N)]
plt.hist(sample_mean, bins="auto", alpha=0.5)
#plt.xticks(range(25,31))
plt.legend(legend, loc="upper right", fontsize="x-small")

plt.tight_layout()
plt.show()
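To inspect the data behind a histogram without `pop.txt` or a plotting backend, the sample means can also be tabulated directly. The sketch below is an illustration under assumptions, not part of the original script: the population is hard-coded, the function name `mean_frequencies` is hypothetical, and `Fraction` is used so that equal means are grouped exactly rather than through floating-point comparison.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population from the exercise statement (hard-coded as an assumption).
POPULATION = [27, 28, 25, 25, 25, 27, 29, 30, 30, 28]


def mean_frequencies(population, n):
    """Count how many size-n samples produce each distinct sample mean."""
    counts = Counter(Fraction(sum(c), n) for c in combinations(population, n))
    # Sort by the value of the mean so the table reads left to right.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Text rendering of the frequency table for N = 8: one '#' per sample.
    for mean, count in mean_frequencies(POPULATION, 8).items():
        print(f"{float(mean):6.3f} {'#' * count}")
```

Unlike `bins="auto"`, which lets the plotting library choose the bin edges, this table has one row per distinct value of the mean, so it can look more jagged than the plotted histogram.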


