When \(n\) is large, \(Binom[n, p]\) approaches \(Norm[\mu = n p, \sigma=\sqrt{n p (1-p)}]\)
\(X \sim Binom[n, p] \, \Rightarrow \, Exp(X) = n \cdot p \, , \, Var(X) = n \cdot p \cdot (1-p)\)
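As a quick numerical check of the two moment formulas above, the mean and variance can be computed directly from the probability mass function `dbinom`. This is only a minimal sketch: the values of `n` and `p` and the variable names are chosen here for illustration.

```r
# Exact mean and variance of Binom[n, p], computed from the pmf
n <- 1000; p <- 0.2           # illustrative values (assumption)
x <- 0:n                      # full support of the distribution
pmf <- dbinom(x, n, p)        # P(X = x) for every x in the support

mu_exact  <- sum(x * pmf)                  # E(X) from the definition
var_exact <- sum((x - mu_exact)^2 * pmf)   # Var(X) from the definition

c(mean = mu_exact,  formula = n * p)
c(var  = var_exact, formula = n * p * (1 - p))
```

Both pairs should agree up to floating-point error.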
par(mfrow=c(1,1), mar=c(3,4,3,1), cex=0.7)
n = 1000; p = 0.2
rbinom(500000, n, p) %>% hist(breaks=80, freq=F, main="")
curve(dnorm(x, mean=n*p, sd=sqrt(n*p*(1-p))), col='red', lwd=2, add=T)
par(mfrow=c(1,2), cex=0.7)
n = 10; p = 0.2
rbinom(100000, n, p) %>% hist(freq=F, breaks=(0:(n+1))-0.01)  # one bin per integer 0..n, so a draw equal to n stays in range
rnorm(100000, n*p, sqrt(n*p*(1-p))) %>% hist(freq=F)
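💡 : When the expected value is large enough, the binomial distribution spreads out symmetrically around its expected value. When the expected value is not large enough, the left tail is squeezed against the lower bound at 0 and the distribution becomes asymmetric.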
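One way to put a number on this asymmetry is the skewness of the binomial distribution, \((1-2p)/\sqrt{n p (1-p)}\), which shrinks toward 0 as \(n\) grows. The sketch below evaluates it for the two cases plotted above (\(n = 10\) and \(n = 1000\), both with \(p = 0.2\)). The helper name `binom_skew` is made up for this illustration.

```r
# Skewness of Binom[n, p]; positive values mean a longer right tail
binom_skew <- function(n, p) (1 - 2 * p) / sqrt(n * p * (1 - p))

skew_small <- binom_skew(10, 0.2)     # expected value n*p = 2
skew_large <- binom_skew(1000, 0.2)   # expected value n*p = 200

round(c(n10 = skew_small, n1000 = skew_large), 3)
```

The skewness for \(n = 1000\) is ten times smaller than for \(n = 10\), which is consistent with the near-symmetric histogram in the first plot.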
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.99999 1.9986 3.0031 3.9977 5.0017 6.0002 7.0010 7.9968 9.0017 9.9998
[2,] 1.00139 2.0010 3.0031 3.9900 5.0201 5.9911 7.0112 7.9797 9.0152 9.9987
par(mfrow=c(1,2), cex=0.7)
(rpois(100000, 1) + rpois(100000, 2)) %>% table %>% barplot(main="Pois[1] + Pois[2]")
rpois(100000, 3) %>% table %>% barplot(main="Pois[3]")
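The two bar plots look alike because the sum of independent Poisson variables is again Poisson, with the rates added. As a hedged sketch of an exact check, the code below convolves the pmfs of Pois[1] and Pois[2] and compares the result with the pmf of Pois[3]. The cutoff `kmax` and the variable names are assumptions made for the example.

```r
kmax <- 15          # largest count to compare (arbitrary cutoff)
k <- 0:kmax

# P(X + Y = s) = sum over i of P(X = i) * P(Y = s - i), for i = 0..s
conv <- sapply(k, function(s) sum(dpois(0:s, 1) * dpois(s:0, 2)))

direct <- dpois(k, 3)        # pmf of Pois[3]
max(abs(conv - direct))      # should be at floating-point noise level
```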
[1] 0.0671908 0.2162406 0.3482258 0.3672794 0.2721720 0.1293956
[7] 0.0056023 -0.0678584 -0.0925057 -0.0854730 -0.0648806
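We can simulate the geometric distribution with the binomial distribution: draw a sequence of single trials (size 1) and count the failures before the first success.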
par(mfrow=c(1,2), mar=c(3,3,3,1), cex=0.7)
replicate(100000, which(rbinom(100, 1, .3) == 1)[1] - 1) %>%
table %>% barplot(main="Binomial Simulation")
rgeom(100000, 0.3) %>% table %>% barplot(main="Geometric")
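The simulation counts the failures before the first success, which is the quantity that `rgeom` and `dgeom` model in R: \(P(X = k) = (1-p)^k \, p\) for \(k = 0, 1, 2, \dots\). A small sketch comparing this closed form with `dgeom`, with `k` and `p` chosen only for illustration:

```r
p <- 0.3
k <- 0:10                       # number of failures before the first success

closed_form <- (1 - p)^k * p    # k failures, then one success
builtin     <- dgeom(k, p)      # R's geometric pmf (support starts at 0)

all.equal(closed_form, builtin)
```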
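🗿 : If a machine breaks down on any given day with probability 0.05, what is the probability that it is still working at the end of each of the first 20 days?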
p = 0.05
data.frame(end.of.day=1:20) %>% mutate(
brokenProb = pgeom(end.of.day-1, p),
workingProb = 1 - brokenProb
)
   end.of.day brokenProb workingProb
1           1    0.05000     0.95000
2           2    0.09750     0.90250
3           3    0.14263     0.85737
4           4    0.18549     0.81451
5           5    0.22622     0.77378
6           6    0.26491     0.73509
7           7    0.30166     0.69834
8           8    0.33658     0.66342
9           9    0.36975     0.63025
10         10    0.40126     0.59874
11         11    0.43120     0.56880
12         12    0.45964     0.54036
13         13    0.48666     0.51334
14         14    0.51233     0.48767
15         15    0.53671     0.46329
16         16    0.55987     0.44013
17         17    0.58188     0.41812
18         18    0.60279     0.39721
19         19    0.62265     0.37735
20         20    0.64151     0.35849
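The `workingProb` column has a simple closed form: the machine is still working at the end of day \(d\) only if it survives \(d\) days in a row, so the probability is \((1-p)^d\). A minimal check in base R follows; the variable names are illustrative and not part of the original code.

```r
p <- 0.05
d <- 1:20                                 # end of day 1, 2, ..., 20

working_closed <- (1 - p)^d               # survive d days in a row
working_pgeom  <- 1 - pgeom(d - 1, p)     # same quantity via the geometric cdf

all.equal(working_closed, working_pgeom)
round(working_closed[c(1, 10, 20)], 5)    # 0.95000 0.59874 0.35849, as in rows 1, 10 and 20
```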
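🗿 : If each donor has, on average, a 5% chance of having the organ I need, how many donors do I have to wait for, on average, before I get the organ I need?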
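Under the geometric model, the expected number of non-matching donors before the first match is \((1-p)/p\), so the expected number of donors up to and including the match is \(1/p\). The sketch below assumes independent donors who all have the same 5% match probability; the simulation size and the seed are arbitrary choices.

```r
p <- 0.05

expected_failures <- (1 - p) / p   # non-matching donors before the first match: 19
expected_donors   <- 1 / p         # donors including the matching one: 20

# Simulation check: rgeom() returns the number of failures, so add 1
set.seed(1)
sim_donors <- mean(rgeom(100000, p)) + 1

c(theory = expected_donors, simulation = sim_donors)
```

Whether the answer is read as 19 or 20 depends on whether the matching donor is counted.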