Create a categorical vector of length 10 (SkinColor)
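The call that builds SkinColor is not shown here. A minimal sketch, assuming a 4:3:2:1 composition (the proportions that the sampling results later in this section appear to approach); the actual vector may have been built differently:

```r
# Assumption: 4 white, 3 red, 2 yellow, 1 black.
# The source does not show how SkinColor was actually created.
SkinColor <- rep(c("白", "紅", "黃", "黑"), times = c(4, 3, 2, 1))

length(SkinColor)  # number of elements in the vector
table(SkinColor)   # count of each category
```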
💡: A vector is not itself a random variable, but the "experiment" of drawing one value at random from the vector can be used to define a random variable.
Randomly draw a result vector of length 20 (x)
[1] "白" "白" "白" "黃" "白" "紅" "白" "紅" "黃" "紅" "白" "紅" "黑" "紅" "紅"
[16] "黃" "紅" "黑" "紅" "白"
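A result vector like the one above can be drawn with `sample()`. This is a sketch only: the 4:3:2:1 composition of SkinColor and the seed are assumptions, so it will not reproduce the exact values printed above.

```r
# Assumed composition of the population vector (not shown in the source).
SkinColor <- rep(c("白", "紅", "黃", "黑"), times = c(4, 3, 2, 1))

set.seed(1)  # arbitrary seed, chosen only to make this sketch repeatable
# replace = TRUE is required: we draw 20 values from a vector of length 10.
x <- sample(SkinColor, size = 20, replace = TRUE)
x
```

Sampling with replacement means every draw is made from the same population, which is what lets "draw one value from the vector" serve as a repeatable experiment.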
A distribution can be expressed in two ways: as frequencies (counts) or as proportions (probabilities)
library(magrittr) # provides the %>% pipe used in this section
par(mfrow=c(1,2), mar=c(2,5,3,1), cex=0.8)
table(SkinColor) %>% barplot(main='SkinColor,Freq')
table(SkinColor) %>% prop.table %>% barplot(main='SkinColor,Prop')
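To see the numbers behind the two bar plots, the same tables can be printed directly. A small sketch, again assuming the 4:3:2:1 composition of SkinColor:

```r
# Assumed composition of the population vector (not shown in the source).
SkinColor <- rep(c("白", "紅", "黃", "黑"), times = c(4, 3, 2, 1))

freq <- table(SkinColor)   # frequencies: how many times each category occurs
prop <- prop.table(freq)   # proportions: each frequency divided by the total

freq
prop
sum(freq)  # equals the length of the vector
sum(prop)  # proportions always sum to 1
```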
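Random sampling gives a different result each time, and the distribution within a result vector may differ from that of the original vector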
par(mfrow=c(1,4), mar=c(2,3,3,1), cex=0.75)
table(SkinColor) %>% prop.table %>% barplot(main='SkinColor')
table( sample(SkinColor,20,replace=TRUE) ) %>% prop.table %>% barplot(main='x1')
table( sample(SkinColor,20,replace=TRUE) ) %>% prop.table %>% barplot(main='x2')
table( sample(SkinColor,20,replace=TRUE) ) %>% prop.table %>% barplot(main='x3')
💡: The proportions in a result vector do not necessarily match the proportions in the original population!
[,1] [,2] [,3] [,4] [,5] [,6]
白 0.3 0.45 0.406 0.3950 0.39846 0.39963
紅 0.3 0.25 0.287 0.3035 0.29879 0.29967
黃 0.2 0.20 0.196 0.1985 0.20193 0.20067
黑 0.2 0.10 0.111 0.1030 0.10082 0.10003
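💡: When the number of draws is very large, the proportions in the result vector come close to the proportions in the original population!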
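A table like the one above could be produced along the following lines. The code that generated it is not shown, so the sample sizes, the seed, and the 4:3:2:1 composition of SkinColor below are all assumptions, and the numbers will not match the table exactly.

```r
# Assumed composition of the population vector (not shown in the source).
SkinColor <- rep(c("白", "紅", "黃", "黑"), times = c(4, 3, 2, 1))

sizes <- c(10, 100, 1000, 10000, 100000)  # hypothetical sample sizes
lv <- unique(SkinColor)                   # fixed category order for every column

set.seed(1)  # arbitrary seed
props <- sapply(sizes, function(n) {
  # A factor with fixed levels keeps all four rows even if a
  # category happens not to appear in a small sample.
  draws <- factor(sample(SkinColor, n, replace = TRUE), levels = lv)
  prop.table(table(draws))
})
colnames(props) <- sizes  # label each column with its sample size
round(props, 4)
```

Each column is one sample; reading left to right, the proportions should settle toward the population proportions as the sample size grows.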
Generate a result vector of length 100 from a normal distribution: Weight
set.seed(2) # fix the random seed so the result is reproducible
Weight = rnorm(100,mean=60,sd=5) # 100 random draws from N(mean=60, sd=5)
Weight
[1] 55.515 60.924 67.939 54.348 59.599 60.662 63.540 58.802 69.922 59.306
[11] 62.088 64.909 58.037 54.802 68.911 48.445 64.393 60.179 65.064 62.161
[21] 70.454 54.000 67.948 69.773 60.025 47.741 62.386 57.017 63.961 61.448
[31] 63.695 61.595 65.381 58.579 56.117 57.022 51.370 55.487 57.205 58.767
[41] 58.082 50.204 55.791 69.518 63.112 69.955 58.473 59.546 59.079 54.006
[51] 55.809 70.332 57.189 66.379 54.762 50.171 58.385 64.679 65.696 68.358
[61] 51.059 70.156 56.484 60.791 62.531 55.900 50.006 57.604 60.421 55.523
[71] 55.394 61.652 59.292 62.174 59.731 55.464 66.518 63.859 65.263 52.950
[81] 64.980 51.521 57.333 53.139 48.960 69.111 56.733 58.577 58.065 61.933
[91] 68.002 68.406 54.082 53.208 52.437 53.734 69.797 60.038 55.787 56.994
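The distribution of a continuous variable can also be shown in several ways. The three panels below show a frequency (count) histogram, a density histogram, and an estimated probability density function (pdf)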
par(mfrow=c(1,3), mar=c(2,5,2,1), cex=0.7, bty='n')
hist(Weight, main='Histogram, Freq')
hist(Weight, main='Histogram, Density', freq=FALSE)
plot(density(Weight),main='Prob. Density Function (pdf)',ylim=c(0,0.08))
curve(dnorm(x,60,5),0,100,col='red',add=TRUE) # theoretical N(60,5) density for comparison
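The difference between the frequency histogram and the density histogram can be checked numerically: the bar heights of the first add up to the sample size, while the bar areas of the second add up to 1. A short sketch using the same seed and parameters as above:

```r
set.seed(2)
Weight <- rnorm(100, mean = 60, sd = 5)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(Weight, plot = FALSE)

total_count <- sum(h$counts)                    # heights of the frequency histogram
total_area  <- sum(h$density * diff(h$breaks))  # height x width of each density bar

total_count  # the number of observations
total_area   # the total area under the density histogram
```

This is why the density histogram can be compared directly with a pdf curve, whose total area is also 1.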
In fact, there are more than two "kinds" of random variables
par(mfrow=c(1,4), mar=c(2,3,3,1), cex=0.75)
Color = rep(c('綠','黑','褐'), c(100,200,300)) # categorical, unordered (green, black, brown)
table(Color) %>% barplot(main="Color")
Size = rep(c('大','中','小'), c(150,200,250)) # categorical, ordered (large, medium, small)
table(Size) %>% barplot(main="Size")
Freq = rpois(600,2.5) # numeric, discrete
table(Freq) %>% barplot(main="Count")
Weight = rnorm(600,50,15) # numeric, continuous
hist(Weight, main="Weight")
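One caveat about the ordered categorical variable: a plain character vector carries no order, so `table()` arranges its categories by sort order rather than by size. A possible way to make the order explicit, shown here as a sketch (the name `SizeOrd` is introduced only for illustration), is an ordered factor:

```r
Size <- rep(c("大", "中", "小"), times = c(150, 200, 250))  # large, medium, small

# Character vector: category order in the table follows the sort order.
names(table(Size))

# Ordered factor: categories follow the order given in `levels`
# (small < medium < large).
SizeOrd <- factor(Size, levels = c("小", "中", "大"), ordered = TRUE)
table(SizeOrd)
```

Passing `table(SizeOrd)` to `barplot()` would then draw the bars from small to large.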
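🍭 Exercise:
The short piece of code below plots the "frequency (count)" and "probability" distributions for a "categorical" variable and for a "numeric" variable. Can you fill in suitable titles between the double quotes, such as “Frequency distribution of a categorical variable”, “Frequency”, “Probability”, “Density”, and so on, so that the plots are more complete and make it easier for your classmates to tell these four distributions apart?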