UNIT05A: Introduction to Probability

pacman::p_load(magrittr)

【A】 "Empirical" random variables built on vectors

Defining a random variable on a vector

Create a categorical vector of length 10 (SkinColor)

# 白 = white (x4), 紅 = red (x3), 黃 = yellow (x2), 黑 = black (x1)
SkinColor = c("白","白","白","白","紅","紅","紅","黃","黃","黑")

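💡: A vector by itself is not a random variable, but the "experiment" of drawing one value at random from the vector can be used to define a "random variable".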
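As a minimal sketch of that "experiment", the code below draws a single value from the vector, then repeats the draw a few times. The helper name `draw_one` is hypothetical and not part of the original notes; it only wraps base R's `sample()`.

```r
SkinColor <- c("白","白","白","白","紅","紅","紅","黃","黃","黑")

# Hypothetical helper: one run of the "experiment" = one random draw
draw_one <- function(v) sample(v, size = 1)

set.seed(1)                       # fix the seed so the draws are reproducible
one_draw <- draw_one(SkinColor)   # a single realization of the random variable
one_draw

# Repeating the experiment gives a vector of realizations
five_draws <- replicate(5, draw_one(SkinColor))
five_draws
```

Each call to `draw_one()` may return a different value, which is what makes the outcome a random variable rather than a fixed vector.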

Random sampling

Randomly draw, with replacement, a result vector (x) of length 20

x = sample(SkinColor, size=20, replace=TRUE); x

 [1] "白" "白" "白" "黃" "白" "紅" "白" "紅" "黃" "紅" "白" "紅" "黑" "紅" "紅"
[16] "黃" "紅" "黑" "紅" "白"

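Distribution: the "frequency" with which each "value" of a (random) variable occurs

A distribution can be expressed in two ways: as frequencies (counts) or as proportions (probabilities)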

par(mfrow=c(1,2), mar=c(2,5,3,1), cex=0.8)
table(SkinColor) %>% barplot(main='SkinColor,Freq')
table(SkinColor) %>% prop.table %>% barplot(main='SkinColor,Prop')
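The two representations plotted above can also be inspected as numbers. This is a small sketch using only base R (no pipe), assuming the same `SkinColor` vector; the names `counts` and `props` are introduced here for illustration.

```r
SkinColor <- c("白","白","白","白","紅","紅","紅","黃","黃","黑")

counts <- table(SkinColor)     # frequency: how many times each value occurs
props  <- prop.table(counts)   # proportion: counts divided by their total

counts
props

sum(counts)   # total number of elements in the vector: 10
sum(props)    # proportions always add up to 1
```

Both tables carry the same information; the proportion form is the one that can be compared across vectors of different lengths.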

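Distribution of the result vector

Random sampling gives a different result each time, and the distribution within a result vector may differ from that of the original vector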

par(mfrow=c(1,4), mar=c(2,3,3,1), cex=0.75)
table(SkinColor) %>% prop.table %>% barplot(main='SkinColor')
table( sample(SkinColor,20,TRUE) ) %>% prop.table %>% barplot(main='x1')
table( sample(SkinColor,20,TRUE) ) %>% prop.table %>% barplot(main='x2')
table( sample(SkinColor,20,TRUE) ) %>% prop.table %>% barplot(main='x3')

💡: The proportions in a result vector do not necessarily match the proportions in the original population!

set.seed(2)
sapply(1:6, function(n) {
  sample(SkinColor,10^n,T) %>% table %>% prop.table
  })

   [,1] [,2]  [,3]   [,4]    [,5]    [,6]
白  0.3 0.45 0.406 0.3950 0.39846 0.39963
紅  0.3 0.25 0.287 0.3035 0.29879 0.29967
黃  0.2 0.20 0.196 0.1985 0.20193 0.20067
黑  0.2 0.10 0.111 0.1030 0.10082 0.10003

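💡: When the number of draws is large, the proportions in the result vector come close to those of the original population!

【B】 "Theoretical" random variables defined on theoretical distributions

Defining different "kinds" of random variables

Draw a result vector of length 100 from a normal distribution: Weight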
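One way to quantify this convergence is to track the largest gap between the sample proportions and the population proportions as the sample size grows. The sketch below is an illustration, not the original author's code: it fixes the category levels with `factor()` so that a category missing from a small sample is counted as 0 rather than dropped, and the names `lv`, `p_pop` and `max_gap` are assumptions introduced here.

```r
SkinColor <- c("白","白","白","白","紅","紅","紅","黃","黃","黑")

lv    <- unique(SkinColor)                                  # fixed category order
p_pop <- prop.table(table(factor(SkinColor, levels = lv)))  # population proportions

set.seed(2)
max_gap <- sapply(1:5, function(n) {
  x     <- sample(SkinColor, 10^n, replace = TRUE)          # 10^n draws with replacement
  p_hat <- prop.table(table(factor(x, levels = lv)))        # sample proportions
  max(abs(p_hat - p_pop))                                   # largest deviation over categories
})
names(max_gap) <- paste0("n=10^", 1:5)
round(max_gap, 4)
```

The gap typically shrinks as n grows, although it need not decrease at every single step because each sample is random.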

set.seed(2)                         # fix the seed so the draws are reproducible
Weight = rnorm(100,mean=60,sd=5)    # 100 random draws from a normal with mean 60, sd 5
Weight

  [1] 55.515 60.924 67.939 54.348 59.599 60.662 63.540 58.802 69.922 59.306
 [11] 62.088 64.909 58.037 54.802 68.911 48.445 64.393 60.179 65.064 62.161
 [21] 70.454 54.000 67.948 69.773 60.025 47.741 62.386 57.017 63.961 61.448
 [31] 63.695 61.595 65.381 58.579 56.117 57.022 51.370 55.487 57.205 58.767
 [41] 58.082 50.204 55.791 69.518 63.112 69.955 58.473 59.546 59.079 54.006
 [51] 55.809 70.332 57.189 66.379 54.762 50.171 58.385 64.679 65.696 68.358
 [61] 51.059 70.156 56.484 60.791 62.531 55.900 50.006 57.604 60.421 55.523
 [71] 55.394 61.652 59.292 62.174 59.731 55.464 66.518 63.859 65.263 52.950
 [81] 64.980 51.521 57.333 53.139 48.960 69.111 56.733 58.577 58.065 61.933
 [91] 68.002 68.406 54.082 53.208 52.437 53.734 69.797 60.038 55.787 56.994

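Distribution of a continuous variable

The distribution of a continuous variable can be shown in three ways: a frequency (count) histogram, a density histogram, and a smoothed probability density curve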

par(mfrow=c(1,3), mar=c(2,5,2,1), cex=0.7, bty='n')
hist(Weight, main='Histogram, Freq')
hist(Weight, main='Histogram, Density', freq=F)
plot(density(Weight),main='Prob. Density Function (pdf)',ylim=c(0,0.08))  # estimated density
curve(dnorm(x,60,5),0,100,col='red',add=TRUE)                             # theoretical density in red
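The difference between the frequency and density histograms can be checked numerically: bar heights add up to the sample size in the first, while bar areas add up to 1 in the second. The sketch below assumes the same seed and parameters as above; the names `h`, `bin_width`, `p_theory` and `p_sample` are introduced here for illustration.

```r
set.seed(2)
Weight <- rnorm(100, mean = 60, sd = 5)

h <- hist(Weight, plot = FALSE)    # compute the histogram without drawing it
bin_width <- diff(h$breaks)        # width of each bin

sum(h$counts)                      # frequencies add up to the sample size, 100
sum(h$density * bin_width)         # areas of the density bars add up to 1

# For a continuous variable, probability is an area under the density curve.
# Theoretical P(55 < Weight < 65), i.e. within one sd of the mean (about 0.683):
p_theory <- pnorm(65, 60, 5) - pnorm(55, 60, 5)
# Empirical share of the 100 draws that fall in the same interval:
p_sample <- mean(Weight > 55 & Weight < 65)
c(theory = p_theory, sample = p_sample)
```

With only 100 draws the empirical share can differ noticeably from the theoretical value; it gets closer as the sample grows.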

Four different "kinds" of random variables

In fact, there are more than two "kinds" of random variables

par(mfrow=c(1,4), mar=c(2,3,3,1), cex=0.75)
Color = rep(c('綠','黑','褐'), c(100,200,300)) # categorical, unordered (green, black, brown)
table(Color) %>% barplot(main="Color")
Size = rep(c('大','中','小'), c(150,200,250))  # categorical, ordered (large, medium, small)
table(Size) %>% barplot(main="Size")
Freq = rpois(600,2.5)                   # numeric, discrete
table(Freq) %>% barplot(main="Count")
Weight = rnorm(600,50,15)               # numeric, continuous
hist(Weight, main="Weight")
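A plain character vector does not record that one category ranks above another, so `table()` orders the bars by sorting the labels. A minimal sketch of making the order explicit with an ordered factor follows; the name `SizeOrd` and the chosen level order (small < medium < large) are assumptions for illustration.

```r
Size <- rep(c('大','中','小'), c(150,200,250))   # large, medium, small

# Declare the levels from smallest to largest and mark the factor as ordered
SizeOrd <- factor(Size, levels = c('小','中','大'), ordered = TRUE)

levels(SizeOrd)              # levels in the declared order
is.ordered(SizeOrd)          # TRUE
table(SizeOrd)               # counts follow the declared order: 250, 200, 150

# Ordered factors support comparisons; element 1 is '大', element 151 is '中'
SizeOrd[1] > SizeOrd[151]    # TRUE
```

Passing `table(SizeOrd)` to `barplot()` then draws the bars in size order, which is usually what an ordinal variable calls for.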

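🍭 Exercise:
The short piece of code below plots the "frequency (count)" and "probability" distributions of a "categorical" and a "numeric" variable. Can you fill in the correct titles between the double quotes, such as "Counts of a categorical variable", "Frequency", "Probability", "Density" and so on, so that the plots are more complete and make it easier for your classmates to tell these four distributions apart?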

par(mfrow=c(1,4), mar=c(2,5,3,1), cex=0.75)
table(SkinColor) %>% barplot(main="Counts by category", ylab="Frequency", xlab="SkinColor")
table(SkinColor) %>% prop.table %>% barplot(main="?", ylab="?", xlab="SkinColor")
hist(Weight, main="?", ylab="?")
hist(Weight, freq=F, main="?", ylab="?")

卓雍然, College of Management, National Sun Yat-sen University

2019-10-10 13:38:14
