💡 Learning objectives:
■ Basic concepts of dimension reduction
■ Principal Component Analysis (PCA)
■ What are principal components?
■ Eigenvalues & variance decomposition
■ Applications of PCA
■ Combining PCA with cluster analysis
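pacman::p_load(dplyr, FactoMineR, factoextra)  # dplyr for %>%, FactoMineR for PCA(), factoextra for the fviz_*() plots
D = decathlon2   # decathlon2 is a data set shipped with factoextra
head(D)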
X100m Long.jump Shot.put High.jump X400m X110m.hurdle Discus
SEBRLE 11.04 7.58 14.83 2.07 49.81 14.69 43.75
CLAY 10.76 7.40 14.26 1.86 49.37 14.05 50.72
BERNARD 11.02 7.23 14.25 1.92 48.93 14.99 40.87
YURKOV 11.34 7.09 15.19 2.10 50.42 15.31 46.26
ZSIVOCZKY 11.13 7.30 13.48 2.01 48.62 14.17 45.67
McMULLEN 10.83 7.31 13.76 2.13 49.91 14.38 44.41
Pole.vault Javeline X1500m Rank Points Competition
SEBRLE 5.02 63.19 291.7 1 8217 Decastar
CLAY 4.92 60.15 301.5 2 8122 Decastar
BERNARD 5.32 62.77 280.1 4 8067 Decastar
YURKOV 4.72 63.44 276.4 5 8036 Decastar
ZSIVOCZKY 4.42 55.37 268.0 7 8004 Decastar
McMULLEN 4.42 56.37 285.1 8 7995 Decastar
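We use PCA(), the enhanced PCA function in the FactoMineR package. The default arguments are usually fine: PCA() standardizes the variables (scale.unit = TRUE), keeps the first five dimensions (ncp = 5) and draws plots (graph = TRUE).
pca = PCA(D[,1:10])   # columns 1-10 are the ten event results; Rank, Points and Competition are left out
Once the analysis is done, PCA() automatically projects all the "individuals" (athletes) and "variables" (events) onto the plane of the first two principal components and plots them.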
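The following minimal base-R sketch makes the projection concrete by showing what a standardized PCA computes. It uses the built-in USArrests data in place of decathlon2 so that it runs without extra packages. The steps are to standardize the columns, eigen-decompose the correlation matrix, and multiply the standardized data by the eigenvectors to get each individual's coordinates. It illustrates the idea only and is not FactoMineR's own code. PCA() may standardize slightly differently from scale(), so its individual coordinates can differ from these by a constant factor. The signs of the components are arbitrary in any implementation.

```r
# Minimal PCA on standardized data with base R only.
# Assumption: built-in USArrests data stands in for decathlon2.
X <- as.matrix(USArrests)

# Standardize each column to mean 0 and sd 1 (scale() uses the n - 1 sd).
Z <- scale(X)

# Eigen-decomposition of the correlation matrix.
e <- eigen(cor(X), symmetric = TRUE)
eigenvalues <- e$values   # variance carried by each component, largest first
loadings    <- e$vectors  # one column (direction) per principal component

# Coordinates of the individuals: project each standardized row onto the components.
scores <- Z %*% loadings
colnames(scores) <- paste0("Dim.", seq_len(ncol(scores)))
head(round(scores, 3))

# Cross-check with prcomp(); eigenvector signs are arbitrary, so compare absolute values.
p <- prcomp(X, scale. = TRUE)
all.equal(abs(unname(scores)), abs(unname(p$x)))  # TRUE
all.equal(eigenvalues, p$sdev^2)                  # TRUE
```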
Contents of the pca object
PCA() returns an object of class PCA, which we store as pca.
pca
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 27 individuals, described by 10 variables
*The results are available in the following objects:
name description
1 "$eig" "eigenvalues"
2 "$var" "results for the variables"
3 "$var$coord" "coord. for the variables"
4 "$var$cor" "correlations variables - dimensions"
5 "$var$cos2" "cos2 for the variables"
6 "$var$contrib" "contributions of the variables"
7 "$ind" "results for the individuals"
8 "$ind$coord" "coord. for the individuals"
9 "$ind$cos2" "cos2 for the individuals"
10 "$ind$contrib" "contributions of the individuals"
11 "$call" "summary statistics"
12 "$call$centre" "mean of the variables"
13 "$call$ecart.type" "standard error of the variables"
14 "$call$row.w" "weights for the individuals"
15 "$call$col.w" "weights for the variables"
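pca$eig: how much information (variance) each principal component carries. PCA() standardizes the ten variables, so the eigenvalues sum to 10 and each variance.percent is simply the eigenvalue divided by 10 (Dim.1: 3.74997 / 10 = 37.5%). factoextra's get_eigenvalue() returns the same table as a data frame:
get_eigenvalue(pca)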
eigenvalue variance.percent cumulative.variance.percent
Dim.1 3.74997 37.4997 37.500
Dim.2 1.74517 17.4517 54.951
Dim.3 1.51783 15.1783 70.130
Dim.4 1.03220 10.3220 80.452
Dim.5 0.61784 6.1784 86.630
Dim.6 0.42829 4.2829 90.913
Dim.7 0.32591 3.2591 94.172
Dim.8 0.27938 2.7938 96.966
Dim.9 0.19111 1.9111 98.877
Dim.10 0.11230 1.1230 100.000
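The eigenvalue table can also be rebuilt by hand. This minimal sketch again uses the built-in USArrests data (4 variables) in place of decathlon2. On standardized data the eigenvalues of the correlation matrix sum to the number of variables, so each percentage is that eigenvalue's share of the total and the cumulative column ends at 100.

```r
# Rebuild a get_eigenvalue()-style table with base R.
# Assumption: built-in USArrests data (4 variables) stands in for decathlon2.
X  <- as.matrix(USArrests)
ev <- eigen(cor(X), symmetric = TRUE)$values   # sorted from largest to smallest

eig_table <- data.frame(
  eigenvalue = ev,
  # Share of the total variance; on standardized data sum(ev) equals ncol(X).
  variance.percent = 100 * ev / sum(ev),
  cumulative.variance.percent = 100 * cumsum(ev) / sum(ev),
  row.names = paste0("Dim.", seq_along(ev))
)
round(eig_table, 3)
sum(ev)  # 4, the number of variables (up to floating-point rounding)
```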
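pca$var$coord: coordinates of each variable on each dimension (with standardized variables these equal the correlations between the variable and the dimension)
pca$var$coord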
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
X100m -0.818952 0.342779 0.1008645 0.101342 -0.21981
Long.jump 0.758899 -0.381493 -0.0062613 -0.185424 0.26371
Shot.put 0.715078 0.282117 0.4738546 0.036104 -0.27864
High.jump 0.608493 0.611354 0.0046060 0.071244 0.30059
X400m -0.643848 0.148422 0.5157594 0.269785 0.19924
X110m.hurdle -0.716420 0.297552 0.4164510 -0.159781 0.16102
Discus 0.716888 0.204398 0.2703222 0.397623 -0.33949
Pole.vault -0.221417 -0.737548 0.4030836 -0.251549 -0.26259
Javeline 0.355176 0.098531 0.6954337 -0.485559 0.13342
X1500m 0.069712 -0.568120 0.3527578 0.652461 0.25368
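On standardized data, a variable's coordinate is its correlation with the component. Every value therefore lies between -1 and 1, which is why fviz_pca_var() draws the variables inside a unit circle. The sketch below checks this equivalence on the built-in USArrests data, used in place of decathlon2. It computes the coordinates two ways: as eigenvectors scaled by the square root of the eigenvalues, and as direct correlations with the component scores.

```r
# Variable coordinates for a standardized PCA, computed two equivalent ways.
# Assumption: built-in USArrests data stands in for decathlon2.
X <- as.matrix(USArrests)
e <- eigen(cor(X), symmetric = TRUE)
scores <- scale(X) %*% e$vectors
dims <- paste0("Dim.", seq_len(ncol(X)))

# Way 1: each eigenvector column scaled by the component's standard deviation, sqrt(eigenvalue).
coord1 <- sweep(e$vectors, 2, sqrt(e$values), `*`)
dimnames(coord1) <- list(colnames(X), dims)

# Way 2: the correlation between each original variable and each component's scores.
coord2 <- cor(X, scores)
dimnames(coord2) <- list(colnames(X), dims)

round(coord1, 4)
all.equal(coord1, coord2)  # TRUE: the coordinates are correlations
```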
pca$var$cos2: the share of each variable's information represented on each dimension (the squared coordinate)
pca$var$cos2
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
X100m 0.6706825 0.1174973 0.010173655 0.0102702 0.048315
Long.jump 0.5759270 0.1455370 0.000039203 0.0343821 0.069545
Shot.put 0.5113370 0.0795898 0.224538173 0.0013035 0.077639
High.jump 0.3702640 0.3737540 0.000021215 0.0050756 0.090352
X400m 0.4145404 0.0220292 0.266007740 0.0727841 0.039696
X110m.hurdle 0.5132580 0.0885371 0.173431450 0.0255299 0.025928
Discus 0.5139286 0.0417785 0.073074093 0.1581041 0.115255
Pole.vault 0.0490256 0.5439768 0.162476407 0.0632771 0.068955
Javeline 0.1261497 0.0097083 0.483627983 0.2357675 0.017802
X1500m 0.0048598 0.3227600 0.124438033 0.4257058 0.064353
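Each cos2 is the square of the matching coordinate. For example, X100m's Dim.1 coordinate of -0.818952 squares to about 0.6707, as in the table above. The following base-R sketch uses the built-in USArrests data, not decathlon2. It shows that summed over all components a variable's cos2 equals 1. The sum over the first two dimensions measures how well the variable is represented on the correlation-circle plane.

```r
# cos2 of variables = squared coordinates (squared correlations with the components).
# Assumption: built-in USArrests data stands in for decathlon2.
X <- as.matrix(USArrests)
e <- eigen(cor(X), symmetric = TRUE)
coord <- sweep(e$vectors, 2, sqrt(e$values), `*`)
dimnames(coord) <- list(colnames(X), paste0("Dim.", seq_len(ncol(X))))

cos2 <- coord^2
round(cos2, 4)

# Over all components each variable's cos2 sums to 1 (its whole standardized variance).
rowSums(cos2)

# Quality of representation on the Dim.1-Dim.2 plane; close to 1 means the arrow
# nearly reaches the unit circle in fviz_pca_var().
quality_12 <- cos2[, 1] + cos2[, 2]
round(quality_12, 3)
```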
fviz_pca_var(pca)
pca$ind$coord: coordinates of each individual on each dimension
pca$ind$coord
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
SEBRLE 0.277958 -0.536434 1.585239 0.105823 1.074623
CLAY 0.904854 -2.094280 0.840685 1.850718 -0.408645
BERNARD -1.372266 -1.348116 0.961932 -1.493072 -0.182667
YURKOV -0.928205 2.281744 1.942688 0.096823 0.190927
ZSIVOCZKY -0.103817 1.089822 -2.098908 0.071906 -0.032938
McMULLEN 0.239858 0.939092 -0.818136 1.201893 1.830199
MARTINEAU -2.537291 1.801094 0.051975 0.374306 -2.285411
HERNU -1.902843 -0.330277 1.288682 0.766505 0.239465
BARRAS -1.805625 0.302590 -0.592810 0.656526 -0.244039
NOOL -2.881737 0.863854 -1.402448 -1.491195 1.358726
BOURGUIGNON -4.505530 -0.485422 1.202704 0.951363 0.508865
Sebrle 3.567756 0.068007 1.911216 -1.042363 -0.300596
Clay 3.472177 -0.705599 1.607029 -0.696108 0.741815
Karpov 4.328761 0.160789 -1.152529 0.407689 -0.772485
Macey 1.944475 2.523948 -0.260304 -0.079809 -0.025024
Warners 1.552082 -1.488634 -1.414196 -0.549665 0.102190
Zsivoczky 0.475153 1.971763 0.900183 -0.725288 0.171696
Hernu 0.280841 0.822696 -0.905794 -0.782389 -0.771389
Bernard 1.533280 1.085832 -1.245717 0.534722 1.042879
Schwarzl -0.677974 -1.134257 -0.422180 -0.609851 -0.100686
Pogorelov -0.077879 -0.333658 0.607951 1.446999 0.204623
Schoenbeck -0.487405 -0.860688 0.866712 -0.173226 -0.432985
Barras -0.413081 1.366893 0.227296 -0.753324 -0.861778
KARPOV 0.967748 -0.995599 -0.476019 2.501069 -0.661349
WARNERS -0.280043 -0.912158 -1.416982 -0.147161 0.036162
Nool -0.535389 -2.135638 0.616053 -1.808079 -0.319954
Drews -1.035856 -1.917364 -2.404320 -0.614812 -0.102226
pca$ind$cos2: the share of each individual's information represented on each dimension
pca$ind$cos2
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
SEBRLE 0.0154465 0.05753138 0.50241310 0.00223886 0.230878821
CLAY 0.0655742 0.35127414 0.05660346 0.27431969 0.013374224
BERNARD 0.2322366 0.22413434 0.11411498 0.27492584 0.004115041
YURKOV 0.0848122 0.51251263 0.37151534 0.00092284 0.003588447
ZSIVOCZKY 0.0015087 0.16625497 0.61666612 0.00072377 0.000151861
McMULLEN 0.0082682 0.12674108 0.09619490 0.20760230 0.481390643
MARTINEAU 0.4048095 0.20397763 0.00016986 0.00880975 0.328426736
HERNU 0.3626087 0.01092418 0.16631227 0.05883861 0.005742729
BARRAS 0.6179591 0.01735462 0.06660942 0.08169748 0.011288151
NOOL 0.5144396 0.04622816 0.12184266 0.13775107 0.114364103
BOURGUIGNON 0.8414496 0.00976733 0.05995894 0.03751710 0.010733484
Sebrle 0.6742810 0.00024499 0.19349512 0.05755578 0.004786484
Clay 0.6864011 0.02834588 0.14703531 0.02758845 0.031330374
Karpov 0.8267406 0.00114065 0.05860654 0.00733332 0.026328251
Macey 0.3497653 0.58929505 0.00626806 0.00058922 0.000057928
Warners 0.3365404 0.30958768 0.27940052 0.04220890 0.001458908
Zsivoczky 0.0354688 0.61078646 0.12730376 0.08264196 0.004631289
Hernu 0.0247329 0.21224209 0.25728350 0.19195436 0.186594955
Bernard 0.3368420 0.16893075 0.22234247 0.04096756 0.155830032
Schwarzl 0.1319041 0.36919405 0.05114794 0.10672826 0.002909200
Pogorelov 0.0010328 0.01895803 0.06294013 0.35655465 0.007130132
Schoenbeck 0.0643656 0.20070808 0.20352752 0.00813015 0.050794887
Barras 0.0272368 0.29823283 0.00824647 0.09058360 0.118543191
KARPOV 0.0868773 0.09194984 0.02101993 0.58027427 0.040573584
WARNERS 0.0166843 0.17700968 0.42715467 0.00460727 0.000278197
Nool 0.0312477 0.49720414 0.04137291 0.35638092 0.011159741
Drews 0.0963567 0.33013525 0.51911955 0.03394434 0.000938440
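An individual's cos2 on a dimension is its squared coordinate divided by its squared distance from the centre of the standardized data. The rows above sum to less than 1 because only the first five of the ten dimensions are kept. The following base-R sketch uses the built-in USArrests data, not decathlon2. cos2 is a ratio, so the choice of n or n - 1 in the standardization cancels out. Summed over all components, each individual's cos2 equals 1.

```r
# cos2 of individuals: share of each individual's squared distance to the centre
# captured by each component. Assumption: USArrests stands in for decathlon2.
X <- as.matrix(USArrests)
Z <- scale(X)
e <- eigen(cor(X), symmetric = TRUE)
scores <- Z %*% e$vectors

d2 <- rowSums(Z^2)          # squared distance of each individual to the centre
cos2_ind <- scores^2 / d2   # d2 has one value per row, so each row is divided by its own d2
colnames(cos2_ind) <- paste0("Dim.", seq_len(ncol(scores)))

head(round(cos2_ind, 4))
rowSums(cos2_ind)           # all 1: the full set of components recovers the whole distance
```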
fviz_pca_ind(pca)
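fviz_pca_biplot(
pca, pointsize="cos2", repel=TRUE,                # point size shows each individual's cos2; repel keeps labels from overlapping
col.var="red", col.ind="#E7B800", alpha.ind=0.3)  # red variable arrows, semi-transparent yellow individuals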
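kmg = kmeans(D[,1:10],3)$cluster %>% factor   # k-means with k = 3 on the raw, unstandardized results; without set.seed() the clusters can change between runs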
table(kmg)
kmg
1 2 3
9 5 13
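kmeans() uses Euclidean distance, so it is sensitive to the scale of each column. In D[,1:10], X1500m ranges roughly from 268 to 302 seconds while High.jump stays near 2 metres, so the 1500 m can dominate the raw-data clustering. The cluster sizes above (9, 5, 13) may also change from run to run, because no seed is set. One option is to standardize before clustering, matching what PCA() does, and to fix the seed. The sketch below assumes the built-in USArrests data, k = 3 and nstart = 25 for illustration only. It also averages the first two component scores by cluster, which approximates where each group sits on the biplot.

```r
# k-means on standardized variables, with a fixed seed for reproducibility.
# Assumptions: USArrests stands in for decathlon2; k = 3 and nstart = 25 are illustrative.
X <- as.matrix(USArrests)
set.seed(123)
km  <- kmeans(scale(X), centers = 3, nstart = 25)  # nstart: keep the best of 25 random starts
kmg <- factor(km$cluster)
table(kmg)

# Mean position of each cluster on the first two principal components.
scores <- prcomp(X, scale. = TRUE)$x[, 1:2]
aggregate(as.data.frame(scores), by = list(cluster = kmg), FUN = mean)
```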
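fviz_pca_biplot(
pca, repel=TRUE, col.var="black",
col.ind=kmg, alpha.ind=0.6, pointshape=16,                     # color the individuals by k-means cluster
addEllipses = TRUE, ellipse.level = 0.6, mean.point = FALSE)   # draw an ellipse around each cluster; hide the group mean points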
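💡 The FactoMineR and factoextra packages are very powerful. Beyond continuous variables, they also handle categorical variables and even mixed data (for example with FactoMineR's MCA() and FAMD(), the counterparts of PCA for those data types). Their plotting functions are just as flexible. Besides the variables and individuals used to build the components, they can project other continuous or categorical variables, or new data points that are not in the original data, into the principal component space (for PCA(), through the quanti.sup, quali.sup and ind.sup arguments).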
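Projecting new data points follows one rule: a new row is centred and scaled with the training means and standard deviations, not its own, and is then multiplied by the training eigenvectors. The following minimal sketch uses base R's prcomp() and its predict() method on USArrests as a stand-in. In FactoMineR itself, such rows would go through PCA()'s ind.sup argument. The "new" individual is hypothetical: the average state with Murder raised by one standard deviation.

```r
# Projecting a new (supplementary) individual into an existing PCA space, base R only.
# Assumptions: prcomp() on USArrests stands in for PCA() on decathlon2, and the
# new individual is hypothetical (column means, with Murder raised by one sd).
X <- as.matrix(USArrests)
p <- prcomp(X, scale. = TRUE)

new_obs <- as.data.frame(t(colMeans(X)))
new_obs$Murder <- new_obs$Murder + sd(X[, "Murder"])

# predict() reuses the training centre and scale, then multiplies by the rotation matrix.
projected <- predict(p, newdata = new_obs)
projected

# The same projection written out by hand.
manual <- scale(as.matrix(new_obs), center = p$center, scale = p$scale) %*% p$rotation
all.equal(unname(projected), unname(manual))  # TRUE
```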