MemberID InpatientDays ERVisits OfficeVisits Narcotics
Min. : 1.0 Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.00
1st Qu.: 33.5 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 7.0 1st Qu.: 0.00
Median : 66.0 Median : 0.00 Median : 1.0 Median :12.0 Median : 1.00
Mean : 66.0 Mean : 2.72 Mean : 1.5 Mean :13.2 Mean : 4.57
3rd Qu.: 98.5 3rd Qu.: 3.00 3rd Qu.: 2.0 3rd Qu.:18.5 3rd Qu.: 3.00
Max. :131.0 Max. :30.00 Max. :11.0 Max. :46.0 Max. :59.00
DaysSinceLastERVisit Pain TotalVisits ProviderCount
Min. : 6 Min. : 0.0 Min. : 0.0 Min. : 5
1st Qu.:207 1st Qu.: 1.0 1st Qu.: 8.0 1st Qu.:15
Median :641 Median : 8.0 Median :15.0 Median :20
Mean :481 Mean : 15.6 Mean :17.4 Mean :24
3rd Qu.:731 3rd Qu.: 23.0 3rd Qu.:22.5 3rd Qu.:30
Max. :731 Max. :104.0 Max. :69.0 Max. :82
MedicalClaims ClaimLines StartedOnCombination AcuteDrugGapSmall
Min. : 11.0 Min. : 20.0 Mode :logical Min. : 0.00
1st Qu.: 25.5 1st Qu.: 83.5 FALSE:125 1st Qu.: 0.00
Median : 37.0 Median :120.0 TRUE :6 Median : 1.00
Mean : 43.2 Mean :142.9 Mean : 2.69
3rd Qu.: 49.5 3rd Qu.:185.0 3rd Qu.: 3.00
Max. :194.0 Max. :577.0 Max. :71.00
PoorCare
Min. :0.000
1st Qu.:0.000
Median :0.000
Mean :0.252
3rd Qu.:0.500
Max. :1.000
0 1
0.74809 0.25191
set.seed(88)
split = sample.split(D$PoorCare, SplitRatio = 0.7) # split vector
table(split) %>% prop.table()
split
FALSE TRUE
0.29771 0.70229
split
FALSE TRUE
0 0.74359 0.75000
1 0.25641 0.25000
Call:
glm(formula = PoorCare ~ OfficeVisits + Narcotics, family = binomial,
data = TR)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.878 -0.635 -0.517 -0.224 2.146
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.5662 0.5249 -4.89 0.000001 ***
OfficeVisits 0.0814 0.0310 2.62 0.0087 **
Narcotics 0.0620 0.0320 1.94 0.0527 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 103.470 on 91 degrees of freedom
Residual deviance: 84.528 on 89 degrees of freedom
AIC: 90.53
Number of Fisher Scoring iterations: 4
模型:
Pr[y = 1] = 1/(1+exp(-f(x)))
\(Logit = f(x) = b_0 + b_1 x_1 + b_2 x_2 \;\; (1)\)
\(Logit = f(x) = -2.6461 + 0.0821 \times OfficeVisits + 0.0763 \times Narcotics \;\; (2)\)
係數:
\(Odd_0 = Exp(b_0 + b_1 x_1)\;\;(3)\)
\(Odd_1 = Exp[b_0 + b_1(x_1+1)] = Exp(b_0 + b_1 x_1 + b_1) = Exp(b_0 + b_1 x_1) \times Exp(b_1) \;\;(4)\)
\(Odd_1 = Odd_0 \times Exp(b_1) \:\:(5)\)
\(\frac{Odd_1}{Odd_0} = Exp(b_1) \:\:(6)\)
機率和勝率之間的關係
pop = function(p, k) {o = p/(1-p); o = k * o; o/(1+o)}
p0 = seq(0.1, 0.9, 0.1); k = 2
p1 = sapply(seq(0.1, 0.9, 0.1), pop, k)
data.frame(k, p0, p1, difference=p1-p0, multiplier=p1/p0) %>% round(2)
k p0 p1 difference multiplier
1 2 0.1 0.18 0.08 1.82
2 2 0.2 0.33 0.13 1.67
3 2 0.3 0.46 0.16 1.54
4 2 0.4 0.57 0.17 1.43
5 2 0.5 0.67 0.17 1.33
6 2 0.6 0.75 0.15 1.25
7 2 0.7 0.82 0.12 1.18
8 2 0.8 0.89 0.09 1.11
9 2 0.9 0.95 0.05 1.05
💡 相同的「勝率比(\(k=2\))」在不同的機率(0.1~0.9
)下會換算出不同的「機率差(5%~17%
)」
變數的邊際效果 當OfficeVisits
和Narcotics
分別都等於它們的中位數時,\(y=1\)的機率為
df = data.frame(OfficeVisits = median(D$OfficeVisits), Narcotics=median(D$Narcotics))
predict(glm1, df, type="response")
1
0.17831
OfficeVisits
的邊際勝率效果是:exp(0.0814)=1.0848
[1] 1.0848
OfficeVisits
的邊際機率效果是:0.19054-0.17831=0.01223
df = data.frame(OfficeVisits = median(D$OfficeVisits)+1, Narcotics=median(D$Narcotics))
predict(glm1, df, type="response")
1
0.19054
df = data.frame(OfficeVisits = median(D$OfficeVisits), Narcotics=median(D$Narcotics)+1)
predict(glm1, df, type="response")
1
0.18757
df = data.frame(OfficeVisits = median(D$OfficeVisits)+1, Narcotics=median(D$Narcotics)+1)
predict(glm1, df, type="response")
1
0.20029
💡 學習重點:
■ 係數的指數就是勝率比;也就是說,\(x_i\) 每增加\(1\),\(y=1\)的勝率會變成原來的 \(Exp(b_i)\) 倍
■ 各預測變數的(勝率)效果是相乘,而不是相加
■ 機率和勝率之間的關係並不是線性的:
■ 邏輯式回歸裡面各預測變數的勝率效果是固定的
■ 但是他們的機率效果並不是固定的
■ 我們需先推算原先的機率,才能推算變數的機率效果
0% 25% 50% 75% 100%
0.0 7.0 12.0 18.5 46.0
0% 25% 50% 75% 100%
0 0 1 3 59
🗿 練習:
■ 當OfficeVisits
和Narcotic
分別等於他們的第一分位(Q1
)時:
■ PoorCare = 1
的機率是?
■ 兩個自變數的邊際勝率效果分別是?
■ 兩個自變數的邊際機率效果分別是?
■ 當OfficeVisits
和Narcotic
分別等於他們的第三分位(Q3
)時:
■ PoorCare = 1
的機率是?
■ 兩個自變數的邊際勝率效果分別是?
■ 兩個自變數的邊際機率效果分別是?
■ 比較以上兩個題目的答案,我們可以觀察到什麼?