
knitr::opts_chunk$set(comment = NA)
knitr::opts_knit$set(global.par = TRUE)
par(cex=0.8); options(scipen=20, digits=4, width=90)
if(!require(pacman)) install.packages("pacman")
pacman::p_load(magrittr, d3heatmap)


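Crime is an international concern, but it is recorded and handled differently from country to country. In the United States, the Federal Bureau of Investigation (FBI) records violent crimes and property crimes. Each city also keeps its own crime records, and some cities publish data on crime rates. The city of Chicago, Illinois has published its crime data online, with records starting from 2001.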


Section-1 Loading the Data

【1.1】How many rows of data (observations) are in this dataset?

D = read.csv("data/mvtWeek1.csv", stringsAsFactors=F)
nrow(D)   # number of observations (rows)
[1] 191641
ncol(D)   # number of variables (columns)
[1] 11


summary(D)
       ID              Date           LocationDescription   Arrest         Domestic      
 Min.   :1310022   Length:191641      Length:191641       Mode :logical   Mode :logical  
 1st Qu.:2832144   Class :character   Class :character    FALSE:176105    FALSE:191226   
 Median :4762956   Mode  :character   Mode  :character    TRUE :15536     TRUE :415      
 Mean   :4968629                                                                         
 3rd Qu.:7201878                                                                         
 Max.   :9181151                                                                         
      Beat         District     CommunityArea        Year         Latitude   
 Min.   : 111   Min.   : 1      Min.   : 0      Min.   :2001   Min.   :41.6  
 1st Qu.: 722   1st Qu.: 6      1st Qu.:22      1st Qu.:2003   1st Qu.:41.8  
 Median :1121   Median :10      Median :32      Median :2006   Median :41.9  
 Mean   :1259   Mean   :12      Mean   :38      Mean   :2006   Mean   :41.8  
 3rd Qu.:1733   3rd Qu.:17      3rd Qu.:60      3rd Qu.:2009   3rd Qu.:41.9  
 Max.   :2535   Max.   :31      Max.   :77      Max.   :2012   Max.   :42.0  
                NA's   :43056   NA's   :24616                  NA's   :2276  
   Longitude    
 Min.   :-87.9  
 1st Qu.:-87.7  
 Median :-87.7  
 Mean   :-87.7  
 3rd Qu.:-87.6  
 Max.   :-87.5  
 NA's   :2276   

Factor versus Character
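The distinction matters here because the data are read with `stringsAsFactors=F`, which is why the summary output above reports `Date` and `LocationDescription` as `Class :character`. A minimal sketch of the difference, using a made-up three-row CSV (the column names and values are illustrative, not taken from `mvtWeek1.csv`):

```r
# Toy CSV text; the values are hypothetical
csv_text <- "Location,Arrest\nSTREET,FALSE\nALLEY,TRUE\nSTREET,FALSE"

as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$Location)    # "character": plain strings
class(as_fct$Location)    # "factor": integer codes plus a fixed set of levels
levels(as_fct$Location)   # "ALLEY" "STREET"
class(as_chr$Arrest)      # "logical": TRUE/FALSE text is parsed as logical
```

Character columns are usually the easier starting point when a column, such as a date string, still has to be parsed or cleaned.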

【1.2】How many variables are in this dataset?

ncol(D)
[1] 11

【1.3】Using the “max” function, what is the maximum value of the variable “ID”?

max(D$ID)
[1] 9181151

【1.4】 What is the minimum value of the variable “Beat”?

min(D$Beat)
[1] 111

【1.5】 How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

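sum(D$Arrest)    # number of thefts for which an arrest was made
[1] 15536
mean(D$Arrest)   # proportion of thefts for which an arrest was made
[1] 0.08107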
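The two numbers above are a count and a proportion: 15536 of 191641 thefts, or about 8.1%, led to an arrest. Both follow from how R treats logical values in arithmetic, where `TRUE` counts as 1 and `FALSE` as 0. A small sketch on a hypothetical five-element vector (the name `arrest` and its values are made up for illustration):

```r
# Hypothetical arrest flags for five thefts
arrest <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

n_arrests    <- sum(arrest)    # TRUE counts as 1, FALSE as 0
arrest_share <- mean(arrest)   # share of TRUE values

n_arrests      # 2
arrest_share   # 0.4
```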

【1.6】 How many observations have a LocationDescription value of ALLEY?

sum(D$LocationDescription == "ALLEY")
[1] 2308


Section-2 Understanding Dates in R

【2.1】 In what format are the entries in the variable Date?

head(D$Date)  # Month/Day/Year Hour:Minute
[1] "12/31/12 23:15" "12/31/12 22:00" "12/31/12 22:00" "12/31/12 22:00" "12/31/12 21:30"
[6] "12/31/12 20:30"
ts = as.POSIXct(D$Date, format="%m/%d/%y %H:%M")
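The format string passed to `as.POSIXct` describes how the text is laid out, and the same codes are later used with `format` to pull individual fields back out. A short sketch, assuming a fixed `tz = "UTC"` so the result does not depend on the local time zone. The first string is copied from the `head(D$Date)` output above; the second is a made-up value chosen to fall on a Sunday, and the name `when` is illustrative:

```r
# First value appears in the data; the second is hypothetical
x <- c("12/31/12 23:15", "12/30/12 09:05")

# %m/%d/%y = month/day/two-digit year, %H:%M = hour:minute (24-hour clock)
when <- as.POSIXct(x, format = "%m/%d/%y %H:%M", tz = "UTC")

format(when, "%Y-%m-%d")   # "2012-12-31" "2012-12-30"
format(when, "%m")         # "12" "12"   month
format(when, "%H")         # "23" "09"   hour
format(when, "%w")         # "1" "0"     weekday, 0 = Sunday ... 6 = Saturday
format(when, "%u")         # "1" "7"     weekday, 1 = Monday ... 7 = Sunday
```

The two weekday codes differ only in how Sunday is numbered, which is worth keeping in mind when reading tables built with `%w` and `%u`.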



table(format(ts,'%w'))   # thefts by weekday (0 = Sunday)
    0     1     2     3     4     5     6 
26316 27397 26791 27416 27319 29284 27118 

table(format(ts,'%m'))   # thefts by month
   01    02    03    04    05    06    07    08    09    10    11    12 
16047 13511 15758 15280 16035 16002 16801 16572 16060 17086 16063 16426 
table(weekday=format(ts,'%w'), month=format(ts,'%m'))
weekday   01   02   03   04   05   06   07   08   09   10   11   12
      0 2110 1837 2075 2070 2168 2239 2339 2304 2352 2424 2254 2144
      1 2395 1937 2200 2323 2359 2187 2457 2288 2258 2399 2323 2271
      2 2317 1885 2270 2118 2222 2183 2412 2251 2142 2416 2258 2317
      3 2259 2007 2242 2060 2345 2347 2408 2428 2239 2484 2182 2415
      4 2334 1904 2263 2099 2402 2190 2385 2464 2320 2280 2253 2425
      5 2392 2036 2443 2388 2340 2566 2459 2591 2390 2692 2475 2512
      6 2240 1905 2265 2222 2199 2290 2341 2246 2359 2391 2318 2342


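# weekday (1 = Monday ... 7 = Sunday) by hour-of-day counts, as a data frame
# that a heatmap function such as d3heatmap() can take as input
table(format(ts,"%u"), format(ts,"%H")) %>% 
  as.data.frame.matrix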

【2.2】 What is the month and year of the median date in our dataset?

median(ts)
[1] "2006-05-21 12:30:00 CST"
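The median date is May 2006. Date-times stored as `POSIXct` are numbers underneath, so `median` works on them directly and returns another date-time. A minimal sketch with three hypothetical timestamps (the name `stamps` and the values are made up for illustration), again assuming `tz = "UTC"`:

```r
# Three hypothetical timestamps, deliberately out of order
stamps <- as.POSIXct(
  c("2001-01-01 08:00", "2012-12-31 23:15", "2006-05-21 12:30"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)

mid <- median(stamps)              # the middle value once sorted
format(mid, "%Y-%m", tz = "UTC")   # "2006-05"
```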

【2.3】 In which month did the fewest motor vehicle thefts occur?


format(ts,"%m") %>% table %>% sort
   02    04    03    06    05    01    09    11    12    08    07    10 
13511 15280 15758 16002 16035 16047 16060 16063 16426 16572 16801 17086 

【2.4】 On which weekday did the most motor vehicle thefts occur?

format(ts,"%w") %>% table %>% sort
    0     2     6     4     1     3     5 
26316 26791 27118 27319 27397 27416 29284 
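In the sorted tables above, the smallest count is on the left and the largest on the right: February has the fewest thefts, and weekday code 5 (Friday under `%w`) has the most. The same answers can be extracted programmatically rather than read off by eye. A sketch in base R on a hypothetical vector of month codes (the name `month_code` and its values are made up for illustration):

```r
# Hypothetical month codes for six thefts
month_code <- c("02", "10", "10", "07", "10", "07")

counts <- sort(table(month_code))   # ascending by count

names(counts)[1]                      # "02": fewest
names(counts)[length(counts)]         # "10": most
names(which.max(table(month_code)))   # "10": same answer without sorting
```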

【2.5】 Which month has the largest number of motor vehicle thefts for which an arrest was made?

ts[D$Arrest] %>% format('%m') %>% table %>% sort
  05   06   02   09   04   11   03   07   08   10   12   01 
1187 1230 1238 1248 1252 1256 1298 1324 1329 1342 1397 1435
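January has the most thefts with an arrest (1435). The expression `ts[D$Arrest]` relies on logical indexing: only the elements where the flag is `TRUE` are kept before the months are tabulated. A small sketch on hypothetical data (the names `month_code` and `arrest` and their values are made up), which also shows, as an optional extra not part of the original analysis, how `tapply` would give an arrest rate per month instead of a raw count:

```r
# Hypothetical month codes and arrest flags for five thefts
month_code <- c("01", "01", "05", "12", "12")
arrest     <- c(TRUE, FALSE, TRUE, TRUE, TRUE)

# Keep only the thefts with an arrest, then count by month
arrest_counts <- table(month_code[arrest])        # 01: 1, 05: 1, 12: 2

# Share of thefts with an arrest within each month
arrest_rate <- tapply(arrest, month_code, mean)   # 01: 0.5, 05: 1, 12: 1
```

A raw count answers the question as asked; a rate would be the better choice if months with more thefts overall should not automatically rank higher.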