Basic notebook setup: install and load some basic packages
rm(list=ls(all.names=TRUE))
knitr::opts_chunk$set(comment = NA)
knitr::opts_knit$set(global.par = TRUE)
par(cex=0.8); options(scipen=20, digits=4, width=90)
if(!require(pacman)) install.packages("pacman")
pacman::p_load(magrittr)
Please do not modify the code above.
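Stock Dynamics
The stock market is where buyers and sellers trade shares of companies, and it is one of the most popular ways for individuals and companies to invest. The world stock market is now estimated to be worth trillions. The New York Stock Exchange (NYSE), located in New York City, is the largest stock market in the world, with about 2,800 listed companies. In this problem, we will look at the monthly stock prices of five of these companies: IBM, General Electric (GE), Procter & Gamble, Coca-Cola, and Boeing. The data used in this problem comes from Infochimps.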
Download the following files and read them with read.csv():
data/IBMStock.csv
data/GEStock.csv
data/ProcterGambleStock.csv
data/CocaColaStock.csv
data/BoeingStock.csv
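Store them in data frames named IBM, GE, ProcterGamble, CocaCola, and Boeing, respectively (the code in this notebook uses the shorter names PG, CO, and BOE for the last three). Each data frame has two variables:
Date: the date
StockPrice: the average stock price of the company in the given month
In this case study, we will look at how the stock dynamics of these companies have changed over time.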
§ 1.1 Our five datasets all have the same number of observations. How many observations are there in each data set?
# File names follow the list above; letter case matters on case-sensitive file systems
IBM = read.csv('data/IBMStock.csv')
GE = read.csv('data/GEStock.csv')
PG = read.csv('data/ProcterGambleStock.csv')
CO = read.csv('data/CocaColaStock.csv')
BOE = read.csv('data/BoeingStock.csv')
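The chunk above only loads the files; the number of observations is the number of rows. A minimal sketch, assuming a small made-up data frame (`toy`, built from inline CSV text so that it runs without the real files):

```r
# Hypothetical three-row stand-in for one of the stock files;
# the values are made up for illustration.
csv_text <- "Date,StockPrice
1/1/70,10.5
2/1/70,11.0
3/1/70,10.8"
toy <- read.csv(text = csv_text)

n_obs <- nrow(toy)   # number of observations (rows)
n_obs
str(toy)             # column names, types, and first values
```

With the real data, nrow(IBM) (and likewise for the other four data frames) should answer the question.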
§ 1.2 What is the earliest year in our datasets?
IBM$Date = as.character(IBM$Date) %>% as.Date('%m/%d/%y')
GE$Date = as.character(GE$Date) %>% as.Date('%m/%d/%y')
PG$Date = as.character(PG$Date) %>% as.Date('%m/%d/%y')
CO$Date = as.character(CO$Date) %>% as.Date('%m/%d/%y')
BOE$Date = as.character(BOE$Date) %>% as.Date('%m/%d/%y')
min(IBM$Date)
[1] "1970-01-01"
§ 1.3 What is the latest year in our datasets?
max(IBM$Date)
[1] "2009-12-01"
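The questions ask for a year, which can be extracted from a Date with format(). It is also worth knowing how the '%y' format used above treats two-digit years. A small sketch with made-up date strings (not taken from the files):

```r
# Made-up date strings in the same month/day/two-digit-year layout as the files.
d <- as.Date(c("1/1/70", "12/1/09", "1/1/68"), format = "%m/%d/%y")
d
# On input, two-digit years 69-99 become 19xx and 00-68 become 20xx,
# so "68" is read as 2068, not 1968.
years <- format(d, "%Y")   # extract the four-digit year as text
years
```

The data here runs from 1970 to 2009, so the rule does no harm, but a file reaching back before 1969 would need four-digit years.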
💡 A list of data frames
L = list(
# File names follow the list above; letter case matters on case-sensitive file systems
IBM = read.csv('data/IBMStock.csv'),
GE = read.csv('data/GEStock.csv'),
PG = read.csv('data/ProcterGambleStock.csv'),
CO = read.csv('data/CocaColaStock.csv'),
BOE = read.csv('data/BoeingStock.csv'))
for(i in seq_along(L)) {
L[[i]]$Date = as.character(L[[i]]$Date) %>% as.Date('%m/%d/%y')
}
lapply(L, function(df) range(df$Date))
$IBM
[1] "1970-01-01" "2009-12-01"
$GE
[1] "1970-01-01" "2009-12-01"
$PG
[1] "1970-01-01" "2009-12-01"
$CO
[1] "1970-01-01" "2009-12-01"
$BOE
[1] "1970-01-01" "2009-12-01"
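The for loop over the list can also be written with lapply(), which returns the updated list directly. A sketch under the assumption of a made-up two-element list (`toy_list` and its values are hypothetical, standing in for L):

```r
# Hypothetical two-element list standing in for L; values are made up.
toy_list <- list(
  A = data.frame(Date = c("1/1/70", "2/1/70"), StockPrice = c(10, 12)),
  B = data.frame(Date = c("1/1/70", "2/1/70"), StockPrice = c(20, 18)))

# Convert the Date column in every data frame and return the updated list.
toy_list <- lapply(toy_list, function(df) {
  df$Date <- as.Date(as.character(df$Date), format = "%m/%d/%y")
  df
})

sapply(toy_list, nrow)                          # rows per data frame
lapply(toy_list, function(df) range(df$Date))   # first and last date
```

Keeping the data frames in one list is what makes the later sapply() comparisons across all five companies a single call.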
§ 1.4 What is the mean stock price of IBM over this time period?
#
#
§ 1.5 What is the minimum stock price of General Electric (GE) over this time period?
#
#
§ 1.6 What is the maximum stock price of Coca-Cola over this time period?
#
#
§ 1.7 What is the median stock price of Boeing over this time period?
#
#
§ 1.8 What is the standard deviation of the stock price of Procter & Gamble over this time period?
#
#
💡 Built-in summary statistic functions: mean(), median(), sd(), min(), max(), range(), summary()
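As an illustration of these functions, here is a short sketch on a made-up numeric vector (`price` is hypothetical and stands in for a StockPrice column):

```r
# Made-up prices standing in for a StockPrice column.
price <- c(10, 12, 11, 15, 9, 14)

stats <- c(mean = mean(price), median = median(price), sd = sd(price),
           min = min(price), max = max(price))
stats
range(price)     # minimum and maximum together
summary(price)   # min, quartiles, median, mean, max in one call
```

For the exercises in § 1.4 to § 1.8, the same calls should apply to the real columns, for example mean(IBM$StockPrice) or sd(PG$StockPrice).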
§ 2.1 Around what year did Coca-Cola have its highest stock price in this time period? Around what year did Coca-Cola have its lowest stock price in this time period?
plot(CO$Date, CO$StockPrice, type='l')
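Besides reading the years off the plot, the dates of the highest and lowest price can be looked up with which.max() and which.min(). A sketch on a made-up data frame (`toy` and its values are hypothetical, not the Coca-Cola data):

```r
# Made-up series standing in for CO; dates are already Date objects.
toy <- data.frame(
  Date = as.Date(c("1972-01-01", "1973-01-01", "1980-03-01", "1981-01-01")),
  StockPrice = c(140, 146, 30, 35))

high_date <- toy$Date[which.max(toy$StockPrice)]   # date of the highest price
low_date  <- toy$Date[which.min(toy$StockPrice)]   # date of the lowest price
format(c(high_date, low_date), "%Y")               # years only
```

On the real data, the same two lines with CO in place of toy can be used to check the reading from the plot.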
§ 2.2 In March of 2000, the technology bubble burst, and a stock market crash occurred. According to this plot, which company’s stock dropped more?
plot(IBM$Date, IBM$StockPrice, type='l', col='orange')
lines(GE$Date, GE$StockPrice, type='l', col='cyan')
lines(BOE$Date, BOE$StockPrice, type='l', col='pink')
lines(CO$Date, CO$StockPrice, type='l', col='green')
lines(PG$Date, PG$StockPrice, type='l', col='blue')
abline(v=as.Date(c("2000-03-01","1983-01-01","1984-01-01")),col='gray',lty=3)
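The plot above has no legend, and its y-axis range comes from the first series only, so a series outside that range would be clipped. One possible way to handle both, sketched on a made-up list of two series (`toy_list`, `cols`, and all values are hypothetical):

```r
# Made-up list of two series standing in for L; values are for illustration only.
dates <- seq(as.Date("1999-01-01"), by = "month", length.out = 24)
toy_list <- list(
  A = data.frame(Date = dates, StockPrice = seq(100, 54, length.out = 24)),
  B = data.frame(Date = dates, StockPrice = seq(40, 63, length.out = 24)))
cols <- c(A = "orange", B = "blue")

# Common y-range so that no series is clipped by the first plot() call.
ylim <- range(sapply(toy_list, function(df) range(df$StockPrice)))

plot(toy_list[[1]]$Date, toy_list[[1]]$StockPrice, type = "l",
     col = cols[1], ylim = ylim, xlab = "Date", ylab = "StockPrice")
for (nm in names(toy_list)[-1]) {
  lines(toy_list[[nm]]$Date, toy_list[[nm]]$StockPrice, col = cols[nm])
}
abline(v = as.Date("2000-03-01"), col = "gray", lty = 3)   # reference date
legend("topright", legend = names(toy_list), col = cols, lty = 1)
```

With a legend in place, the colors no longer have to be looked up in the code when comparing companies.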
§ 2.3 (a) Around 1983, the stock for one of these companies (Coca-Cola or Procter and Gamble) was going up, while the other was going down. Which one was going up?
#
#
#
#
§ 3.1 Which stock fell the most right after the technology bubble burst in March 2000?
#
#
§ 3.2 Which stock reaches the highest value in the time period 1995-2005?
plot(IBM$Date, IBM$StockPrice, type='l', col='orange',
xlim=as.Date(c('1995-01-01','2005-12-01')), ylim=c(0,250))
lines(GE$Date, GE$StockPrice, type='l', col='cyan')
lines(BOE$Date, BOE$StockPrice, type='l', col='pink')
lines(CO$Date, CO$StockPrice, type='l', col='green')
lines(PG$Date, PG$StockPrice, type='l', col='blue')
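The highest value inside a date window can also be computed rather than read off the plot. A sketch assuming a made-up list of two series (`toy_list`, `from`, `to`, and the values are hypothetical):

```r
# Made-up list of two yearly series (1994 to 2006) standing in for L.
dates <- seq(as.Date("1994-01-01"), by = "year", length.out = 13)
toy_list <- list(
  A = data.frame(Date = dates,
                 StockPrice = c(300, 50, 60, 70, 80, 90, 100, 95, 85, 75, 65, 55, 45)),
  B = data.frame(Date = dates,
                 StockPrice = c(20, 30, 40, 50, 60, 70, 80, 120, 110, 90, 70, 50, 400)))

from <- as.Date("1995-01-01"); to <- as.Date("2005-12-01")
window_max <- sapply(toy_list, function(df) {
  in_window <- df$Date >= from & df$Date <= to   # keep rows inside the window
  max(df$StockPrice[in_window])
})
window_max
names(which.max(window_max))   # series with the highest price in the window
```

The extreme values placed in 1994 and 2006 fall outside the window, so they do not affect the result.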
§ 3.3 In October of 1997, there was a global stock market crash that was caused by an economic crisis in Asia. Comparing September 1997 to November 1997, which companies saw a decreasing trend in their stock price? (Select all that apply.)
IBM$StockPrice[ IBM$Date %in% as.Date(c("1997-09-01", "1997-11-01")) ]
[1] 101.5 102.2
sapply(L, function(df){
df$StockPrice[ df$Date %in% as.Date(c("1997-09-01", "1997-11-01")) ]
})
IBM GE PG CO BOE
[1,] 101.5 67.63 114.1 59.31 54.10
[2,] 102.2 69.56 73.4 59.40 48.34
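The decreasing stocks can be picked out by subtracting the two rows. The sketch below re-enters the printed values by hand and assumes the rows are in date order, so that the first row is September and the second is November (the row labels are added here for readability):

```r
# Values copied from the printed output above (September row, then November row).
m <- rbind(
  Sep1997 = c(IBM = 101.5, GE = 67.63, PG = 114.1, CO = 59.31, BOE = 54.10),
  Nov1997 = c(IBM = 102.2, GE = 69.56, PG = 73.4,  CO = 59.40, BOE = 48.34))

change <- m["Nov1997", ] - m["Sep1997", ]   # November minus September
change
decreased <- names(change)[change < 0]      # companies whose price fell
decreased
```

The %in% filter returns rows in the order they appear in the data frame, so this reading depends on the data being sorted by date.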
§ 3.4 In the last two years of this time period (2004 and 2005) which stock seems to be performing the best, in terms of increasing stock price?
plot(IBM$Date, IBM$StockPrice, type='l', col='orange',
xlim=as.Date(c('2004-01-01','2005-12-01')), ylim=c(0,120))
lines(GE$Date, GE$StockPrice, type='l', col='cyan')
lines(BOE$Date, BOE$StockPrice, type='l', col='pink')
lines(CO$Date, CO$StockPrice, type='l', col='green')
lines(PG$Date, PG$StockPrice, type='l', col='blue')
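One way to put a number on "performing the best" is the relative change between the first and last price in the window. This is only one possible measure, sketched on a made-up list of two series (`toy_list`, `from`, `to`, and the values are hypothetical):

```r
# Made-up list of two monthly series covering 2004-01 to 2005-12.
dates <- seq(as.Date("2004-01-01"), by = "month", length.out = 24)
toy_list <- list(
  A = data.frame(Date = dates, StockPrice = seq(50, 75, length.out = 24)),
  B = data.frame(Date = dates, StockPrice = seq(80, 72, length.out = 24)))

from <- as.Date("2004-01-01"); to <- as.Date("2005-12-01")
growth <- sapply(toy_list, function(df) {
  w <- df[df$Date >= from & df$Date <= to, ]   # rows inside the window
  w <- w[order(w$Date), ]                      # make sure rows are in date order
  (w$StockPrice[nrow(w)] - w$StockPrice[1]) / w$StockPrice[1]
})
growth   # relative change from the first to the last month
```

A relative change makes stocks at different price levels comparable, which the raw plot with a shared y-axis does not.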
§ 4.1 For IBM, compare the monthly averages to the overall average stock price. In which months has IBM historically had a higher stock price (on average)? Select all that apply.
tapply(IBM$StockPrice, format(IBM$Date,'%m'), mean)
01 02 03 04 05 06 07 08 09 10 11 12
150.2 152.7 152.4 152.1 151.5 139.1 139.1 140.1 139.1 137.3 138.0 140.8
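The comparison with the overall average can be done in code. The sketch below re-enters the printed monthly means by hand and assumes every month has the same number of observations, in which case the mean of the monthly means equals the overall mean:

```r
# Monthly means copied from the printed tapply() output above.
monthly <- c("01" = 150.2, "02" = 152.7, "03" = 152.4, "04" = 152.1,
             "05" = 151.5, "06" = 139.1, "07" = 139.1, "08" = 140.1,
             "09" = 139.1, "10" = 137.3, "11" = 138.0, "12" = 140.8)

# Assumption: every month has the same number of observations, so the
# mean of the monthly means equals the overall mean.
overall <- mean(monthly)
above <- names(monthly)[monthly > overall]   # months above the overall average
above
```

On the real data, comparing the tapply() result directly with mean(IBM$StockPrice) avoids that assumption.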
§ 4.2 General Electric and Coca-Cola both have their highest average stock price in the same month. Which month is this?
tapply(GE$StockPrice, format(GE$Date,'%m'), mean) %>% sort
10 09 06 08 07 11 12 05 01 02 03 04
56.24 56.24 56.47 56.50 56.73 57.29 59.10 60.87 62.05 62.52 63.15 64.48
tapply(CO$StockPrice, format(CO$Date,'%m'), mean) %>% sort
09 10 08 07 11 12 01 02 06 05 03 04
57.60 57.94 58.88 58.98 59.10 59.73 60.37 60.73 60.81 61.44 62.07 62.69
§ 4.3 For the months of December and January, every company’s average stock is higher in one month and lower in the other. In which month are the stock prices lower?
sapply(L, function(df){
tapply(df$StockPrice, format(df$Date,'%m'), mean)
})
IBM GE PG CO BOE
01 150.2 62.05 79.62 60.37 46.51
02 152.7 62.52 79.03 60.73 46.89
03 152.4 63.15 77.35 62.07 46.88
04 152.1 64.48 77.69 62.69 47.05
05 151.5 60.87 77.86 61.44 48.14
06 139.1 56.47 77.39 60.81 47.39
07 139.1 56.73 76.65 58.98 46.55
08 140.1 56.50 76.82 58.88 46.86
09 139.1 56.24 76.62 57.60 46.30
10 137.3 56.24 76.68 57.94 45.22
11 138.0 57.29 78.46 59.10 45.15
12 140.8 59.10 78.30 59.73 46.17
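Both the peak month per company and the December versus January comparison can be read from this matrix in code. The sketch below re-enters the printed values by hand (the object names `m`, `best_month`, and `dec_lower` are introduced here for illustration):

```r
# Monthly means copied from the printed sapply() output above.
m <- matrix(c(
  150.2, 62.05, 79.62, 60.37, 46.51,
  152.7, 62.52, 79.03, 60.73, 46.89,
  152.4, 63.15, 77.35, 62.07, 46.88,
  152.1, 64.48, 77.69, 62.69, 47.05,
  151.5, 60.87, 77.86, 61.44, 48.14,
  139.1, 56.47, 77.39, 60.81, 47.39,
  139.1, 56.73, 76.65, 58.98, 46.55,
  140.1, 56.50, 76.82, 58.88, 46.86,
  139.1, 56.24, 76.62, 57.60, 46.30,
  137.3, 56.24, 76.68, 57.94, 45.22,
  138.0, 57.29, 78.46, 59.10, 45.15,
  140.8, 59.10, 78.30, 59.73, 46.17),
  nrow = 12, byrow = TRUE,
  dimnames = list(sprintf("%02d", 1:12), c("IBM", "GE", "PG", "CO", "BOE")))

# Month with the highest average price, per company.
best_month <- apply(m, 2, function(col) names(which.max(col)))
best_month

# Is December lower than January for every company?
dec_lower <- m["12", ] < m["01", ]
dec_lower
```

In the notebook itself, the result of the sapply() call can be assigned to a variable and used in place of the hand-entered matrix.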
📝 UNIT2B key takeaways:
The Magic of the apply() Series:
■ tapply(x, factor, fun): apply a function to x, grouped by factor
■ lapply(list, fun): apply a function to each element of a list
■ sapply(list, fun): apply a function to each element of a list and simplify the output
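The three functions side by side, as a minimal sketch on made-up data (`x`, `g`, and `lst` are hypothetical):

```r
# Made-up data for illustration.
x <- c(10, 20, 30, 40, 50, 60)
g <- c("a", "b", "a", "b", "a", "b")

by_group <- tapply(x, g, mean)      # mean of x within each level of g
lst <- list(p = 1:3, q = 4:10)
as_list <- lapply(lst, length)      # always returns a list
as_vec  <- sapply(lst, length)      # simplified to a named vector
by_group; as_list; as_vec
```

The difference between lapply() and sapply() is only in the shape of the result: the same values come back either as a list or, where possible, as a vector or matrix.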