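Employment statistics are among the most important indicators policymakers use to gauge the overall strength of the economy. In the United States, the government measures unemployment using the Current Population Survey (CPS), which collects demographic and employment information from a wide range of Americans each month. In this exercise, we will apply the topics reviewed in the lectures, along with a few additional ones, to the September 2013 version of this nationally representative dataset. The observations in the dataset represent people surveyed in the September 2013 CPS who actually completed the survey. The full dataset has 385 variables, but in this exercise we will use the CPSData.csv version of the dataset, which has the following variables:
PeopleInHousehold: The number of people in the interviewee's household.
Region: The census region where the interviewee lives.
State: The state where the interviewee lives.
MetroAreaCode: A code identifying the metropolitan area where the interviewee lives (NA if the interviewee does not live in a metropolitan area). The mapping from codes to metropolitan area names is provided in the file MetroAreaCodes.csv.
Age: The age of the interviewee, in years. 80 represents people aged 80-84, and 85 represents people aged 85 and older.
Married: The marital status of the interviewee.
Sex: The sex of the interviewee.
Education: The highest level of education obtained by the interviewee.
Race: The race of the interviewee.
Hispanic: Whether the interviewee is of Hispanic ethnicity.
CountryOfBirthCode: A code identifying the interviewee's country of birth. The mapping from codes to country names is provided in the file CountryCodes.csv.
Citizenship: The citizenship status of the interviewee.
EmploymentStatus: The employment status of the interviewee.
Industry: The industry of employment of the interviewee (only available if they are employed).
§ 1.1 How many interviewees are in the dataset?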
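Since each row of the data frame is one interviewee, the count for § 1.1 is simply the number of rows. The sketch below uses a small invented data frame as a stand-in for the real file; the values are hypothetical, and loading the real data with `read.csv("CPSData.csv")` is an assumption about how the file would be read.

```r
# Hypothetical stand-in for CPSData.csv: four invented interviewees.
# With the real file one would presumably start from:
#   CPS <- read.csv("CPSData.csv")
CPS <- data.frame(
  Age   = c(34, 80, 12, 57),
  State = c("StateA", "StateB", "StateA", "StateC")
)

# One row per interviewee, so the row count is the number of interviewees.
n_interviewees <- nrow(CPS)
print(n_interviewees)
```

`str(CPS)` or `summary(CPS)` would report the same count alongside the column types.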
§ 1.2 Among the interviewees with a value reported for the Industry variable, what is the most common industry of employment? Please enter the name exactly how you see it.
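One way to approach § 1.2 is to tabulate the Industry column and sort the counts. The vector below is invented toy data, not real CPS values.

```r
# Hypothetical Industry values; NA marks interviewees with no reported industry.
Industry <- c("Retail trade", NA, "Construction", "Retail trade", NA)

# table() ignores NA by default, so only reported industries are counted.
industry_counts <- sort(table(Industry), decreasing = TRUE)

# The first name after sorting is the most common reported industry.
most_common <- names(industry_counts)[1]
print(most_common)
```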
§ 1.3 Which state has the fewest interviewees?
Which state has the largest number of interviewees?
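Both parts of § 1.3 can be read off a single frequency table. A minimal sketch with placeholder state names (invented data):

```r
# Hypothetical State column with placeholder names.
State <- c("StateA", "StateB", "StateB", "StateC", "StateB", "StateA")

state_counts <- table(State)

# which.min()/which.max() return the first extreme entry, keeping its name.
# Note that ties are resolved in favor of the first state in table order.
fewest  <- names(which.min(state_counts))
largest <- names(which.max(state_counts))
```

Calling `sort(state_counts)` instead shows the full ranking, which makes ties visible.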
§ 1.4 What proportion of interviewees are citizens of the United States?
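A proportion such as the one in § 1.4 can be computed as the mean of a logical vector. The category labels used below ("Citizen, Native", "Citizen, Naturalized", "Non-Citizen") are assumptions about how Citizenship is coded, and the values are invented.

```r
# Hypothetical Citizenship values; the labels are assumed, not taken from the text.
Citizenship <- c("Citizen, Native", "Non-Citizen",
                 "Citizen, Naturalized", "Citizen, Native")

# TRUE counts as 1 and FALSE as 0, so the mean is the share of citizens.
prop_citizen <- mean(Citizenship != "Non-Citizen")
print(prop_citizen)
```

`prop.table(table(Citizenship))` gives the share of every category at once if the breakdown is needed.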
§ 1.5 For which races are there at least 250 interviewees in the CPS dataset of Hispanic ethnicity? (Select all that apply.)
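For § 1.5, one option is to count Hispanic interviewees per race and keep the races that reach the cutoff. This sketch assumes Hispanic is coded 1/0 and uses a lowered threshold so the invented toy data has something to show; the exercise itself uses 250.

```r
# Hypothetical data; Hispanic is assumed to be coded 1 (yes) / 0 (no).
CPS <- data.frame(
  Race     = c("White", "White", "Black", "Asian", "White", "Black"),
  Hispanic = c(1, 1, 0, 1, 0, 1)
)

threshold <- 2  # the exercise uses 250; lowered here for the toy data

# Count interviewees of Hispanic ethnicity within each race.
hispanic_by_race <- table(CPS$Race[CPS$Hispanic == 1])

# Keep only the races whose count reaches the threshold.
races <- names(hispanic_by_race)[hispanic_by_race >= threshold]
print(races)
```

`table(CPS$Race, CPS$Hispanic)` shows the same counts as a two-way table.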
§ 2.1 Which variables have at least one interviewee with a missing (NA) value? (Select all that apply.)
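To find the variables with missing values (§ 2.1), one can count the NA entries per column. A minimal sketch on invented data:

```r
# Hypothetical data frame with missing values in two of three columns.
CPS <- data.frame(
  Age           = c(34, 80, 12),
  MetroAreaCode = c(10420, NA, 35620),
  Married       = c("Married", "Divorced", NA)
)

# is.na() gives a logical matrix; colSums() counts the TRUE entries per column.
na_counts <- colSums(is.na(CPS))

# Names of the columns with at least one missing value.
has_na <- names(CPS)[na_counts > 0]
print(has_na)
```

`summary(CPS)` reports NA counts too, though only for columns it treats as numeric or factor.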
§ 2.2 Which is the most accurate:
§ 2.3 How many states had all interviewees living in a non-metropolitan area (aka they have a missing MetroAreaCode value)? For this question, treat the District of Columbia as a state (even though it is not technically a state).
How many states had all interviewees living in a metropolitan area? Again, treat the District of Columbia as a state.
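Both counts in § 2.3 follow from the per-state share of interviewees with a missing MetroAreaCode: a share of 1 means everyone is non-metropolitan, a share of 0 means everyone is metropolitan. A sketch with placeholder states and invented codes:

```r
# Hypothetical data; a missing MetroAreaCode means a non-metropolitan interviewee.
CPS <- data.frame(
  State         = c("StateA", "StateA", "StateB", "StateB", "StateC"),
  MetroAreaCode = c(NA, NA, 17140, NA, 39300)
)

# Share of non-metropolitan interviewees in each state.
nonmetro_share <- tapply(is.na(CPS$MetroAreaCode), CPS$State, mean)

all_nonmetro <- names(nonmetro_share)[nonmetro_share == 1]
all_metro    <- names(nonmetro_share)[nonmetro_share == 0]
```

`length(all_nonmetro)` and `length(all_metro)` then give the two counts. `table(CPS$State, is.na(CPS$MetroAreaCode))` exposes the same information as raw counts.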
§ 2.4 Which region of the United States has the largest proportion of interviewees living in a non-metropolitan area?
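The same grouped-mean idea applies at the region level for § 2.4. The region names and codes below are placeholders.

```r
# Hypothetical data with placeholder region names.
CPS <- data.frame(
  Region        = c("RegionA", "RegionA", "RegionB", "RegionB", "RegionB"),
  MetroAreaCode = c(NA, 10420, NA, NA, 35620)
)

# Proportion of non-metropolitan interviewees per region.
nonmetro_by_region <- tapply(is.na(CPS$MetroAreaCode), CPS$Region, mean)

# Region with the largest proportion.
top_region <- names(which.max(nonmetro_by_region))
print(top_region)
```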
§ 2.5 Which state has a proportion of interviewees living in a non-metropolitan area closest to 30%?
Which state has the largest proportion of non-metropolitan interviewees, ignoring states where all interviewees were non-metropolitan?
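Given a named vector of per-state proportions, both parts of § 2.5 reduce to short vector operations. The shares below are invented, standing in for the output of a grouped mean over the real data.

```r
# Hypothetical per-state shares of non-metropolitan interviewees.
nonmetro_share <- c(StateA = 1.00, StateB = 0.27, StateC = 0.83, StateD = 0.36)

# State whose share is closest to 30%: minimize the absolute distance.
closest_to_30 <- names(which.min(abs(nonmetro_share - 0.30)))

# Largest share after dropping states where every interviewee is non-metropolitan.
largest_excl_all <- names(which.max(nonmetro_share[nonmetro_share < 1]))
```

`sort(nonmetro_share)` is a quick way to check both answers by eye.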
§ 3.1 How many observations (codes for metropolitan areas) are there in MetroAreaMap?
How many observations (codes for countries) are there in CountryMap?
§ 3.2 What is the name of the variable that was added to the data frame by the merge() operation?
How many interviewees have a missing value for the new metropolitan area variable?
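The text does not show the `merge()` call itself, so the following is only a sketch of what a left outer join between the survey data and the metropolitan area map might look like. The map's column names (`Code`, `MetroArea`) and all values are assumptions made for illustration.

```r
# Hypothetical survey rows; NA means the interviewee lives outside a metro area.
CPS <- data.frame(
  PeopleInHousehold = c(2, 4, 1),
  MetroAreaCode     = c(10420, NA, 35620)
)

# Hypothetical stand-in for MetroAreaCodes.csv (column names are assumed).
MetroAreaMap <- data.frame(
  Code      = c(10420, 35620),
  MetroArea = c("MetroA", "MetroB")
)

# all.x = TRUE keeps every CPS row, even when its code has no match in the map.
merged <- merge(CPS, MetroAreaMap,
                by.x = "MetroAreaCode", by.y = "Code", all.x = TRUE)

# The variable added by the merge, and how many rows lack a value for it.
new_variable <- setdiff(names(merged), names(CPS))
n_missing    <- sum(is.na(merged$MetroArea))
```

Without `all.x = TRUE`, `merge()` performs an inner join and the unmatched rows are dropped, so comparing `nrow()` before and after the merge is a useful check.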
§ 3.3 Which of the following metropolitan areas has the largest number of interviewees?
§ 3.4 Which metropolitan area has the highest proportion of interviewees of Hispanic ethnicity?
§ 3.5 Determine the number of metropolitan areas in the United States from which at least 20% of interviewees are Asian.
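Counting the metropolitan areas that clear a threshold (§ 3.5) can be done by summing a logical comparison over the grouped proportions. The sketch assumes the race label is "Asian" and uses placeholder metro names with invented rows.

```r
# Hypothetical data with placeholder metropolitan area names.
CPS <- data.frame(
  MetroArea = c("MetroA", "MetroA", "MetroA", "MetroA",
                "MetroB", "MetroB", "MetroC"),
  Race      = c("Asian", "White", "White", "White",
                "Asian", "Black", "White")
)

# Proportion of Asian interviewees in each metropolitan area.
asian_share <- tapply(CPS$Race == "Asian", CPS$MetroArea, mean)

# Number of areas where the proportion is at least 20%.
n_metros <- sum(asian_share >= 0.20)
print(n_metros)
```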
§ 3.6 Passing na.rm=TRUE to the tapply function, determine which metropolitan area has the smallest proportion of interviewees who have received no high school diploma.
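The reason § 3.6 asks for `na.rm=TRUE` is that a single missing Education value would otherwise turn a whole group's mean into NA. A small sketch on invented data shows the difference; the label "No high school diploma" is assumed from the question's wording.

```r
# Hypothetical data; one interviewee in MetroA has no Education value.
CPS <- data.frame(
  MetroArea = c("MetroA", "MetroA", "MetroA", "MetroB", "MetroB", "MetroB"),
  Education = c("No high school diploma", "Bachelor's degree", NA,
                "No high school diploma", "No high school diploma", "High school")
)

no_diploma <- CPS$Education == "No high school diploma"

# Without na.rm, the NA propagates and MetroA's proportion is NA.
share_with_na <- tapply(no_diploma, CPS$MetroArea, mean)

# Extra arguments to tapply() are passed on to mean(), so NAs are skipped.
share <- tapply(no_diploma, CPS$MetroArea, mean, na.rm = TRUE)

lowest <- names(which.min(share))
```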
§ 4.1 What is the name of the variable added to the CPS data frame by this merge operation?
How many interviewees have a missing value for the new country of birth variable?
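Missing values after the country merge can come from codes that have no entry in the map, not only from missing codes. The sketch below illustrates that case; the key column `CountryOfBirthCode`, the map columns `Code` and `Country`, and every value are assumptions.

```r
# Hypothetical survey rows; code 999 does not appear in the map below.
CPS <- data.frame(
  Age                = c(34, 57, 12),
  CountryOfBirthCode = c(57, 210, 999)
)

# Hypothetical stand-in for CountryCodes.csv (column names are assumed).
CountryMap <- data.frame(
  Code    = c(57, 210),
  Country = c("CountryA", "CountryB")
)

merged <- merge(CPS, CountryMap,
                by.x = "CountryOfBirthCode", by.y = "Code", all.x = TRUE)

# Rows whose code had no match carry NA in the added column.
new_variable <- setdiff(names(merged), names(CPS))
n_missing    <- sum(is.na(merged[[new_variable]]))
```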
§ 4.2 Among all interviewees born outside of North America, which country was the most common place of birth?
§ 4.3 What proportion of the interviewees from the “New York-Northern New Jersey-Long Island, NY-NJ-PA” metropolitan area have a country of birth that is not the United States?
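For § 4.3, one possible approach is to restrict to the named metropolitan area and take the mean of a logical test on the country. Whether interviewees with a missing country should be excluded is a judgment call; this sketch excludes them via `na.rm = TRUE`. The columns `MetroArea` and `Country` and all rows are invented.

```r
ny_name <- "New York-Northern New Jersey-Long Island, NY-NJ-PA"

# Hypothetical merged data; one New York interviewee has no country recorded.
CPS <- data.frame(
  MetroArea = c(ny_name, ny_name, ny_name, ny_name, "MetroB", NA),
  Country   = c("United States", "CountryA", NA, "United States",
                "CountryA", "United States")
)

# which() drops NA comparisons, so rows with a missing MetroArea are skipped.
ny_rows <- which(CPS$MetroArea == ny_name)

# Share of New York interviewees born outside the United States,
# ignoring those with a missing country.
prop_foreign_born <- mean(CPS$Country[ny_rows] != "United States", na.rm = TRUE)
print(prop_foreign_born)
```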
§ 4.4 Which metropolitan area has the largest number (note – not proportion) of interviewees with a country of birth in India?
In Brazil?
In Somalia?
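Because § 4.4 asks for counts and not proportions, the grouped function is `sum`, not `mean`. The helper `top_metro_for()` below is a hypothetical name introduced for illustration, and the data frame is invented.

```r
# Hypothetical merged data with placeholder metropolitan area names.
CPS <- data.frame(
  MetroArea = c("MetroA", "MetroA", "MetroB", "MetroB", "MetroB", "MetroC"),
  Country   = c("India", "India", "India", "Brazil", NA, "Somalia")
)

# Hypothetical helper: the metro area with the most interviewees born in `country`.
top_metro_for <- function(country, data = CPS) {
  # Sum the TRUE values per area; na.rm = TRUE skips missing countries.
  counts <- tapply(data$Country == country, data$MetroArea, sum, na.rm = TRUE)
  names(which.max(counts))
}

india_top   <- top_metro_for("India")
brazil_top  <- top_metro_for("Brazil")
somalia_top <- top_metro_for("Somalia")
```

When several areas tie, `which.max()` returns only the first, so sorting the counts is safer if ties are plausible.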