NOTE: This article is under construction.

This package provides basic support for the Census Bureau’s new microdata APIs, using the same getCensus() function used for summary data. Getting the data with getCensus() is easy. Using it responsibly takes some homework.

About microdata

Microdata contains individual-level responses: one row per person. It is a vital tool for custom analysis, but with great power comes great responsibility. You must weight the individual-level responses appropriately. You’ll often need to work with household relationships, and you’ll need to handle responses that aren’t in the universe of the question (for example, removing children from an analysis of college graduation rates).
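
To make the weighting and universe points concrete, here is a minimal sketch in base R. The data frame and its columns (age, college_grad, weight) are invented for illustration and do not come from any Census dataset, and the 25-and-older universe is an assumption made for this example only.

```r
# Hypothetical respondents; every value here is made up for illustration.
people <- data.frame(
  age          = c(8, 34, 52, 29, 67),
  college_grad = c(NA, 1, 0, 1, 0),      # NA: question not asked of children
  weight       = c(1500, 1000, 3000, 1000, 2000)
)

# Restrict to the universe of the question (assumed here to be adults 25+).
adults <- people[people$age >= 25, ]

# Unweighted share: treats every respondent as representing one person.
unweighted_rate <- mean(adults$college_grad)

# Weighted share: each respondent counts for the number of people they represent.
weighted_rate <- weighted.mean(adults$college_grad, w = adults$weight)
```

In this toy data the unweighted share is 0.5 while the weighted share is about 0.29, because the non-graduates carry larger weights.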

If you’re new to working with microdata you’ll need to do some reading before diving in. Here are some resources from the Census Bureau:

As with all other endpoints, censusapi retrieves the data so that you can perform your own analysis using your methodology of choice. If you’re looking for an interactive microdata analysis tool, try the data.census.gov microdata interactive tool or the IPUMS online data analysis tool.

Once you’ve learned how to use microdata and gained an understanding of weighting, getting the data using censusapi is simple.

Getting microdata with censusapi

As an example, we’ll get data from the 2020 Current Population Survey Voting Supplement. This survey asks people if they voted, how, and when, and includes useful demographic data.

See the available variables:

voting_vars <- listCensusMetadata(
    name = "cps/voting/nov",
    vintage = 2020,
    type = "variables")
head(voting_vars)
name | label | concept | predicateType | group | limit | predicateOnly | suggested_weight | is_weight
for | Census API FIPS ‘for’ clause | Census API Geography Specification | fips-for | N/A | 0 | TRUE | NA | NA
in | Census API FIPS ‘in’ clause | Census API Geography Specification | fips-in | N/A | 0 | TRUE | NA | NA
ucgid | Uniform Census Geography Identifier clause | Census API Geography Specification | ucgid | N/A | 0 | TRUE | NA | NA
PEEDUCA | Demographics-highest level of school completed | NA | int | N/A | 0 | NA | PWSSWGT | NA
PUBUS1 | Labor Force-unpaid work in family business/farm,y/n | NA | int | N/A | 0 | NA | PWCMPWGT | NA
PRCOW1 | Indus.&Occ.-(main job)class of worker-recode | NA | int | N/A | 0 | NA | PWCMPWGT | NA
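
The suggested_weight column in this output is worth checking before you request data. The sketch below rebuilds a few rows of the metadata by hand (so it runs without an API call) and looks up the suggested weight for some variables. The names vars_sample and wanted are placeholders for this example; with real data you would use voting_vars directly.

```r
# Hand-typed subset of the metadata printed above, so the example runs offline.
vars_sample <- data.frame(
  name             = c("for", "PEEDUCA", "PUBUS1", "PRCOW1"),
  suggested_weight = c(NA, "PWSSWGT", "PWCMPWGT", "PWCMPWGT"),
  stringsAsFactors = FALSE
)

# Look up the suggested weight for the variables we plan to request.
wanted <- c("PEEDUCA", "PRCOW1")
weights_needed <- vars_sample[vars_sample$name %in% wanted,
                              c("name", "suggested_weight")]

# These two variables point to different weights, so list the distinct ones.
distinct_weights <- unique(weights_needed$suggested_weight)
distinct_weights
```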

From the CPS Voting supplement, get data on method of voting in New York state using PES5 (Vote in person or by mail?) and PESEX (gender), along with the appropriate weighting variable, PWSSWGT. We’ll only get data for people with a response of 1 (yes) to PES1 (Did you vote?).

cps_voting <- getCensus(
    name = "cps/voting/nov",
    vintage = 2020,
    vars = c("PES5", "PESEX", "PWSSWGT"),
    region = "state:36",
    PES1 = 1)
head(cps_voting)
state PES5 PESEX PWSSWGT PES1
36 1 1 4571.216 1
36 1 2 4806.369 1
36 1 2 3440.301 1
36 -3 1 5204.566 1
36 -3 2 4993.819 1
36 1 2 4602.958 1
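
Because each row stands for many people, totals come from summing PWSSWGT rather than counting rows. Here is a minimal sketch using the six rows printed above, typed in by hand so it runs offline. The real cps_voting data frame could be used the same way, assuming its columns are numeric.

```r
# Hand-typed copy of the six rows shown above.
cps_sample <- data.frame(
  PES5    = c(1, 1, 1, -3, -3, 1),
  PESEX   = c(1, 2, 2, 1, 2, 2),
  PWSSWGT = c(4571.216, 4806.369, 3440.301, 5204.566, 4993.819, 4602.958)
)

# Unweighted: number of survey respondents per PESEX code.
respondents <- table(cps_sample$PESEX)

# Weighted: estimated number of people those respondents represent.
estimated_people <- tapply(cps_sample$PWSSWGT, cps_sample$PESEX, sum)
```

These six rows are only the head of the data, so the sums illustrate the mechanics and are not estimates for New York.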

Making a data dictionary

Most microdata variables are encoded, which means that your data will have a lot of numbers instead of text labels.

A data dictionary, which includes the definitions and labels for every variable in the dataset, is helpful. Calling listCensusMetadata() with include_values = TRUE returns a data dictionary with one row for each variable-label pair. That means if there are 30 codes for a given variable, it will have 30 rows in the data dictionary. Variables that don’t have value labels in the metadata will have only one row.

voting_dict <- listCensusMetadata(
    name = "cps/voting/nov",
    vintage = 2020,
    type = "variables",
    include_values = TRUE)
head(voting_dict)
name | label | concept | predicateType | group | limit | predicateOnly | suggested_weight | is_weight | values_code | values_label
for | Census API FIPS ‘for’ clause | Census API Geography Specification | fips-for | N/A | 0 | TRUE | NA | NA | NA | NA
in | Census API FIPS ‘in’ clause | Census API Geography Specification | fips-in | N/A | 0 | TRUE | NA | NA | NA | NA
ucgid | Uniform Census Geography Identifier clause | Census API Geography Specification | ucgid | N/A | 0 | TRUE | NA | NA | NA | NA
PEEDUCA | Demographics-highest level of school completed | NA | int | N/A | 0 | NA | PWSSWGT | NA | 46 | DOCTORATE DEGREE(EX:PhD,EdD)
PEEDUCA | Demographics-highest level of school completed | NA | int | N/A | 0 | NA | PWSSWGT | NA | 33 | 5th Or 6th Grade
PEEDUCA | Demographics-highest level of school completed | NA | int | N/A | 0 | NA | PWSSWGT | NA | 44 | MASTER’S DEGREE(EX:MA,MS,MEng,MEd,MSW)
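
One way to use the dictionary is to pull out the code-to-label mapping for a single variable. The sketch below uses two of the PEEDUCA rows shown above, typed in by hand; dict_sample and responses are placeholder names for this example, and storing the codes as character is an assumption. With real data, substitute voting_dict.

```r
# Hand-typed subset of the dictionary, so the example runs offline.
dict_sample <- data.frame(
  name         = c("PEEDUCA", "PEEDUCA"),
  values_code  = c("46", "33"),
  values_label = c("DOCTORATE DEGREE(EX:PhD,EdD)", "5th Or 6th Grade"),
  stringsAsFactors = FALSE
)

# Keep only the rows for the variable of interest.
educ_rows <- dict_sample[dict_sample$name == "PEEDUCA", ]

# Named vector: names are codes (as text), values are labels.
educ_lookup <- setNames(educ_rows$values_label,
                        as.character(educ_rows$values_code))

# Translate codes from some hypothetical survey responses into labels.
responses <- c(33, 46, 33)
response_labels <- unname(educ_lookup[as.character(responses)])
response_labels
```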

You can also look up the meaning of those codes for a single variable using the same function, listCensusMetadata(). Here are the values of PES5, the variable for “Vote in person or by mail?”

PES5_values <- listCensusMetadata(
    name = "cps/voting/nov",
    vintage = 2020,
    type = "values",
    variable = "PES5")
PES5_values
code label
2 By Mail
-2 Don’t Know
1 In person
-1 Not in Universe
-9 No Response
-3 Refused
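
Putting the pieces together, the value labels can be matched onto the data and the weights summed by label. This is a sketch with hand-typed copies of the PES5 values above and the six rows of cps_voting printed earlier; it assumes the codes can be compared as numbers, and the names pes5_labels and cps_sample are placeholders.

```r
# Hand-typed copy of the PES5 value labels shown above.
pes5_labels <- data.frame(
  code  = c(2, -2, 1, -1, -9, -3),
  label = c("By Mail", "Don't Know", "In person",
            "Not in Universe", "No Response", "Refused"),
  stringsAsFactors = FALSE
)

# Hand-typed copy of the first six rows of the voting data.
cps_sample <- data.frame(
  PES5    = c(1, 1, 1, -3, -3, 1),
  PWSSWGT = c(4571.216, 4806.369, 3440.301, 5204.566, 4993.819, 4602.958)
)

# Attach a label to each response by matching its code.
cps_sample$PES5_label <- pes5_labels$label[match(cps_sample$PES5, pes5_labels$code)]

# Weighted total per response label.
weighted_by_method <- tapply(cps_sample$PWSSWGT, cps_sample$PES5_label, sum)
weighted_by_method
```

Whether to keep or drop categories such as Refused depends on your analysis; the sketch keeps them so that the choice stays visible.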

Other ways to access microdata

The Census Bureau microdata APIs are helpful for working with a limited number of just-released datasets. But they’re not your only option. Some other ways to get microdata are:

  • Retrieve standardized, cleaned microdata from IPUMS and import it with the ipumsr package. IPUMS is widely used in research when the data needed is not brand new. I highly recommend that you check out IPUMS’ cleaned microdata files as well as its historic geographic data. These standardized files are generally released months to a year after the raw microdata becomes available directly from the Census Bureau.
  • Download complete bulk files from the Census FTP (File Transfer Protocol) site. This is helpful if you need a large number of variables. You might run into size limitations when getting many variables through the APIs.
  • Retrieve American Community Survey microdata via the Census APIs with tidycensus, which has helpful functions for working with those endpoints.