
censusapi is a lightweight package that retrieves data from the U.S. Census Bureau’s APIs. More than 1,000 Census API endpoints are available, including the Decennial Census, American Community Survey, Poverty Statistics, Population Estimates, and Census microdata. This package is designed to let you get data from all of those APIs using the same main functions and syntax for every dataset.

This package returns the data as-is with the original variable names created by the Census Bureau and any quirks inherent in the data. Each dataset is a little different. Some are documented thoroughly, others have documentation that is sparse. Sometimes variable names change each year. This package can’t overcome those challenges, but tries to make it easier to get the data for use in your analysis. Make sure to thoroughly read the documentation for your dataset and see below for how to get help with Census data.

API key setup

To use the Census APIs, sign up for an API key, which will be sent to your provided email address. You'll need that key to use this package.

To save your API key, within R, run:

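# Set the key for the current R session only
Sys.setenv(CENSUS_KEY = "PASTEYOURKEYHERE")
# To keep the key across sessions, add this line to your ~/.Renviron file:
# CENSUS_KEY=PASTEYOURKEYHERE
# Then reload .Renviron
readRenviron("~/.Renviron")
# Check to see that the expected key is output in your R console
Sys.getenv("CENSUS_KEY")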

Once you’ve added your census key to your system environment, censusapi will use it by default without any extra work on your part.
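Sys.setenv() on its own only sets the key for the current R session. If you'd rather write the key to your .Renviron file from within R, here is a minimal sketch. save_census_key() is a hypothetical helper, not part of censusapi, and it assumes your user environment file lives at ~/.Renviron. Alternatively, usethis::edit_r_environ() opens that file so you can add the CENSUS_KEY= line by hand.

```r
# Hypothetical helper, not part of censusapi: write CENSUS_KEY to an
# .Renviron file so it persists across sessions, then reload that file.
save_census_key <- function(key, path = "~/.Renviron") {
  path <- path.expand(path)
  lines <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  # Drop any earlier CENSUS_KEY entry so the file keeps a single definition
  lines <- lines[!grepl("^CENSUS_KEY=", lines)]
  writeLines(c(lines, paste0("CENSUS_KEY=", key)), path)
  # Reload so the key is also available in the current session
  readRenviron(path)
  invisible(path)
}

# Usage:
# save_census_key("PASTEYOURKEYHERE")
# Sys.getenv("CENSUS_KEY")
```

Because the helper removes any earlier CENSUS_KEY line before writing, running it again with a new key replaces the old one rather than adding a duplicate.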

In some instances you might not want to put your key in your .Renviron, for example if you're on a shared school computer. In that case, you can set key = "YOURKEY" as an argument in getCensus() instead.

Finding your API

To get started, load the censusapi library.

library(censusapi)

To see a current table of every available endpoint, use listCensusApis(). This data frame includes useful information for making your API call, including the dataset's name, description and title, as well as a contact email for questions about the underlying data.

apis <- listCensusApis()
colnames(apis)
#> [1] "title"       "name"        "vintage"     "type"        "temporal"   
#> [6] "url"         "modified"    "description" "contact"

Each column describes one aspect of the endpoint:

  • title: Short written description of the dataset
  • name: Programmatic name of the dataset, to be used with censusapi functions
  • vintage: Year of the survey, for use with microdata and aggregate datasets
  • type: Dataset type, which is either Aggregate, Microdata, or Timeseries
  • temporal: Time period of the dataset; only documented for some datasets
  • url: Base URL of the endpoint
  • modified: Date last modified
  • description: Long written description of the dataset
  • contact: Email address for specific questions about the Census Bureau survey
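To narrow this list down, you can search the titles and descriptions. The sketch below assumes the column names listed above; find_apis() is a hypothetical helper, and the search term in the usage comment is only an example.

```r
# Hypothetical helper: search listCensusApis() output by keyword.
# Assumes the title, name, vintage, type, and description columns shown above.
find_apis <- function(apis, pattern) {
  # Case-insensitive match in either the short title or the long description
  hit <- grepl(pattern, apis$title, ignore.case = TRUE) |
    grepl(pattern, apis$description, ignore.case = TRUE)
  apis[hit, c("title", "name", "vintage", "type"), drop = FALSE]
}

# Usage (requires network access):
# apis <- listCensusApis()
# find_apis(apis, "health insurance")
```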

Dataset types

There are three types of datasets included in the Census Bureau API universe: aggregate, microdata, and timeseries. These type names were defined by the Census Bureau and are included as a column in listCensusApis().

table(apis$type)
#> 
#>  Aggregate  Microdata Timeseries 
#>        556        637         57

Most users will work with summary data, either aggregate or timeseries. Summary data contains pre-calculated numbers or percentages for a given statistic — like the number of children in a state or the median household income. The examples below and in the broader list of censusapi examples use summary data.

Aggregate datasets, like the American Community Survey or Decennial Census, include data for only one time period (a vintage), usually one year. Datasets like the American Community Survey contain thousands of these pre-computed variables.

Timeseries datasets, including the Small Area Income and Poverty Estimates, the Quarterly Workforce Estimates, and International Trade statistics, allow users to query data for more than one time period in a single API call.

Microdata contains the individual-level responses for a survey for use in custom analysis. One row represents one person. Only advanced analysts will want to use microdata. Learn more about what microdata is and how to use it with censusapi in Accessing microdata.

Using getCensus

The main function in censusapi is getCensus(), which makes an API call to a given endpoint and returns a data frame with results. Each API has slightly different parameters, but a few arguments appear in nearly every call:

  • name: the programmatic name of the endpoint as defined by the Census Bureau, like “acs/acs5” or “timeseries/bds/firms”
  • vintage: the survey year, required for aggregate or microdata APIs
  • vars: a character vector of variables to retrieve
  • region: the geography level to retrieve, such as state or county, required for most endpoints

Some APIs have additional required or optional arguments, like time or monthly for some timeseries datasets. Check the specific documentation for your API and explore its metadata with listCensusMetadata() to see what options are allowed.

Let’s walk through an example getting uninsured rates using the Small Area Health Insurance Estimates API, which provides detailed annual state-level and county-level estimates of health insurance rates for people below age 65.

Choosing variables

censusapi includes a metadata function called listCensusMetadata() to get information about an API’s variable and geography options. Let’s see what variables are available in the SAHIE API:

sahie_vars <- listCensusMetadata(
    name = "timeseries/healthins/sahie", 
    type = "variables")

# See the full list of variables
sahie_vars$name
#>  [1] "for"        "in"         "time"       "NIPR_LB90"  "NIPR_PT"   
#>  [6] "AGECAT"     "NIC_PT"     "GEOID"      "STATE"      "RACE_DESC" 
#> [11] "YEAR"       "IPRCAT"     "PCTIC_UB90" "NIPR_MOE"   "PCTUI_LB90"
#> [16] "NIC_MOE"    "US"         "COUNTY"     "NUI_UB90"   "PCTUI_MOE" 
#> [21] "NIC_UB90"   "NUI_MOE"    "SEXCAT"     "PCTUI_PT"   "PCTIC_LB90"
#> [26] "PCTUI_UB90" "NUI_PT"     "STABREV"    "AGE_DESC"   "NAME"      
#> [31] "NIC_LB90"   "PCTIC_PT"   "PCTIC_MOE"  "IPR_DESC"   "NIPR_UB90" 
#> [36] "NUI_LB90"   "GEOCAT"     "SEX_DESC"   "RACECAT"
# Full info on the first several variables
head(sahie_vars)
name | label | concept | predicateType | group | limit | predicateOnly | required
for | Census API FIPS ‘for’ clause | Census API Geography Specification | fips-for | N/A | 0 | TRUE | NA
in | Census API FIPS ‘in’ clause | Census API Geography Specification | fips-in | N/A | 0 | TRUE | NA
time | ISO-8601 Date/Time value | Census API Date/Time Specification | datetime | N/A | 0 | TRUE | true
NIPR_LB90 | Number in Demographic Group for Selected Income Range, Lower Bound for 90% Confidence Interval | Uncertainty Measure | int | N/A | 0 | NA | NA
NIPR_PT | Number in Demographic Group for Selected Income Range, Estimate | Estimate | int | N/A | 0 | NA | NA
AGECAT | Age Category | Demographic ID | int | N/A | 6 | NA | default displayed
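Variable names in this endpoint follow suffix patterns: judging by the labels above, _PT marks an estimate and _LB90 the lower bound of a 90% confidence interval. A small filter can pull out just one kind. vars_with_suffix() is a hypothetical helper, and the sketch assumes the metadata has name and label columns as shown.

```r
# Hypothetical helper: keep metadata rows whose variable name ends in a suffix,
# e.g. "_PT" for point estimates in SAHIE.
vars_with_suffix <- function(meta, suffix) {
  meta[endsWith(as.character(meta$name), suffix), c("name", "label"), drop = FALSE]
}

# Usage:
# vars_with_suffix(sahie_vars, "_PT")
```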

Choosing regions

We can also use listCensusMetadata to see which geographic levels are available.

listCensusMetadata(
    name = "timeseries/healthins/sahie", 
    type = "geography")
name geoLevelId limit referenceDate requires wildcard optionalWithWCFor
us 010 1 2015-01-01 NULL NULL NA
county 050 3142 2015-01-01 state state state
state 040 52 2015-01-01 NULL NULL NA

This API has three geographic levels: us, county, and state. County data can be queried for all counties nationally or within a specific state.

Making a censusapi call

First, using getCensus(), let’s get the percent (PCTUI_PT) and number (NUI_PT) of people uninsured, using the wildcard star (*) to retrieve data for all counties.

sahie_counties <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT", "NUI_PT"), 
    region = "county:*", 
    time = 2019)
head(sahie_counties)
time state county NAME PCTUI_PT NUI_PT
2019 01 001 Autauga County, AL 9.4 4366
2019 01 003 Baldwin County, AL 10.9 19085
2019 01 005 Barbour County, AL 13.0 2194
2019 01 007 Bibb County, AL 11.0 1824
2019 01 009 Blount County, AL 14.3 6663
2019 01 011 Bullock County, AL 11.1 752

We can also get data on detailed income and demographic groups from the SAHIE. We’ll use region to specify county-level results and regionin to filter to Virginia, state code 51. We’ll get uninsured rates by income group, IPRCAT.

sahie_virginia <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region = "county:*", 
    regionin = "state:51", 
    time = 2019)
head(sahie_virginia)
time state county NAME IPRCAT IPR_DESC PCTUI_PT
2019 51 001 Accomack County, VA 0 All Incomes 15.1
2019 51 001 Accomack County, VA 1 <= 200% of Poverty 19.6
2019 51 001 Accomack County, VA 2 <= 250% of Poverty 19.4
2019 51 001 Accomack County, VA 3 <= 138% of Poverty 19.7
2019 51 001 Accomack County, VA 4 <= 400% of Poverty 17.5
2019 51 001 Accomack County, VA 5 138% to 400% of Poverty 16.3

Because the SAHIE API is a timeseries dataset, as indicated in its name, we can get multiple years of data at once by changing time = X to time = "from X to Y". Let's get that data for DeKalb County, Georgia, using county FIPS code 089 and state FIPS code 13. You can look up FIPS codes on the Census Bureau website.

sahie_years <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT"), 
    region = "county:089", 
    regionin = "state:13",
    time = "from 2006 to 2019")
sahie_years
time state county NAME PCTUI_PT
2006 13 089 DeKalb County, GA 19.0
2007 13 089 DeKalb County, GA 17.2
2008 13 089 DeKalb County, GA 22.5
2009 13 089 DeKalb County, GA 22.9
2010 13 089 DeKalb County, GA 25.8
2011 13 089 DeKalb County, GA 23.9
2012 13 089 DeKalb County, GA 21.7
2013 13 089 DeKalb County, GA 22.1
2014 13 089 DeKalb County, GA 19.4
2015 13 089 DeKalb County, GA 16.9
2016 13 089 DeKalb County, GA 15.3
2017 13 089 DeKalb County, GA 15.9
2018 13 089 DeKalb County, GA 17.1
2019 13 089 DeKalb County, GA 16.9
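If you build time ranges programmatically, for example in a loop over start years, a tiny helper keeps the string format consistent. time_range() is hypothetical and assumes the "from X to Y" syntax shown above.

```r
# Hypothetical helper: build the "from X to Y" string used by time = ...
time_range <- function(from, to) {
  stopifnot(from <= to)  # reject reversed ranges before calling the API
  sprintf("from %d to %d", as.integer(from), as.integer(to))
}
time_range(2006, 2019)
#> [1] "from 2006 to 2019"
```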

We can also filter the data by income group using the IPRCAT variable again. IPRCAT = 3 represents <=138% of the federal poverty line. That is the threshold for Medicaid eligibility in states that expanded Medicaid under the Affordable Care Act.

Getting this data for Los Angeles County (state FIPS code 06, county FIPS code 037), we can see the dramatic decrease in the uninsured rate for this income group after California expanded Medicaid in 2014.

sahie_138 <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT", "NUI_PT"), 
    region = "county:037", 
    regionin = "state:06", 
    IPRCAT = 3,
    time = "from 2010 to 2019")
sahie_138
time state county NAME PCTUI_PT NUI_PT IPRCAT
2010 06 037 Los Angeles County, CA 37.4 894385 3
2011 06 037 Los Angeles County, CA 35.1 867577 3
2012 06 037 Los Angeles County, CA 34.4 865516 3
2013 06 037 Los Angeles County, CA 33.0 818978 3
2014 06 037 Los Angeles County, CA 24.9 607542 3
2015 06 037 Los Angeles County, CA 17.8 402977 3
2016 06 037 Los Angeles County, CA 15.4 329251 3
2017 06 037 Los Angeles County, CA 14.3 281842 3
2018 06 037 Los Angeles County, CA 13.9 255520 3
2019 06 037 Los Angeles County, CA 15.1 254740 3
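To put a number on that decrease, here is the arithmetic using the 2013 and 2015 uninsured rates copied from the output above, the years on either side of California's 2014 Medicaid expansion.

```r
# Values copied from the Los Angeles County output above (IPRCAT = 3)
pctui <- c("2013" = 33.0, "2015" = 17.8)

# Change in percentage points, and as a share of the 2013 rate
drop_points <- unname(pctui["2013"] - pctui["2015"])
drop_relative <- drop_points / unname(pctui["2013"])

round(drop_points, 1)
#> [1] 15.2
round(100 * drop_relative)
#> [1] 46
```

In other words, the uninsured rate for this group fell by about 15 percentage points, or nearly half, over those two years.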

We can also get data for other useful demographics such as age group.

sahie_age <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT", "NUI_PT", "AGECAT", "AGE_DESC"), 
    region = "county:037", 
    regionin = "state:06",
    time = 2019)
sahie_age
time state county NAME PCTUI_PT NUI_PT AGECAT AGE_DESC
2019 06 037 Los Angeles County, CA 11.1 940376 0 Under 65 years
2019 06 037 Los Angeles County, CA 13.6 864634 1 18 to 64 years
2019 06 037 Los Angeles County, CA 12.8 406708 2 40 to 64 years
2019 06 037 Los Angeles County, CA 11.3 208558 3 50 to 64 years
2019 06 037 Los Angeles County, CA 3.9 85306 4 Under 19 years
2019 06 037 Los Angeles County, CA 13.7 822705 5 21 to 64 years

Annotations

Some Census datasets, including the American Community Survey, use annotated values. These values use numbers or symbols to indicate that the data is unavailable, has been top coded, has an insufficient sample size, or other noteworthy characteristics. Read more from the Census Bureau on ACS annotation meanings and ACS variable types.

The censusapi package is intended to return the data as-is so that you can receive those unaltered annotations. If you are using data for a small geography like a census tract or block group, make sure to check for values like -666666666, or check the annotation columns for non-empty values, and exclude them as needed.

As an example, we'll get median household income, with its annotations and margin of error, for three census tracts in Washington, DC. The value for one tract is available, one is top coded, and one is unavailable. Income is top coded at $250,000: any tract whose median income exceeds that threshold is listed as $250,001. The “EA” (estimate annotation) and “MA” (margin of error annotation) columns show when a value has a special meaning.

acs_income <- getCensus(
    name = "acs/acs5",
    vintage = 2020, 
    vars = c("B19013_001E", "B19013_001EA", "B19013_001M", "B19013_001MA"), 
    region = "tract:006804,007703,000903",
    regionin = "county:001&state:11")
acs_income
state county tract B19013_001E B19013_001EA B19013_001M B19013_001MA
11 001 007703 46156 NA 24087 NA
11 001 000903 250001 250,000+ -333333333 ***
11 001 006804 -666666666 - -222222222 **
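One way to follow that advice is sketched below. It assumes the special values are exactly the three codes that appear in this example; the Census Bureau documents others, so extend annotation_codes for your dataset. clean_annotated() and has_annotation() are hypothetical helpers, and the annotation-column pattern assumes ACS-style names like B19013_001EA.

```r
# Special values seen in the example above; the Census Bureau documents
# additional codes, so extend this vector for your dataset as needed.
annotation_codes <- c(-666666666, -333333333, -222222222)

# Replace annotation codes in numeric columns with NA
clean_annotated <- function(df, codes = annotation_codes) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(x) replace(x, x %in% codes, NA))
  df
}

# Flag rows where any estimate (EA) or margin of error (MA) annotation is set,
# assuming ACS-style column names such as B19013_001EA
has_annotation <- function(df) {
  ann <- df[grepl("_[0-9]+(EA|MA)$", names(df))]
  rowSums(!is.na(ann) & ann != "") > 0
}

# Usage:
# acs_income_clean <- clean_annotated(acs_income)
# acs_income[has_annotation(acs_income), ]
```

Applied to acs_income, clean_annotated() leaves the top-coded 250001 in place, since it is a real (censored) value rather than a missing-data code; has_annotation() still flags that row so you can decide how to treat it.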

Variable groups

For some surveys, particularly the American Community Survey and Decennial Census, you can get many related variables at once using a variable group. These groups are defined by the Census Bureau. In some other data tools, like data.census.gov, this concept is referred to as a table.

Some groups have several dozen variables, others just have a few. As an example, we’ll get the estimate, margin of error and annotations for median household income in the past 12 months for Census tracts in Alaska using group B19013.

First, see descriptions of the variables in group B19013:

group_B19013 <- listCensusMetadata(
    name = "acs/acs5",
    vintage = 2017,
    type = "variables",
    group = "B19013")
group_B19013
name | label | concept | predicateType | group | limit | predicateOnly
B19013_001MA | Annotation of Margin of Error!!Median household income in the past 12 months (in 2017 inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS) | string | B19013 | 0 | TRUE
B19013_001EA | Annotation of Estimate!!Median household income in the past 12 months (in 2017 inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS) | string | B19013 | 0 | TRUE
B19013_001E | Estimate!!Median household income in the past 12 months (in 2017 inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS) | int | B19013 | 0 | TRUE
B19013_001M | Margin of Error!!Median household income in the past 12 months (in 2017 inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS) | int | B19013 | 0 | TRUE

Now, retrieve the data using vars = "group(B19013)". You could alternatively list each variable manually as vars = c("NAME", "B19013_001E", "B19013_001EA", "B19013_001M", "B19013_001MA"), but using the group is much easier.

acs_income_group <- getCensus(
    name = "acs/acs5", 
    vintage = 2017, 
    vars = "group(B19013)", 
    region = "tract:*", 
    regionin = "state:02")
head(acs_income_group)
state county tract B19013_001E B19013_001EA B19013_001M B19013_001MA GEO_ID NAME
02 068 000100 83295 NA 6362 NA 1400000US02068000100 Census Tract 1, Denali Borough, Alaska
02 261 000200 95227 NA 22638 NA 1400000US02261000200 Census Tract 2, Valdez-Cordova Census Area, Alaska
02 261 000300 89000 NA 20435 NA 1400000US02261000300 Census Tract 3, Valdez-Cordova Census Area, Alaska
02 261 000100 49076 NA 7165 NA 1400000US02261000100 Census Tract 1, Valdez-Cordova Census Area, Alaska
02 122 000200 57694 NA 6526 NA 1400000US02122000200 Census Tract 2, Kenai Peninsula Borough, Alaska
02 122 000800 50904 NA 3723 NA 1400000US02122000800 Census Tract 8, Kenai Peninsula Borough, Alaska
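The E, EA, M, and MA suffixes in this group follow a regular pattern, so if you need specific lines from a table without pulling the whole group, you can generate the names. acs_var_names() is a hypothetical helper that assumes the three-digit, zero-padded line numbers and suffixes seen in group B19013; check listCensusMetadata() to confirm the names exist for your table.

```r
# Hypothetical helper: build ACS variable names from a table ID and line numbers,
# using the E / EA / M / MA suffixes seen in group B19013 above.
acs_var_names <- function(table, lines, suffixes = c("E", "EA", "M", "MA")) {
  ids <- sprintf("%s_%03d", table, as.integer(lines))
  # outer() pairs every ID with every suffix; t() keeps each ID's names together
  as.vector(t(outer(ids, suffixes, paste0)))
}
acs_var_names("B19013", 1)
#> [1] "B19013_001E"  "B19013_001EA" "B19013_001M"  "B19013_001MA"
```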

Advanced geographies

Some geographies, particularly Census tracts and blocks, need to be specified within larger geographies like states and counties. This varies by API endpoint, so make sure to read the documentation for your specific API and run listCensusMetadata(type = "geographies") to see the available options.

Tract-level data from the 2010 Decennial Census can only be requested from one state at a time. In this example, we use the built-in fips vector of state FIPS codes to request tract-level data from each state and combine the results into a single data frame.

tracts <- NULL
for (f in fips) {
    stateget <- paste("state:", f, sep="")
    temp <- getCensus(
        name = "dec/sf1",
        vintage = 2010,
        vars = "P001001",
        region = "tract:*",
        regionin = stateget)
    tracts <- rbind(tracts, temp)
}
# How many tracts are present?
nrow(tracts)
#> [1] 73057
head(tracts)
state county tract P001001
01 001 020100 1912
01 001 020500 10766
01 001 020300 3373
01 001 020400 4386
01 001 020200 2170
01 001 020600 3668
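Growing tracts with rbind() inside the loop works, but an alternative is to collect each state's result in a list and bind once at the end. This sketch takes the request as a function argument, so get_by_state() (a hypothetical helper) can be tried without network access; in practice, fetch would wrap getCensus() as in the usage comment.

```r
# Sketch: call a per-state request function for each FIPS code and stack the
# results with a single rbind, instead of growing a data frame in a loop.
# `fetch` is any function that takes a "state:XX" string and returns a data frame.
get_by_state <- function(state_fips, fetch) {
  pieces <- lapply(state_fips, function(f) fetch(paste0("state:", f)))
  do.call(rbind, pieces)
}

# Usage with censusapi (requires an API key and network access):
# tracts <- get_by_state(fips, function(regionin) {
#   getCensus(name = "dec/sf1", vintage = 2010, vars = "P001001",
#             region = "tract:*", regionin = regionin)
# })
```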

The regionin argument of getCensus() can also be used with a string of nested geographies, as shown below.

The 2010 Decennial Census Summary File 1 requires you to specify a state and county to retrieve block-level data. Use region to request block-level data, and regionin to specify the desired state and county (and, in this example, tract).

data2010 <- getCensus(
    name = "dec/sf1",
    vintage = 2010,
    vars = "P001001", 
    region = "block:*",
    regionin = "state:36+county:027+tract:010000")
head(data2010)
state county tract block P001001
36 027 010000 1000 31
36 027 010000 1011 17
36 027 010000 1028 41
36 027 010000 1001 0
36 027 010000 1031 0
36 027 010000 1002 4
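Because regionin is a +-separated string, a small helper can assemble it from named parts. nested_regionin() is hypothetical; pass codes as character strings so leading zeros are kept, and list geographies from largest to smallest as in the example above.

```r
# Hypothetical helper: build a nested regionin string from named parts,
# listed from the largest geography to the smallest.
nested_regionin <- function(...) {
  parts <- c(...)
  paste0(names(parts), ":", parts, collapse = "+")
}
nested_regionin(state = "36", county = "027", tract = "010000")
#> [1] "state:36+county:027+tract:010000"
```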

For many more examples and advanced topics check out all of the articles.

Troubleshooting

The APIs contain more than 1,000 endpoints, each of which works a little differently. The Census Bureau also makes frequent changes to the APIs, which unfortunately are not usually announced in advance. If you're getting an error message or unexpected results, here are some things to check.

Variables

Use listCensusMetadata(type = "variables") on your endpoint to see what variables are available. Occasionally the names will change from year to year. This is very common with the ACS and Decennial surveys as well as the Population Estimates Program.

The Census APIs are case-sensitive, which means that if the variable name you want is uppercase you’ll need to write it uppercase in your request. Most of the APIs use uppercase, but some use lowercase and some even use sentence case variable names.
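If you're unsure of a variable's case, you can match your request against the names returned by listCensusMetadata(). fix_var_case() below is a hypothetical helper; if an API ever had two variables differing only by case, it would return the first match.

```r
# Hypothetical helper: map requested names onto the exact case used by the API,
# using the variable names returned by listCensusMetadata(type = "variables").
fix_var_case <- function(requested, available) {
  idx <- match(toupper(requested), toupper(available))
  if (anyNA(idx)) {
    stop("Variables not found: ", paste(requested[is.na(idx)], collapse = ", "))
  }
  available[idx]
}

# Usage:
# sahie_vars <- listCensusMetadata(name = "timeseries/healthins/sahie", type = "variables")
# fix_var_case(c("name", "pctui_pt"), sahie_vars$name)
```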

Geographies

Use listCensusMetadata(type = "geographies") on your dataset to check which geographies you can use. Each API has its own list of valid geographies and they occasionally change as the Census Bureau makes updates.

If you're specifying a region by FIPS code, for example state:01, make sure to use the full code, padded with 0s if necessary. Previously, specifying state:1 usually worked, but the APIs now require the full, zero-padded FIPS code. See the Census Bureau FIPS reference for valid codes.
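If your codes come from a spreadsheet or another source where they were stored as numbers, the leading zeros may be gone. Here is a minimal sketch of restoring them with base R's formatC(); pad_fips() is a hypothetical name, and the widths (2 for states, 3 for counties, 6 for tracts) follow the codes used in the examples above.

```r
# Hypothetical helper: zero-pad FIPS codes to their full width
# (2 digits for states, 3 for counties, 6 for tracts).
pad_fips <- function(code, width) {
  formatC(as.integer(code), width = width, flag = "0")
}
pad_fips(c(1, 6, 51), 2)
#> [1] "01" "06" "51"
paste0("county:", pad_fips(37, 3))
#> [1] "county:037"
```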

General

Read the online documentation for your dataset. Unfortunately, some information is not included in the developer metadata or documentation pages and is only available in PDFs linked from the Census Bureau's website, so check those as well.

Unexpected errors

Occasionally you might get the general error message "There was an error while running your query. We've logged the error and we'll correct it ASAP. Sorry for the inconvenience." This comes from the Census Bureau and could be caused by any number of problems, including server issues. Try rerunning your API call. If that doesn’t work and you are requesting a large amount of data, try reducing the amount that you’re requesting. If you’re still having trouble, see below for ways to get help.
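If you'd like to automate the rerun, a simple retry wrapper is sketched below. with_retries() is hypothetical, not part of censusapi, and it retries on any error, including ones that will never succeed (like a misspelled variable), so keep tries small.

```r
# Sketch: retry a call a few times with a growing pause before giving up.
# `f` is a zero-argument function, e.g. one wrapping getCensus().
with_retries <- function(f, tries = 3, wait = 2) {
  stopifnot(tries >= 1)
  for (i in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i < tries) Sys.sleep(wait * i)  # by default wait 2s, then 4s, ...
  }
  stop(result)  # re-raise the last error
}

# Usage (requires an API key and network access):
# sahie_counties <- with_retries(function() getCensus(
#   name = "timeseries/healthins/sahie", vars = c("NAME", "PCTUI_PT"),
#   region = "county:*", time = 2019))
```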

Other ways to get help

  • If your getCensus() call results in an error, it will print the underlying API call in your R console. You can open this URL in your web browser to view it directly. You can always view the underlying call by using getCensus(show_call = TRUE).
  • Open a GitHub issue for bugs or issues caused by this R package.
  • Join the public Census Bureau Slack channel and ask your question in the R or API rooms.
  • Email the Census Bureau API team at for questions relating to the underlying data and APIs. If you're having trouble with a specific request, include the underlying API call (shown in the censusapi error message) rather than your R code. You can also reach out to the contact listed in the dataset metadata found in listCensusApis() for questions about a specific survey.