STRATA Variable in 2022 IPUMS USA

ACS 2022 5-year data does not have a strata variable. How would one use survey weights and declare a survey design for the 5-year 2022 ACS in Stata?

The lowest geographic unit identified in the ACS public use microdata sample (PUMS) file is the PUMA, an area containing at least 100,000 persons. IPUMS geographers infer other geographic units (e.g., cities, counties) where possible. The variable STRATA is created by the IPUMS team using PUMA. Beginning with the 2022 ACS, PUMA boundaries are based on the 2020 decennial census. The 2022 5-year ACS sample therefore combines data that use the 2010 PUMA definitions (2018-2021) with data that use the 2020 PUMA definitions (2022). Our initial release of the 2022 5-year ACS PUMS does not include geographic identifiers for areas smaller than states (including STRATA) because they require special handling of these different PUMA definitions. We plan to release more detailed geographic units throughout the spring, and aim to provide the most popular variables sometime this week. Check the revision history page (or your email) for an announcement of the new variables.

Fortunately, you don’t need STRATA to set your weights or account for sample design in analyses of ACS microdata. The Census Bureau recommends using replicate weights to obtain empirically derived standard errors of these data, and the replicate weights are available for the 2018-2022 5-year ACS PUMS files via IPUMS.
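For readers who want to see what the replicate weights do, here is a minimal sketch of the arithmetic, written in Python only so that it is easy to check by hand. It assumes the successive difference replication formula that the Census Bureau documents for the ACS, variance = (4/80) x the sum of squared differences between each replicate estimate and the full-sample estimate, which is what vce(brr) with fay(.5) and mse amounts to in Stata. The function names and the four-person toy data are hypothetical, and the toy example uses 3 replicate weights instead of the real 80 (REPWTP1-REPWTP80).

```python
import math

def weighted_share(values, weights):
    """Weighted mean of a 0/1 indicator, i.e. a weighted share."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def replicate_standard_error(full_estimate, replicate_estimates):
    """Standard error from replicate estimates.

    Uses variance = (4 / R) * sum((rep - full)^2), where R is the number
    of replicates. For the ACS, R = 80, giving the familiar 4/80 factor.
    """
    r = len(replicate_estimates)
    variance = (4.0 / r) * sum((e - full_estimate) ** 2 for e in replicate_estimates)
    return math.sqrt(variance)

# Hypothetical toy data: 4 persons, a 0/1 indicator, a full-sample weight
# (standing in for PERWT), and 3 made-up replicate weight columns.
indicator = [1, 0, 1, 0]
perwt = [10, 20, 30, 40]
replicate_weights = [
    [12, 18, 30, 40],
    [10, 22, 28, 40],
    [10, 20, 32, 38],
]

# One estimate with the full-sample weight...
full = weighted_share(indicator, perwt)
# ...and one additional estimate per replicate weight.
reps = [weighted_share(indicator, w) for w in replicate_weights]
se = replicate_standard_error(full, reps)
print(full, se)
```

The point of the sketch is that the statistic is re-estimated once per replicate weight, so with 80 replicates the analysis is effectively run 81 times. That is one reason replicate-weight runs on a large 5-year file can be slow.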

Thank you, Isabel Pastoor!
I have another question: I’m having a hard time using replicate weights. It takes an extremely long time to run svy: tab multyear after svyset [pweight=perwt], vce(brr) brrweight(repwtp1-repwtp80) fay(.5) mse
Do you know if this is a common issue?

Thanks again!

In my personal experience using Stata, commands run with the svy prefix tend to take a while. I don’t see anything wrong with your code. If you are working with a very large dataset, on a computer with limited processing power, or with an older or less powerful version of Stata, you can expect these commands to take a while.

Here are two suggestions that may help:

  • Run your analysis on single years of ACS data, one at a time
  • Run your code without replicate weights first to determine if the code is efficient/fast and error-free before adding the replicate weights

If your analysis is still too slow to be practical, you may want to run it with STRATA as a check and compare the standard errors you get using STRATA with those you get using replicate weights. The two methods sometimes, but not always, produce substantially different standard error estimates, depending on the particulars of the analysis.