Status:Closed    Asked:Jul 20, 2017 - 09:33 PM

Which households are grouped into a single stratum in IPUMS-International?

I am looking for clarification on the precise nature of "strata" in most of the IPUMS-International samples. The page on comparability of this variable across samples describes STRATA as follows:

"In most samples, the STRATA variable captures implicit geographic stratification and is created by assigning a unique identifier to groups of between 10 and 19 adjacent households within low level."

My confusion is about what this means when an IPUMS sample is, for instance, a 5% sample of all households. Consider a hypothetical sample that contains 5% of all households, in which each value of the STRATA variable covers 10 households. Does this mean:

A. The data include 5% of the households in each stratum. In this case, each stratum in reality has 200 households, but data users see only 5% (10) of them, presumably selected randomly.

B. The data include all households in each stratum, but only 5% of all strata. In this case, there are 10 households per stratum, but there are 20 times more strata in reality than we see in the data.
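To make the arithmetic behind the two readings concrete, here is a minimal sketch. The function names and the "1 in 20" constant are my own illustration of the hypothetical 5% sample above; nothing here comes from the IPUMS documentation.

```python
# Hypothetical 5% sample: 1 household in 20 is included.
SAMPLE_ONE_IN = 20
# Households observed per STRATA value in the hypothetical data.
OBSERVED_PER_STRATUM = 10


def true_households_per_stratum_if_a(observed_per_stratum, sample_one_in):
    """Reading A: every stratum is sampled at the same rate, so the real
    stratum is larger than what appears in the data."""
    return observed_per_stratum * sample_one_in


def true_strata_per_observed_stratum_if_b(sample_one_in):
    """Reading B: whole strata are kept or dropped, so the real population
    has more strata, each the same size as in the data."""
    return sample_one_in


print(true_households_per_stratum_if_a(OBSERVED_PER_STRATUM, SAMPLE_ONE_IN))  # 200
print(true_strata_per_observed_stratum_if_b(SAMPLE_ONE_IN))  # 20
```

Under reading A the 10 observed households stand in for 200 real ones in the same area; under reading B they are the whole stratum, and 19 other strata of the same size go unobserved for each one in the data.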

I am confused because the description calls households in the same stratum "adjacent," which implies, to me, that a stratum includes 10 contiguous households rather than 10 households selected out of a contiguous set of 200.

Does anyone have any insight into this question? It is probably not important for most uses, but it is critical for my purposes.


Staff Answer


Jeff Bloem


All IPUMS International samples are stratified. This means the population is divided into strata based on geography or other key characteristics, and a sample is then drawn from each stratum. This aligns most closely with case A in your question. More information about sampling design in IPUMS International can be found via this page.

I hope this helps. Let us know if you have any additional questions.


Jul 21, 2017 - 12:01 PM
