How to use the crosswalk between 2000 and 2010 PUMAs

Hello,

I’m trying to link information from a 5-year ACS extract (2011-2015) to an external data source for which I have 2000-era PUMA locations. So, basically, I am looking for a method to assign 2000-era PUMA codes to ACS households in 2011-2015. I gather that for 2011, the PUMA variable will already be in 2000-era codes. However, for the remaining years, I need to crosswalk back. I see the crosswalk provided at IPUMS.org (PUMA2000_PUMA2010_crosswalk.xls), but I am not certain precisely how to use it.

For instance, if in the 2015 ACS I have a household located in 1800200 (as defined in GEOID10), they might be in either 1800300 or 1800400 in terms of GEOID00 (the state FIPS + 2000-era PUMA). I gather there should be some proportion so I can do something probabilistic, but I’m not sure what this would be. I see the crosswalk talks about an area of intersection, but I’m not certain what is meant by this precisely, or how to use it.

I welcome some advice on this.

Thanks

Tom

Each record in the crosswalk describes a spatial intersection (a shared area of overlap) between a 2000 PUMA and 2010 PUMA. If a 2000 PUMA is identical to a 2010 PUMA, there will be just one record describing their total area. If a 2000 PUMA was split into two 2010 PUMAs, there will be two records for the 2000 PUMA, one for each of its intersections with 2010 PUMAs. The “area of intersection” referred to in the file’s “data_dictionary” sheet is the shared area between each specified 2000 and 2010 PUMA.
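As a rough illustration of that record structure, here is a minimal Python sketch that counts how many crosswalk records each 2000 PUMA has. The codes below are invented for the example, and reading the real spreadsheet is left out; only the GEOID00 and GEOID10 identifiers come from the question above.

```python
from collections import defaultdict

# One (GEOID00, GEOID10) pair per crosswalk record.
# These codes are made up for illustration, not taken from the real file.
records = [
    ("1800100", "1800100"),  # 2000 PUMA that overlaps a single 2010 PUMA
    ("1800300", "1800200"),  # 2000 PUMA split across two 2010 PUMAs ...
    ("1800300", "1800600"),  # ... so it appears in two records
]

# Collect the 2010 PUMAs that each 2000 PUMA intersects.
parts = defaultdict(list)
for geoid00, geoid10 in records:
    parts[geoid00].append(geoid10)

# Number of intersections (crosswalk records) per 2000 PUMA.
intersections_per_2000_puma = {g: len(v) for g, v in parts.items()}
print(intersections_per_2000_puma)
```

A 2000 PUMA with a count of 1 overlaps only one 2010 PUMA; a count of 2 or more means it is divided among several.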

Given your scenario, there are two strategies that would be appropriate, depending on your exact needs.

First, there is a “probabilistic” approach, as you suggest. For your setting, the most suitable probability to use is given in the “pPUMA10_Pop10” column. This identifies the “estimated percent of the 2010 PUMA’s 2010 population that lies in the area of intersection”, i.e., how much of the 2010 PUMA’s population resided in this 2000 PUMA. If 80% of a 2010 PUMA’s population resided in a particular 2000 PUMA, then you could assign each resident of the 2010 PUMA an 80% probability of having lived in that 2000 PUMA.
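The probabilistic assignment described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the rows are typed in by hand rather than read from the spreadsheet, the 80/20 split is invented, and the helper names (build_lookup, assign_2000_puma) are hypothetical. Only GEOID10, GEOID00 and pPUMA10_Pop10 are names from the crosswalk and the question.

```python
import random
from collections import defaultdict

# Hypothetical crosswalk rows: (GEOID10, GEOID00, pPUMA10_Pop10).
# The percentages are invented; real values come from the crosswalk file.
crosswalk_rows = [
    ("1800200", "1800300", 80.0),
    ("1800200", "1800400", 20.0),
    ("1800500", "1800500", 100.0),
]

def build_lookup(rows):
    """Map each 2010 PUMA to its candidate 2000 PUMAs and their weights."""
    lookup = defaultdict(lambda: ([], []))
    for geoid10, geoid00, pct in rows:
        candidates, weights = lookup[geoid10]
        candidates.append(geoid00)
        weights.append(pct)
    return dict(lookup)

def assign_2000_puma(geoid10, lookup, rng):
    """Draw one 2000 PUMA with probability proportional to pPUMA10_Pop10."""
    candidates, weights = lookup[geoid10]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(12345)  # fixed seed so the draws are reproducible
lookup = build_lookup(crosswalk_rows)

# Stand-in for the GEOID10 of each ACS household (2012-2015 samples).
households = ["1800200"] * 1000 + ["1800500"] * 10
assigned = [assign_2000_puma(g, lookup, rng) for g in households]

share = assigned[:1000].count("1800300") / 1000
print(f"Share of 1800200 households assigned to 1800300: {share:.3f}")
```

Because random.choices treats the weights as relative, it does not matter whether the column is stored as a percent (0-100) or a proportion (0-1). Fixing the seed keeps the assignment the same between runs.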

Alternatively, to reduce uncertainty and achieve greater accuracy, you could aggregate all of your data up to larger “ConsPUMAs”, which are “aggregations of one or more 2010 PUMAs that, in combination, align closely with a corresponding set of 2000 PUMAs”. See the ConsPUMA Definitions page for more details. The crosswalk file indicates which ConsPUMA each 2000 and 2010 PUMA is in, and there are also two files on the ConsPUMA Definitions page that identify the sets of 2010 PUMAs and 2000 PUMAs comprising each ConsPUMA. The advantage of ConsPUMAs would be that there would be no probabilistic assignment–you’d be able to identify (almost) exactly the correct set of residents for each ConsPUMA using either 2000 or 2010 PUMA codes. The disadvantage is that, in areas where there are many discrepancies between 2000 and 2010 PUMAs, ConsPUMAs can be very large. Through the Definitions page, you can find online maps that show the extents of ConsPUMAs relative to PUMAs. You could view those to determine if the areas would be satisfactory units for your analysis.
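A minimal Python sketch of the ConsPUMA approach might look like the following. Everything specific here is an assumption for illustration: the ConsPUMA ids, the PUMA memberships, and the data values are invented, and the lookup dictionaries stand in for whatever you build from the crosswalk or the definition files.

```python
from collections import Counter

# Hypothetical lookups from PUMA code to ConsPUMA id (invented values).
conspuma_of_2010 = {"1800200": 101, "1800500": 102}
conspuma_of_2000 = {"1800300": 101, "1800400": 101, "1800500": 102}

def total_by_conspuma(records, lookup):
    """Sum a value over records after recoding each PUMA to its ConsPUMA."""
    totals = Counter()
    for geoid, value in records:
        totals[lookup[geoid]] += value
    return dict(totals)

# Stand-in ACS data: (GEOID10, weighted person count).
acs_records = [("1800200", 250.0), ("1800500", 90.0)]
# Stand-in external data: (GEOID00, some count of interest).
external_records = [("1800300", 12), ("1800400", 5), ("1800500", 7)]

acs_totals = total_by_conspuma(acs_records, conspuma_of_2010)
external_totals = total_by_conspuma(external_records, conspuma_of_2000)

# Both sources are now keyed by the same ConsPUMA ids and can be joined.
for cons_id in sorted(acs_totals):
    print(cons_id, acs_totals[cons_id], external_totals.get(cons_id))
```

No random draw is needed here, since every 2000 and 2010 PUMA belongs to exactly one ConsPUMA in the lookups.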
