Status:Closed    Asked:Apr 29, 2014 - 07:26 PM

How do I create a variable for the racial composition of the PUMA using IPUMS 2000 5% census data?

I would like to create a variable that indicates the racial composition of the respondents' local area. Ideally, it would be at the neighborhood level, but since that isn't possible with the publicly available census data, I know that I need to use the PUMA variable; I just don't know how to do it. The variables that I'd like to create will indicate the percentage of the PUMA that is black, white, etc., so that I can control for the racial composition of the respondent's area. I'd appreciate any direction. Thanks!


Staff Answer




There are a number of ways to attach PUMA-level racial composition to individual respondents, depending on which statistical software package you are using. Whichever you choose, what you will ultimately be doing is summarizing the RACE variable at the PUMA level. For whatever summary statistic you pick (e.g. percent white, percent black, or one variable for each racial group), you will look at each PUMA individually, compute its weighted RACE summary, and then create a new variable that contains that summary value for every person within the PUMA. You could do this by first creating a separate PUMA-level dataset that contains your race summaries and then merging those values onto the individual records by PUMA. Alternatively, your statistical software may have functions that generate new variables summarized by a grouping variable (here, PUMA), such as egen in Stata.
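For Stata users, the first approach (build a PUMA-level file, then merge it back) might look roughly like the sketch below. It assumes an IPUMS extract that includes RACE, PERWT, STATEFIP, and PUMA under their default lowercase names; the new variable names (white, black, puma_frac_white, puma_frac_black, puma_race) are made up for illustration. It groups by state as well as PUMA because PUMA codes are reused across states.

```stata
* Sketch only: assumes race, perwt, statefip, and puma are in memory.
* Indicator variables for the groups of interest
* (IPUMS RACE code 1 = White, 2 = Black/African American).
gen byte white = (race == 1)
gen byte black = (race == 2)

* Collapse to one record per state-PUMA holding the weighted shares.
* preserve/restore keeps the person-level data intact.
preserve
collapse (mean) puma_frac_white = white puma_frac_black = black [pweight = perwt], by(statefip puma)
tempfile puma_race
save `puma_race'
restore

* Attach the PUMA-level shares to every person record.
merge m:1 statefip puma using `puma_race', nogenerate
```

After the merge, every person in the same state and PUMA carries the same puma_frac_white and puma_frac_black values, which can then be used as controls.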

I hope this helps.


May 05, 2014 - 11:42 AM



Thanks for your response, Joe! This is very helpful. I neglected to mention my software program but I am indeed using Stata so your suggestion about using egen is immensely helpful. Thanks again!


May 05, 2014 - 02:31 PM


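A plain egen mean() will not work here, because the data need to be weighted with the household or person weights and egen does not accept weights. Instead, create two weighted totals with egen total() by PUMA and then take their ratio. PUMA codes are only unique within a state, so group by state (statefip) as well; race == 1 is the IPUMS code for White:

egen puma_total_wgt = total(perwt), by(statefip puma)

egen puma_total_white = total((race == 1) * perwt), by(statefip puma)

gen puma_frac_white = puma_total_white / puma_total_wgt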


May 28, 2014 - 06:13 PM


