Status:Closed    Asked:Mar 28, 2017 - 02:52 PM

HHWT for a person variable

We are looking to get a total of Households that have a student aged 15-19 years. Can I total the HHWT based upon those in that age range attending school? Or, can you only total the PERWT since it is a person variable? I assume households will have multiple students so I would expect those numbers to be lower than the PERWT total, and it is, but didn't know if you can mix the H's with the P's!

Thanks much!


Staff Answer


Jeff Bloem


Essentially, you'll want to create a household-level variable that indicates whether the household contains anyone aged 15-19 who is attending school. Then, to estimate the number of such households in the population, apply the HHWT variable, counting each household only once. You can certainly get fancy with the standard errors around your estimates by using replicate weights, but that may not be necessary for your purposes. See this page for more information on the ACS sampling design. See this page for more information on replicate weights.


Mar 30, 2017 - 10:26 AM



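* Flag persons aged 15-19 who are attending school (school == 2)
gen byte _is_student_15to19 = inlist(school,2) & inrange(age,15,19)
* Carry the flag up to the household: 1 if any member qualifies
egen _hh_has_student_15to19 = max( _is_student_15to19 ), by( serial )
* Mark exactly one record per household
egen hh_tag = tag( serial )
* Household weight plus its replicate weights
svyset [pw=hhwt], vce(sdr) sdrweight( rephhwt1-rephhwt120 )
* Weighted count of households, using one record per household
svy : total _hh_has_student_15to19 if hh_tag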

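I'm not sure I got the replicate weight names and count right, so check them against your extract, but this should be fixable. A range of variables is a *very* dangerous piece of syntax, because it picks up whatever happens to sit between the two endpoints in the dataset's variable order. Note that the sdrweight( rephhwt* ) syntax will unfortunately also pick up the rephhwt flag indicating the presence of replicate weights for the given case, and that's not what you want. If you want something bulletproof, or are simply paranoid like me, build the list explicitly: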

local repwtlist
forvalues k=1/120 {
    local repwtlist `repwtlist' rephhwt`k'
}
svyset ... , sdrw( `repwtlist' )

The IPUMS microdata format is in some ways unfortunate, as it mixes household and person data in one file and carries two weight variables. I can see how this is confusing. While I nearly always say that svy commands should use subpop() when they need to restrict the sample, note that -svy : command if selection- is appropriate here: it subsets the sample to one record per household before passing it to svy, which untangles the pooled household- and person-level data.
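To make the mechanics concrete, here is a minimal sketch on made-up data (three hypothetical households, with invented weights and variable names that only mimic an IPUMS extract). It assumes school == 2 means "attending school", as in the code above, and that serial identifies a household uniquely, which in a multi-sample extract would need sample and serial together. Replicate weights are left out, so this gives point estimates only.

```stata
clear
* Toy person-level file: one row per person, hhwt repeated within household
input serial pernum age school hhwt
1 1 45 1 100
1 2 16 2 100
1 3 18 2 100
2 1 17 1 150
2 2 50 1 150
3 1 19 2 200
end

* Person-level flag: aged 15-19 and attending school
gen byte is_student = school == 2 & inrange(age, 15, 19)

* Household-level flag: 1 if any member of the household qualifies
egen byte hh_has_student = max(is_student), by(serial)

* One tagged record per household
egen byte hh_tag = tag(serial)

* Households with a student: 100 (serial 1) + 200 (serial 3) = 300
total hh_has_student if hh_tag [pw=hhwt]

* Summing hhwt over persons instead counts household 1 twice: 400
total is_student [pw=hhwt]
```

The gap between the two totals is exactly the double counting of households with more than one student, which is why the household total should come out lower than the person-level one.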


Mar 28, 2017 - 03:18 PM


I appreciate the response, but I feel really stupid. I have no idea how to translate your recommendation! Help!


Mar 28, 2017 - 03:36 PM


Great, thank you! It does seem to check out as far as the numbers go for the HHWT variable, but for some reason I started to second-guess myself. Thank you very much!


Mar 30, 2017 - 10:28 AM


