For US households, 1870-1930: (1) were boarders and lodgers present? (2) what % of children 10-15 were employed?

My unit of analysis is a household containing a husband and wife in 1870-1930. I have never done IPUMS analyses using households as the unit and need to learn how to do it. Within each household, I want to calculate whether or not any boarders and lodgers were present. “RELATE” has a category “other non-relatives” that looks good for my purposes. It’s defined as “including those persons paying or working for accommodations.”

I also want to calculate whether any children in the household between the ages of 10 and 15 were working outside the household. For this analysis I will further restrict the sample to households containing a husband and wife and at least one child aged 10-15. OCC1950, recoded so that 0-970 = working, 980-995 = not working, and 997-999 = missing, looks good, but I am unsure of the best variable to use for children’s employment and how to code it.

IPUMS “rectangularized” data are organized so that household information is appended to each person’s row within that household. It is also possible to download the data in a “hierarchical” format, where each household is on its own line and individuals are listed beneath the household they belong to. For your purposes I would recommend the default rectangularized data, which is more flexible to work with. For example, you can create a new variable that is coded 1 if the individual is an “other non-relative”. You can then generate a second variable that counts the “other non-relatives” within each household and assigns that count to every member of the household, where a household is identified by YEAR and SERIAL (and DATANUM if using multiple samples from a single year). Because all of the household information, including the new count of other non-relatives, is stored on each household member’s row, you can then keep just one person to represent the household in a household-level analysis. The Stata code would look something like this:

. keep if gq==1
. gen roomer = 1 if relate==12
. sort year datanum serial
. egen nroomer = total(roomer==1), by(year datanum serial)
. keep if pernum==1

The first line (keep if gq==1) keeps only households and drops group quarters from the data set. The last line keeps only the first person from every household, leaving one row per household, so you can then apply the household weights (HHWT) in further analyses.
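
As a rough sketch of that last step (not necessarily how the analysis should finally be specified), the share of households with at least one “other non-relative” could be tabulated by census year like this. The variable name anyroomer is made up for illustration, and using aweight is an assumption that HHWT may be non-integer in some samples:

. * assumes the commands above have been run, so there is one row per household
. gen byte anyroomer = nroomer > 0
. * row percentages by census year, weighted by the household weight
. tabulate year anyroomer [aweight=hhwt], row nofreq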

I think your choice of OCC1950 is appropriate for determining the working status of children. However, a few codes within the range you defined as “working” may actually describe work within the household, such as “100: Farmers (owners and tenants)” and “830: Farm laborers, unpaid family workers”, who are probably working on the family farm. Also, though it does not look like it will affect your analysis since you are looking at children ages 10-15, it is important to note that the universe for OCC1950 changes over the period you are interested in. I hope this helps.
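
To make the coding concrete, here is a minimal sketch of one way to build the child-work flag; treat it as an illustration under stated assumptions rather than a recommended specification. It has to run on the person-level file, before any keep if pernum==1 step, because it needs the children’s own rows. The names child1015, childwork, nchild1015, nchildwork, nspouse and anychildwork are made up for illustration. It assumes the general RELATE codes 2 (spouse) and 3 (child of the head), counts only the head’s own children, and treats the missing codes 997-999 as not working:

. keep if gq==1
. * assumption: "child" means a child of the household head (relate==3) aged 10-15
. gen byte child1015 = relate==3 & age>=10 & age<=15
. * working = OCC1950 0-970, excluding 100 and 830 as likely family-farm work
. gen byte childwork = child1015 & occ1950<=970 & !inlist(occ1950, 100, 830)
. * household-level counts, copied onto every member's row
. egen nchild1015 = total(child1015), by(year datanum serial)
. egen nchildwork = total(childwork), by(year datanum serial)
. egen nspouse = total(relate==2), by(year datanum serial)
. * one row per household with a spouse present and at least one child 10-15
. keep if pernum==1 & nspouse>=1 & nchild1015>=1
. gen byte anychildwork = nchildwork > 0

If children with missing occupation codes should be excluded rather than counted as not working, child1015 would need the extra condition occ1950<997. To count every person aged 10-15 rather than only the head’s children, drop the relate==3 condition.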