Status: Closed    Asked: Jul 03, 2017 - 06:12 PM

Is it possible to have more observations that fit a certain criterion in a revised extract than in the original?

Last year, I used a data extract that included the occupation variable OCC2010. My analysis keeps only the observations of people who work in the construction sector. Today, I revised the old extract to add two new variables, EMPSTAT and LABFORCE. However, when I condense the revised data set by keeping only the construction occupations, using exactly the same commands as before, the condensed data set has more observations than the original one. I am trying to replicate my previous research while excluding people who were not in the labor force, but I cannot replicate it exactly because the revised data set seems to contain different observations. Is there a way to make the two data sets match exactly?


Staff Answer


Jeff Bloem


Occasionally, IPUMS updates variables to correct known errors. This can cause differences when data extracts are revised and resubmitted. However, that doesn't seem to be what is happening in your case. I just looked at your extract 30 and the revised extract 41, and found no difference in the number of observations of people who work in the construction sector. Perhaps you can send the code you are using to limit your sample to construction workers? Alternatively, if you want to replicate your original analysis exactly, you can merge EMPSTAT and LABFORCE onto your original data set using YEAR, DATANUM, SERIAL, and PERNUM as identifiers.
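The merge described above could look roughly like the following. This is only a minimal sketch in plain Python, assuming both extracts have been loaded as lists of dicts (for example from CSV files with `csv.DictReader`). The function name `merge_new_vars` and the toy rows are hypothetical; only the four identifier names and the two new variable names come from this thread.

```python
# Identifier variables named in the answer; together they are assumed to
# pick out exactly one person record in each extract.
KEYS = ("YEAR", "DATANUM", "SERIAL", "PERNUM")


def merge_new_vars(original, revised, new_vars=("EMPSTAT", "LABFORCE")):
    """Left-join new_vars from `revised` onto `original`, matching on KEYS.

    Both arguments are lists of dicts, one dict per person record.
    Every record in `original` is kept; records with no match in
    `revised` get None for each new variable.
    """
    lookup = {}
    for row in revised:
        key = tuple(row[k] for k in KEYS)
        if key in lookup:
            raise ValueError(f"duplicate identifier in revised extract: {key}")
        lookup[key] = row

    merged = []
    for row in original:
        match = lookup.get(tuple(row[k] for k in KEYS), {})
        out = dict(row)  # copy so the original rows are left untouched
        for var in new_vars:
            out[var] = match.get(var)
        merged.append(out)
    return merged


# Toy rows with arbitrary values, for illustration only (not real IPUMS data).
# Note: identifier values must have the same type in both extracts
# (e.g. all strings if read with csv.DictReader).
original = [
    {"YEAR": 2015, "DATANUM": 1, "SERIAL": 10, "PERNUM": 1},
    {"YEAR": 2015, "DATANUM": 1, "SERIAL": 10, "PERNUM": 2},
]
revised = [
    {"YEAR": 2015, "DATANUM": 1, "SERIAL": 10, "PERNUM": 1, "EMPSTAT": 1, "LABFORCE": 2},
    {"YEAR": 2015, "DATANUM": 1, "SERIAL": 10, "PERNUM": 2, "EMPSTAT": 3, "LABFORCE": 1},
    {"YEAR": 2015, "DATANUM": 1, "SERIAL": 11, "PERNUM": 1, "EMPSTAT": 1, "LABFORCE": 2},
]

merged = merge_new_vars(original, revised)
```

Because this is a left join on the original records, the merged data set has exactly as many observations as the original, so any extra records in the revised extract cannot change the replicated sample.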

I hope this helps.


Jul 05, 2017 - 09:48 AM



Thank you for your response. After double-checking the data, I came to the same conclusion as you. I must have changed my data set at some point during my previous analysis without recording what I had done.
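For anyone tracking down a similar discrepancy, one way to see exactly which records differ is to compare the identifier sets of the two condensed data sets. The following is a minimal sketch in plain Python, assuming both data sets are loaded as lists of dicts; `diff_records` and the toy rows are hypothetical names and values used only for illustration.

```python
# Identifier variables assumed to pick out one person record.
KEYS = ("YEAR", "DATANUM", "SERIAL", "PERNUM")


def diff_records(old_rows, new_rows):
    """Return (only_in_old, only_in_new) as sorted lists of identifier tuples."""
    old_keys = {tuple(r[k] for k in KEYS) for r in old_rows}
    new_keys = {tuple(r[k] for k in KEYS) for r in new_rows}
    return sorted(old_keys - new_keys), sorted(new_keys - old_keys)


# Toy condensed data sets with arbitrary values (not real IPUMS data).
old_condensed = [
    {"YEAR": 2015, "DATANUM": 1, "SERIAL": 10, "PERNUM": 1},
]
new_condensed = [
    {"YEAR": 2015, "DATANUM": 1, "SERIAL": 10, "PERNUM": 1},
    {"YEAR": 2015, "DATANUM": 1, "SERIAL": 12, "PERNUM": 3},
]

only_in_old, only_in_new = diff_records(old_condensed, new_condensed)
```

Here `only_in_new` lists the records that appear in the newly condensed data set but not in the old one, which is a starting point for working out which undocumented step dropped them.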


Jul 18, 2017 - 03:33 PM


