Merging issues with the Sep 2000, Apr 2001, Jun 2001 and Feb 2002 basic monthly CPS data (IPUMS-CPS with NBER)


I'm merging IPUMS-CPS basic monthly files (2000-2016) with NBER's to add variables that the former lack. I've been able to successfully merge most of them using the instructions outlined here (i.e., I found no age or sex differences among matched observations), but I haven't achieved the same with the September 2000, April 2001, June 2001 and February 2002 surveys.

For Sep 2000 and Feb 2002, I find matches (based on hrhhid, huhhnum, hrsample, hrsersuf, statefip and lineno) for all observations in the IPUMS-CPS files, and their number equals the number of observations in the NBER files after dropping non-respondents (lineno==-1). Nevertheless, I don't understand why many matched observations differ on age in both surveys: 6,429 out of 121,658 and 3,150 out of 140,775, respectively.
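The check described above (match on the six identifiers, then compare age and sex among matched records) can be sketched as follows. This is a minimal illustration in plain Python rather than the poster's actual workflow. It assumes both files are already loaded as lists of dicts and share the variable names used in the question; the helper `person` and all record values are made up.

```python
# Assumption: both extracts are already loaded as lists of dicts and
# share the variable names used in the question.
KEY = ("hrhhid", "huhhnum", "hrsample", "hrsersuf", "statefip", "lineno")


def compare_matched(master, using, check=("age", "sex")):
    """Match master to using on KEY; count unmatched rows and value disagreements."""
    # Index the using file by key (keys are assumed unique within a month).
    index = {tuple(rec[k] for k in KEY): rec for rec in using}
    unmatched = 0
    diffs = {var: 0 for var in check}
    for rec in master:
        other = index.get(tuple(rec[k] for k in KEY))
        if other is None:
            unmatched += 1
            continue
        for var in check:
            if rec[var] != other[var]:
                diffs[var] += 1
    return unmatched, diffs


def person(hrhhid, lineno, age, sex, hrsersuf="-1"):
    """Build one made-up person record (illustrative values only)."""
    return {"hrhhid": hrhhid, "huhhnum": 1, "hrsample": "A70", "hrsersuf": hrsersuf,
            "statefip": 27, "lineno": lineno, "age": age, "sex": sex}


ipums = [person("001", 1, 34, 1), person("001", 2, 36, 2), person("002", 1, 50, 2)]
nber = [person("001", 1, 34, 1), person("001", 2, 35, 2)]

unmatched, diffs = compare_matched(ipums, nber)
print(unmatched, diffs)
```

Counting unmatched records and disagreements separately keeps the two symptoms apart: a failed match points to the identifiers, while a disagreement on age or sex among matched records points to the values themselves.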

Finally, for Apr 2001 and Jun 2001, besides getting 22,773 and 22,770 unmatched observations, respectively, in the IPUMS-CPS files (my master data), I also find matched observations that differ on age or sex. In the first survey, I find 617 matched observations with different age and 18 with different sex; in the second, 715 and 30, respectively.

Why do I find these age/sex differences for matched observations in all four surveys? And why do I get so many unmatched observations in the IPUMS-CPS files for Apr and Jun 2001?

I've noticed that in all four surveys almost all of the troubling observations have hrsersuf=="0" (6,406 for Sep 2000 and 3,140 for Feb 2002; 22,701 for Apr 2001 and 22,699 for Jun 2001), but I don't know if or how this could account for the merge results I get.
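One way to judge whether this concentration is informative is to compare the share of hrsersuf=="0" among the troubling records with its share in the whole file: if most records carry that value anyway, the pattern says little. The sketch below is a hypothetical illustration in plain Python with made-up records; the function name `share` is not from any CPS tool.

```python
from collections import Counter


def share(records, var, value):
    """Return the fraction of records whose `var` equals `value`."""
    counts = Counter(rec[var] for rec in records)
    total = sum(counts.values())
    return counts[value] / total if total else 0.0


# Made-up records: only the variable of interest is shown.
troubling = [{"hrsersuf": "0"}, {"hrsersuf": "0"}, {"hrsersuf": "0"}, {"hrsersuf": "A"}]
all_records = troubling + [{"hrsersuf": "0"}, {"hrsersuf": "0"},
                           {"hrsersuf": "B"}, {"hrsersuf": "B"}]

share_troubling = share(troubling, "hrsersuf", "0")
share_overall = share(all_records, "hrsersuf", "0")
print(share_troubling, share_overall)
```

A share among the troubling records well above the overall share would suggest that the value is related to the problem; similar shares would suggest it is incidental.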

In case it matters: before merging the surveys prior to May 2004, I ran the command replace hrsersuf="-1" if hrsersuf=="0" on the IPUMS-CPS files to make the coding of this variable consistent with NBER's.

I would really appreciate it if you could please help solve the merging issues I'm facing with the four basic monthly surveys.

Thank you so much in advance,


I believe the discrepancies you are seeing in these samples all stem from the fact that the Census Bureau releases (and NBER re-releases) the basic and supplement files separately, while IPUMS-CPS integrates and releases only the supplement file when one is available, since the cases are meant to be identical between the two files. (The exception is the ASEC supplement, because of its oversample.) All records in the September 2000 and February 2002 basic files do appear to be present in the corresponding supplement files but, as you noticed, there are differences in age. You identified these differences between the NBER basic files and the IPUMS-CPS files (which are based on the supplement files), but you can also find them by comparing the basic and supplement files available through NBER or the CPS FTP site. In other words, IPUMS-CPS is not introducing any edits that change these values.

Similarly, the April and June 2001 supplement files include an expanded sample that is not present in the basic files. A sample expansion was under way throughout 2001, but the expansion cases were delivered to users only starting in July. That appears to be true for the basic files, but not for the supplement files. Unfortunately, I have not been able to find any Census Bureau documentation confirming this. If it is the case, the introduction of new individuals for the expanded sample may have disrupted the identification system, leading to mismatched records between the basic and supplement files.
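A quick way to see which side the extra records sit on is an anti-join in both directions between the basic and supplement files. The sketch below is only an illustration in plain Python: the records are made up, the key is shortened to two identifiers for readability, and `anti_join` is a hypothetical helper. Finding extra records only on the supplement side would be consistent with an expanded sample, but would not by itself confirm it.

```python
def anti_join(left, right, key):
    """Return the records in `left` whose key has no counterpart in `right`."""
    right_keys = {tuple(rec[k] for k in key) for rec in right}
    return [rec for rec in left if tuple(rec[k] for k in key) not in right_keys]


# Shortened key for illustration; the full match in the question also uses
# huhhnum, hrsample, hrsersuf and statefip.
TOY_KEY = ("hrhhid", "lineno")

# Made-up records standing in for one month's basic and supplement files.
basic = [{"hrhhid": "001", "lineno": 1}, {"hrhhid": "001", "lineno": 2}]
supplement = basic + [{"hrhhid": "900", "lineno": 1}]

only_in_supplement = anti_join(supplement, basic, TOY_KEY)
only_in_basic = anti_join(basic, supplement, TOY_KEY)
print(len(only_in_supplement), len(only_in_basic))
```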

IPUMS-CPS is looking further into these discrepancies between the basic and supplement files, but you are also welcome to contact the Census Bureau directly to inquire further about the differing age values between basic and supplement files.

I hope this helps. I'm sorry I could not provide a more complete answer.


Dec 29, 2016 - 09:37 AM

