Drop in sample size in 1994 and 1995 when linking individuals with non-missing weekly earnings (earnweek) using cpsi

Hello,

We ran into a problem using the longitudinal links in the CPS and are wondering whether anyone knows the source of the discrepancy described below.

  1. We downloaded the basic monthly CPS data from 1989 to 2014 and kept the observations with non-missing weekly earnings (earnweek). The weekly earnings question is asked only of the outgoing rotation groups, so respondents answer it in their 4th and 8th months in sample, which are exactly one year apart. Below is the sample tabulated by survey year.

| Survey year | Freq. | Percent | Cum. |
|---|---:|---:|---:|
| 1989 | 177,346 | 3.98 | 3.98 |
| 1990 | 185,905 | 4.17 | 8.15 |
| 1991 | 180,294 | 4.04 | 12.19 |
| 1992 | 177,543 | 3.98 | 16.17 |
| 1993 | 169,389 | 3.80 | 19.97 |
| 1994 | 171,445 | 3.85 | 23.82 |
| 1995 | 170,990 | 3.83 | 27.65 |
| 1996 | 152,840 | 3.43 | 31.08 |
| 1997 | 155,585 | 3.49 | 34.57 |
| 1998 | 157,608 | 3.53 | 38.10 |
| 1999 | 160,058 | 3.59 | 41.69 |
| 2000 | 161,792 | 3.63 | 45.32 |
| 2001 | 177,457 | 3.98 | 49.30 |
| 2002 | 184,802 | 4.14 | 53.45 |
| 2003 | 181,419 | 4.07 | 57.51 |
| 2004 | 178,390 | 4.00 | 61.52 |
| 2005 | 179,690 | 4.03 | 65.55 |
| 2006 | 179,170 | 4.02 | 69.56 |
| 2007 | 177,474 | 3.98 | 73.54 |
| 2008 | 175,286 | 3.93 | 77.47 |
| 2009 | 170,046 | 3.81 | 81.29 |
| 2010 | 168,102 | 3.77 | 85.06 |
| 2011 | 166,479 | 3.73 | 88.79 |
| 2012 | 166,328 | 3.73 | 92.52 |
| 2013 | 166,083 | 3.72 | 96.25 |
| 2014 | 167,328 | 3.75 | 100.00 |
| **Total** | 4,458,849 | 100.00 | |
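Item 1 relies on the 4th and 8th interviews being exactly one year apart. A minimal sketch, assuming the standard CPS 4-8-4 rotation pattern (4 consecutive months in the sample, 8 months out, then 4 more months in), confirms that gap; `calendar_offset` and `add_months` are hypothetical helper names used only for illustration.

```python
# CPS 4-8-4 rotation: a household is interviewed for 4 consecutive months,
# leaves the sample for 8 months, then returns for 4 more months.
# Months in sample (MIS) 4 and 8 are the outgoing rotation groups.

def calendar_offset(mis: int) -> int:
    """Months elapsed between a household's first interview (MIS 1) and MIS `mis`."""
    if not 1 <= mis <= 8:
        raise ValueError("month in sample must be between 1 and 8")
    # MIS 1-4 are consecutive; MIS 5-8 follow an 8-month break.
    return mis - 1 if mis <= 4 else mis - 1 + 8


def add_months(year: int, month: int, n: int) -> tuple:
    """Shift a (year, month) pair forward by n months."""
    total = year * 12 + (month - 1) + n
    return total // 12, total % 12 + 1


# Gap between the two outgoing-rotation interviews.
gap = calendar_offset(8) - calendar_offset(4)
print(gap)  # 12

# A household first interviewed in January 1994 is in MIS 4 in April 1994
# and in MIS 8 in April 1995.
first = (1994, 1)
print(add_months(*first, calendar_offset(4)))  # (1994, 4)
print(add_months(*first, calendar_offset(8)))  # (1995, 4)
```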

  2. Nothing looks unusual in this table. We then restricted the sample further, keeping only people who reported non-missing weekly earnings in two consecutive years; in other words, we dropped people with only one observation with non-missing weekly earnings. Tabulated by year, the sample looks as follows:

| Survey year | Freq. | Percent | Cum. |
|---|---:|---:|---:|
| 1989 | 58,818 | 4.29 | 4.29 |
| 1990 | 60,143 | 4.38 | 8.67 |
| 1991 | 59,006 | 4.30 | 12.97 |
| 1992 | 56,364 | 4.11 | 17.08 |
| 1993 | 54,892 | 4.00 | 21.08 |
| 1994 | 21,763 | 1.59 | 22.67 |
| 1995 | 17,505 | 1.28 | 23.94 |
| 1996 | 52,995 | 3.86 | 27.81 |
| 1997 | 53,203 | 3.88 | 31.68 |
| 1998 | 54,669 | 3.98 | 35.67 |
| 1999 | 55,350 | 4.03 | 39.70 |
| 2000 | 54,431 | 3.97 | 43.67 |
| 2001 | 60,135 | 4.38 | 48.05 |
| 2002 | 63,385 | 4.62 | 52.67 |
| 2003 | 61,029 | 4.45 | 57.12 |
| 2004 | 56,092 | 4.09 | 61.21 |
| 2005 | 61,090 | 4.45 | 65.66 |
| 2006 | 60,737 | 4.43 | 70.09 |
| 2007 | 61,533 | 4.49 | 74.58 |
| 2008 | 59,548 | 4.34 | 78.92 |
| 2009 | 58,994 | 4.30 | 83.22 |
| 2010 | 58,557 | 4.27 | 87.49 |
| 2011 | 57,789 | 4.21 | 91.70 |
| 2012 | 57,789 | 4.21 | 95.91 |
| 2013 | 56,115 | 4.09 | 100.00 |
| **Total** | 1,371,932 | 100.00 | |
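The restriction in item 2 can be sketched in plain Python under a few labeled assumptions: the records below are made up, `person_id` is a hypothetical stand-in for whatever longitudinal person identifier is used, `None` marks a missing weekly-earnings value, and each linked person is counted once, in the year of their first observation (one possible convention, not necessarily the one used for the table above).

```python
from collections import Counter, defaultdict

# Hypothetical person-level records: (person_id, survey_year, earnweek).
# None marks a missing weekly-earnings value (an assumption for illustration).
records = [
    ("A", 1993, 520.0), ("A", 1994, 540.0),  # earnings in two consecutive years
    ("B", 1994, 610.0),                      # only one non-missing observation
    ("C", 1995, 480.0), ("C", 1996, None),   # second-year earnings missing
    ("D", 1996, 700.0), ("D", 1997, 725.0),
]


def consecutive_year_pairs(rows):
    """Return (person_id, first_year) for people with non-missing earnings in years y and y+1."""
    years_by_person = defaultdict(set)
    for pid, year, earn in rows:
        if earn is not None:  # keep only non-missing weekly earnings
            years_by_person[pid].add(year)
    pairs = []
    for pid, years in years_by_person.items():
        for y in sorted(years):
            if y + 1 in years:
                pairs.append((pid, y))
    return pairs


pairs = consecutive_year_pairs(records)
print(pairs)  # [('A', 1993), ('D', 1996)]

# Tabulate by first year in the same Freq. / Percent / Cum. layout.
counts = Counter(year for _, year in pairs)
total = sum(counts.values())
cum = 0.0
for year in sorted(counts):
    pct = 100 * counts[year] / total
    cum += pct
    print(f"{year} | {counts[year]:>7,} {pct:6.2f} {cum:6.2f}")
```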

  3. There is a very sharp drop in sample size in 1994 and 1995: 21,763 and 17,505 observations, compared with roughly 53,000 to 63,000 in every other year.

I would appreciate any help identifying the source of this discrepancy.

Thanks a lot,

Mine Senses

msenses@jhu.edu

The Census Bureau updated its sub-state geographic identifiers in September 1995 and omitted them entirely from the June-August 1995 files. As a result, it intentionally prevented households in samples with the old geographic identifiers from being matched to samples with the new ones. Specifically, it should be possible to link within each of the following three periods, but not across them:

  1. Prior to June 1995
  2. June-August 1995
  3. After August 1995

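Because the 4th and 8th months in sample are a year apart, anyone whose 4th month in sample falls between June 1994 and August 1995 has their two earnings interviews in different periods and cannot be linked. This should explain why you are having difficulty linking respondents in your sample from 1994 to 1995 and from 1995 to 1996.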
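A month-by-month check of which 4th-month-in-sample interviews can still be matched to the 8th-month interview 12 months later can be sketched as follows. It assumes the three periods exactly as listed above (through May 1995, June-August 1995, September 1995 onward); `link_period` and `linkable` are hypothetical helper names.

```python
# Linking is possible within each period but not across periods.
# Assumed boundaries, taken from the three periods listed above:
#   period 0: through May 1995
#   period 1: June-August 1995
#   period 2: September 1995 onward

def link_period(year: int, month: int) -> int:
    """Return the linking period (0, 1, or 2) for a given survey month."""
    if (year, month) <= (1995, 5):
        return 0
    if (year, month) <= (1995, 8):
        return 1
    return 2


def linkable(year: int, month: int) -> bool:
    """True if a person in MIS 4 in (year, month) can be matched to MIS 8 one year later."""
    return link_period(year, month) == link_period(year + 1, month)


for year in (1993, 1994, 1995, 1996):
    months = [m for m in range(1, 13) if linkable(year, m)]
    print(year, len(months), months)
```

Only 5 of the 12 months in 1994 (January-May) and 4 of the 12 in 1995 (September-December) remain linkable. If the second table counts each linked person in the year of their first observation, that predicts roughly 5/12 and 4/12 of a typical year's count, which is close to the 21,763 and 17,505 reported against about 53,000 to 55,000 in neighboring years.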

Hope this helps.