# Finding the mean of family income by education for 30- to 39-year-olds in the March CPS from 1970 to 2014

Hi there! I'm trying to plot average family income (ftotval) by educational attainment (educ, recoded into fewer categories) for families with members between the ages of 30 and 39, inclusive. I have a couple of questions and would be eternally grateful for some pointers.

1. First, I try to reduce the data set so that there's one 30-39 year-old person per family. Then I find the mean family income of those remaining people by education, weighted using wtsupp. Is that reasonable? (Please see the Stata code below.)

2. I have a weird jump in 1992 (see graph below, which is adjusted for inflation; the inflation adjustment is not what created the jump, I checked). I think this is because the educ variable is built from a pre-1992 variable for highest grade of school (HIGRADE) and a variable used from 1992 onward that does something similar but is still slightly different (EDUC99). Is that what's throwing off my graph? If so, is there any way to solve this? I have trouble believing that bachelor's degree holders earn less now than in 1970, in real terms.
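One way to check whether the coding change in question 2 is behind the jump is to cross-tabulate the raw educ codes against year around 1992 and see which codes are populated on each side of the break. A minimal sketch, using made-up toy data in place of the extract (the educ values below are placeholders, not a claim about which codes the real data contain):

```stata
* Toy data standing in for the CPS extract (all values are made up)
clear
input year educ
1991 72
1991 110
1991 122
1992 73
1992 111
1992 123
end

* Which educ codes appear in each year around the suspected break?
tabulate educ year if inrange(year, 1990, 1993), missing
```

On the real extract, the same tabulate line would run right after loading the data. If some codes used by the recode are empty on one side of 1992, the recoded categories are not comparable across the break.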

Any help would be much appreciated. Thanks!

Here is my Stata code:

/*

This is a do-file that tabulates mean family income of adults aged 30 to 39 by education level.

The data come from IPUMS CPS.

*/

use "cps_00032.dta", clear

sort year serial famsize marst

keep if age <40 & age > 29

drop if relate == 201 // here we're dropping spouses, since we want to weight by family for our tabulation

// and we want only one person within each family to do that

sort year serial famunit ftotval

by year serial famunit ftotval: gen dup = cond(_N==1,0,_n) // create indicator of duplicate fam. members

drop if dup > 1 // drop all the duplicates

*br if year == 2001 & serial == 21366 // just looking at a person who had a family member in the dataset before

//Make education variable that's simpler

gen educ_simple = "NA"

replace educ_simple = "aLessHS" if educ < 73 //Those without HS diplomas

replace educ_simple = "bHS" if educ == 73 //HS degree

replace educ_simple = "cSomeCollor2yrDeg" if educ > 73 & educ < 111 // some coll. or associate's degree

replace educ_simple = "dBach" if educ >= 111 & educ <= 122

replace educ_simple = "eAdvDeg" if educ >= 123 & educ <= 125

//I'm counting 5 yrs of college and 6+ yrs

//of college as "bachelor's degree."

//I just put the a, b, c, d, e prefixes so they appear in the right

//order when tabbed.

drop if year < 1970

set more off

sort year educ_simple

forvalues i = 1970(1)2014 {

    qui summarize ftotval [aw=wtsupp] if educ_simple == "eAdvDeg" & year == `i'

    generate avg_advDeg_`i' = r(mean)

    qui summarize ftotval [aw=wtsupp] if educ_simple == "dBach" & year == `i'

    generate avg_bach_`i' = r(mean)

    qui summarize ftotval [aw=wtsupp] if educ_simple == "cSomeCollor2yrDeg" & year == `i'

    generate avg_somecoll_`i' = r(mean)

    qui summarize ftotval [aw=wtsupp] if educ_simple == "bHS" & year == `i'

    generate avg_hs_`i' = r(mean)

    qui summarize ftotval [aw=wtsupp] if educ_simple == "aLessHS" & year == `i'

    generate avg_ltHS_`i' = r(mean)

}

//2004 has different weights, so we have to put that in separately. We replace what we had with the new averages.

//these use the person weights for 2004

qui summarize ftotval [aw=perwt04] if educ_simple == "eAdvDeg" & year == 2004

replace avg_advDeg_2004 = r(mean)

qui summarize ftotval [aw=perwt04] if educ_simple == "dBach" & year == 2004

replace avg_bach_2004 = r(mean)

qui summarize ftotval [aw=perwt04] if educ_simple == "cSomeCollor2yrDeg" & year == 2004

replace avg_somecoll_2004 = r(mean)

qui summarize ftotval [aw=perwt04] if educ_simple == "bHS" & year == 2004

replace avg_hs_2004 = r(mean)

qui summarize ftotval [aw=perwt04] if educ_simple == "aLessHS" & year == 2004

replace avg_ltHS_2004 = r(mean)

br

keep if _n == 1

keep avg*

gen id = 1

reshape long avg_ltHS_ avg_hs_ avg_somecoll_ avg_bach_ avg_advDeg_, i(id) j(year)

br
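The year-by-year loop above can also be written more compactly with collapse, which computes the weighted means for every year and education group in one pass. A minimal sketch on made-up toy data (the values and the name avg_ftotval are only for illustration, and this is an alternative to the loop, not the code that produced my graph):

```stata
* Toy data standing in for the reduced extract (all values are made up)
clear
input year str20 educ_simple ftotval wtsupp
2000 "bHS"   40000  1000
2000 "bHS"   60000  3000
2000 "dBach" 90000  2000
2001 "bHS"   50000  1500
2001 "dBach" 100000 2500
end

* Weighted mean of family income for each year and education group
collapse (mean) avg_ftotval = ftotval [aw=wtsupp], by(year educ_simple)

list, clean
```

This gives one row per year and category (long format) instead of one column per category, so the keep and reshape steps would not be needed. The 2004 weights would still need separate handling.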