Status:Closed    Asked:Feb 21, 2014 - 05:52 PM

unique person id variable?

I created a person ID variable by concatenating year, datanum, serial, and pernum. I am using the 2010 3-year sample for the health insurance coverage variable. However, the person ID variable does not uniquely identify the observations: of the 9 million observations, there are 1,268 person IDs with duplicate values.

I also cannot just arbitrarily drop the duplicates, because some of them appear to be located in different states.

Please advise.


Staff Answer




When generating a unique person ID by concatenating variables, it is important to include leading zeros on the variables being concatenated. Without them, different combinations of values can produce the same string; for example, serial 1 with pernum 11 and serial 11 with pernum 1 both contribute "111". One way to ensure the leading zeros are included is to generate new variables formatted to a fixed width. Because strings are easy to concatenate in Stata, I also convert the variables to strings. The Stata code that I use looks like this:

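* Convert each component to a fixed-width, zero-padded string
gen str4 stryear = string(year, "%04.0f")
gen str2 strdatanum = string(datanum, "%02.0f")
gen str8 strserial = string(serial, "%08.0f")
gen str4 strpernum = string(pernum, "%04.0f")

* Concatenate the padded components into a single person ID
gen uniqueid = stryear + strdatanum + strserial + strpernum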

Using this code, I was able to uniquely identify all persons in the 2010 3-year file.
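If it helps to see why the leading zeros matter, here is a minimal sketch on made-up data. The two toy records and the variable names naiveid and paddedid are hypothetical and are not taken from the ACS file; the sketch only illustrates the collision and one way to check for uniqueness afterwards.

```stata
* Toy data (hypothetical): two different persons in the same sample
clear
input year datanum serial pernum
2010 1 1 11
2010 1 11 1
end

* Naive concatenation without leading zeros: both rows collapse to the same string
gen naiveid = string(year) + string(datanum) + string(serial) + string(pernum)

* Zero-padded concatenation, using the same widths as the code above
gen paddedid = string(year, "%04.0f") + string(datanum, "%02.0f") + string(serial, "%08.0f") + string(pernum, "%04.0f")

* Compare how many observations share an ID value under each approach
duplicates report naiveid
duplicates report paddedid

* isid exits with an error if the variable does not uniquely identify the observations
isid paddedid
```

On your own extract, running duplicates report or isid on the concatenated ID is a quick way to confirm that it uniquely identifies persons before you rely on it.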

I hope this helps.


Feb 24, 2014 - 03:56 PM



Great, thank you.


Feb 24, 2014 - 05:40 PM


