Question

Status: Closed    Asked: Sep 22, 2015 - 08:56 AM

Duplicate individual ids of the Vietnam 2009 census?

I'm working with the 1989, 1999, and 2009 Vietnam census data.

I've created household IDs and person IDs.

gen double hhid=sample*10^8+serial


gen double pid=hhid*100+pernum

When I check them, more than 1 million observations in the 2009 sample do not have a unique pid.


. bys pid: gen obs=_N

. tab obs

        obs |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 | 17,376,097       90.63       90.63
          2 |          2        0.00       90.63
          3 |  1,796,643        9.37      100.00
         43 |         43        0.00      100.00
------------+-----------------------------------
      Total | 19,172,785      100.00

. tab year if obs==3

       Year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2009 |  1,796,643      100.00      100.00
------------+-----------------------------------
      Total |  1,796,643      100.00


Do you have an idea of what may be happening?


Thank you!


 

Staff Answer

Tim_Moreland

Staff

Your line "gen double hhid=sample*10^8+serial" does not create sufficient space to include both sample and serial in the same variable without overlap. Instead, you should multiply sample by 10^10. When I make this change to your code, I get 19,172,742 unique values for pid across the three Vietnam census samples. In other words, there are zero duplicate individual IDs.
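The change can be checked directly before relying on the new IDs. The following is a minimal sketch, assuming the extract with sample, serial, and pernum is already in memory; the names hhid2 and pid2 are hypothetical and are used only so the original hhid and pid are not overwritten. It also confirms that the combined ID stays below 2^53, the largest range in which a double stores every integer exactly.

```stata
* Assumes an IPUMS extract with sample, serial, and pernum in memory.

* How many digits does serial need? If max(serial) is 10^8 or more,
* sample*10^8 + serial mixes digits of sample with digits of serial.
summarize serial, meanonly
display %15.0f r(max)

* Rebuild the IDs with ten digits reserved for serial
* (hhid2 and pid2 are hypothetical names).
gen double hhid2 = sample*10^10 + serial
gen double pid2  = hhid2*100 + pernum

* A double holds integers exactly only below 2^53 (about 9.007e15),
* so stop with an error if any combined ID falls outside that range.
assert pid2 < 2^53

* Tabulate how many observations share each pid2 value.
duplicates report pid2
```

If the assert fails, the packed ID is too wide for a double and the digits cannot be trusted, even when no duplicates are reported.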


Hope this helps.
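An alternative that avoids choosing a multiplier at all is to let Stata number the groups. This is a sketch of a different technique from the one in the answer, assuming that sample, serial, and pernum together identify persons in the extract; hhid_g and pid_g are hypothetical variable names.

```stata
* Assumes an IPUMS extract with sample, serial, and pernum in memory.

* Check the raw identifiers directly; isid exits with an error if
* sample, serial, and pernum do not uniquely identify observations.
isid sample serial pernum

* Build sequential IDs (1, 2, 3, ...) instead of packing digits together.
* hhid_g and pid_g are hypothetical names.
egen long hhid_g = group(sample serial)
egen long pid_g  = group(sample serial pernum)
```

The resulting IDs are unique within this dataset but carry no information about the sample or serial number, so keep the original variables if the file will be merged with other extracts.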

 

Sep 22, 2015 - 03:38 PM

