Question

Status: Closed    Asked: May 12, 2017 - 08:19 AM

Which variable do I use as the cluster variable for the India surveys?

I am analysing the India surveys and want to svyset the data in Stata.

I was wondering which variable I should use as the psu/cluster variable? I have seen 'cluster' mentioned for this use but this is not available on the international data.

Many thanks

 
 

Staff Answer


Jeff Bloem

Staff

Nearly all IPUMS International samples, and all India samples, consider households to be the primary sampling unit (PSU). It is therefore usually advisable to use the household identifier variable SERIAL for clustering. Note, however, that depending on the type of analysis you are performing, it may be more appropriate to cluster at a different level. For example, if you are aggregating data across regions before performing your analysis, then controlling for inter-region correlation of your outcome variable may be advisable. More details about variance estimation with IPUMS International data are available here. Specific sample characteristics for Indian samples are available here.
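
As a minimal sketch of the household-level setup described above, the following Stata commands declare SERIAL as the PSU. The person weight `perwt` and the outcome variable `myoutcome` are assumptions for illustration and are not named in this thread; substitute whichever weight and analysis variables are in your extract.

```stata
* Declare the survey design with the household identifier as the PSU.
* perwt (person weight) is an assumed variable name; check your extract.
svyset serial [pweight = perwt]

* Design-based estimate; myoutcome is a hypothetical placeholder variable.
svy: mean myoutcome
```

If a strata variable is available for your sample, it could be added through the strata() option of svyset.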

 

May 12, 2017 - 09:04 AM


Answers

Thank you for the quick response. Sorry for my confusion but the sampling strategy states that it is a multistage design, with the first round selecting rural villages and urban wards. So should an identifier for these not be the PSU? Or was that not available in the data?

 

May 12, 2017 - 09:23 AM


That is correct: villages and urban wards are not identifiable in the data. The lowest level of geography for India is the region level. Household IDs are available because they do not identify geographic location.

 

May 12, 2017 - 09:26 AM


I see, thank you!

 

May 12, 2017 - 09:27 AM

