Status:Closed    Asked:Apr 10, 2018 - 05:33 PM

Proper use of WTFINL for longitudinal analysis

This is a more complete, detailed version of an earlier post of mine.

I have extracted monthly, county-level data for EMPSTAT and LABFORCE for 2010-2017. I am using CPSIDP to track each individual's status from month to month, where applicable. I create a dummy variable equal to 1 if the individual went from "not in labor force" to "employed" in adjacent months, and sum these dummy variables by year to produce unweighted labor force flows.

I am wondering how to properly incorporate WTFINL into this procedure. Another post suggested dividing the sample weight in each year by the number of sample years in the panel. Can these adjusted weights then be applied to the dummy variables I mentioned above (before summing by year, of course)?

For example, I can first sum the adjusted weights by month, for a particular year. Next, I can divide each individual WTFINL (for each CPSIDP) by these monthly totals, for the appropriate month. Finally, I can apply this result (again, for each CPSIDP) to the dummy variables of interest, then sum by year. Any insight into if/how this approach can be improved is greatly appreciated.
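The procedure described in the question can be sketched in pandas roughly as follows. This is only an illustration of the steps as stated, not an endorsed weighting method. The column `nilf_to_emp` (the transition dummy), the function name, and the toy values are hypothetical; `n_years` stands for the "number of sample years" divisor mentioned above.

```python
import pandas as pd

def yearly_weighted_share(df, n_years):
    """Sum, by year, the transition dummy weighted by each person's share of the monthly weight total."""
    out = df.copy()
    # Adjusted weight: WTFINL divided by the number of sample years (as suggested in the other post)
    out["adj_w"] = out["WTFINL"] / n_years
    # Total adjusted weight within each calendar month
    month_total = out.groupby(["YEAR", "MONTH"])["adj_w"].transform("sum")
    # Each record's share of its month's total weight
    out["share_w"] = out["adj_w"] / month_total
    # Apply the share to the (hypothetical) transition dummy, then sum by year
    out["weighted_flag"] = out["share_w"] * out["nilf_to_emp"]
    return out.groupby("YEAR")["weighted_flag"].sum()

# Toy data with made-up weights, for illustration only
toy = pd.DataFrame({
    "CPSIDP":      [1, 2, 1, 2],
    "YEAR":        [2017, 2017, 2017, 2017],
    "MONTH":       [1, 1, 2, 2],
    "WTFINL":      [100.0, 300.0, 200.0, 200.0],
    "nilf_to_emp": [1, 0, 0, 1],
})
by_year = yearly_weighted_share(toy, n_years=8)
```

Note that when `n_years` is the same constant for every record, it cancels once each weight is divided by its monthly total, so the shares are the same with or without that adjustment. The yearly figure here is a sum of monthly weighted shares; dividing by the number of months would give an average monthly share instead.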



(2) (provided in response to a similar, recent post of mine; this was very helpful, and much appreciated!)



Staff Answer


Jeff Bloem


In general, there is no real consensus about the proper incorporation of sampling weights, particularly when pooling samples together as you describe. I am not certain I completely understand what you are doing, so if I have misunderstood anything, feel free to provide more detail. That said, I can offer the following discussion.

I am not certain what you mean by using "county-level data", and this could influence how you use sampling weights. You could be extracting specific counties, or first aggregating labor force statistics by county, or something else entirely. In any case, note that in most IPUMS CPS samples the COUNTY variable is identified for only about 45% of the population.

The tricky bit is that when pooling CPS monthly samples, the pooled sample will include multiple observations for some (but not all) households. In principle, issues relating to this detail can be avoided by limiting the analysis to observations with MISH==1. However, it sounds like you want to exploit the fact that there are repeated observations of households.

If I understand correctly, you are interested in identifying individuals who move from not in the labor force in one month to employed in the next month. If so, you are limiting your sample to individuals with at least two observations in consecutive months. (Note that this means the consecutive observations must both fall within MISH 1-4 or both within MISH 5-8. Because there is an eight-month gap between MISH 4 and MISH 5, a pair of observations spanning that gap does not meet your criterion.) Additionally, because you are summing by year, the consecutive months are restricted to a single calendar year unless you explicitly correct for this. Put differently, individuals not in the labor force in December and employed in January are likely dropped from your sample. It is these restrictions on your sample that the sample weight should account for.
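As a minimal sketch of the adjacency logic discussed above, the pandas snippet below flags transitions only when two observations of the same CPSIDP are exactly one calendar month apart, using a running month index so that December-to-January pairs are not lost. The function name, the output column `nilf_to_emp`, and the toy data are hypothetical. The codes used (LABFORCE 1 for not in the labor force, EMPSTAT 10 and 12 for employed) are assumptions that should be checked against the IPUMS CPS codebook for your extract.

```python
import pandas as pd

def flag_nilf_to_employed(df):
    """Add a 0/1 column 'nilf_to_emp' marking NILF -> employed moves between adjacent months."""
    out = df.sort_values(["CPSIDP", "YEAR", "MONTH"]).copy()
    # Running month index, so December -> January counts as adjacent
    out["ym"] = out["YEAR"] * 12 + out["MONTH"]
    # Previous observation of the same person (NaN for a person's first record)
    prev = out.groupby("CPSIDP")[["ym", "EMPSTAT", "LABFORCE"]].shift(1)
    adjacent = (out["ym"] - prev["ym"]) == 1   # exactly one month apart
    was_nilf = prev["LABFORCE"] == 1           # assumed code: not in the labor force
    now_emp = out["EMPSTAT"].isin([10, 12])    # assumed codes: employed
    out["nilf_to_emp"] = (adjacent & was_nilf & now_emp).astype(int)
    return out

# Toy data, for illustration only
toy = pd.DataFrame({
    "CPSIDP":   [1, 1, 2, 2, 3, 3],
    "YEAR":     [2016, 2017, 2017, 2017, 2017, 2017],
    "MONTH":    [12, 1, 3, 12, 5, 6],
    "EMPSTAT":  [34, 10, 34, 10, 10, 10],
    "LABFORCE": [1, 2, 1, 2, 2, 2],
})
flagged = flag_nilf_to_employed(toy)
```

In the toy data, person 1 is flagged (December to January), person 2 is not (the observations are nine months apart, as across the MISH 4 to MISH 5 gap), and person 3 is not (employed in both months).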

Finally, a couple of suggestions. Perhaps first limit each monthly sample to individuals with MISH 1-4. Then create your "transition" dummy variables, as you describe, and sum the number of transitions for each individual (via CPSIDP). Next, keep only one observation per CPSIDP and apply the sampling weights (WTFINL). If you are first aggregating the data up to the county level, apply the sampling weights within that aggregation. A reasonable way to check whether you've applied the sampling weights "correctly" is to calculate the total population size after applying them. If that total is roughly equal to the actual population of the US, you are on the right track.
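The suggestions above might look something like the following in pandas. This is a sketch under stated assumptions, not a definitive implementation: the column `nilf_to_emp` is a hypothetical transition dummy already present in the data, the choice to keep each person's first WTFINL value is an assumption (the answer does not say which observation to keep), and the population check is done here as a simple per-month sum of WTFINL over the full monthly sample.

```python
import pandas as pd

def weighted_transitions(df):
    """Restrict to MISH 1-4, collapse to one row per CPSIDP, and weight the transition counts."""
    first_half = df[df["MISH"].between(1, 4)]
    per_person = (
        first_half.sort_values(["CPSIDP", "YEAR", "MONTH"])
        .groupby("CPSIDP")
        .agg(
            n_trans=("nilf_to_emp", "sum"),  # transitions per person
            weight=("WTFINL", "first"),      # assumption: keep the first observed weight
        )
    )
    weighted_total = (per_person["n_trans"] * per_person["weight"]).sum()
    return per_person, weighted_total

def monthly_weight_totals(df):
    """Sanity check: sum of WTFINL within each monthly sample."""
    return df.groupby(["YEAR", "MONTH"])["WTFINL"].sum()

# Toy data with made-up weights, for illustration only
toy = pd.DataFrame({
    "CPSIDP":      [1, 1, 2, 2, 3],
    "YEAR":        [2017, 2017, 2017, 2017, 2017],
    "MONTH":       [1, 2, 1, 10, 2],
    "MISH":        [1, 2, 4, 5, 1],
    "WTFINL":      [1000.0, 1100.0, 2000.0, 2100.0, 1500.0],
    "nilf_to_emp": [0, 1, 0, 0, 0],
})
per_person, weighted_total = weighted_transitions(toy)
totals = monthly_weight_totals(toy)
```

With a real extract, the monthly totals would be compared against an external population figure for the same month. If the data are aggregated to counties first, the same weighting would be applied inside a `groupby` that also includes the county identifier.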

I also encourage you to look into additional discussion on this topic here and here for more information.


Apr 11, 2018 - 12:37 PM
