NA Values in IPUMS MEPS PSUANN column

Hey there,

I am attempting to create a survey object in R using the survey and srvyr packages. However, when I pass the design variables to the svydesign function, I receive an error due to NAs in the id variable (PSUANN). When I manually search through my dataset (imported using ipumsr), there are many NAs in the PSUANN variable. Is this to be expected? How can unique ids be missing for observations? Please see my code below:

Loading necessary packages

```r
library(ipumsr)
library(tidyverse)
library(survey)
library(srvyr)
```

Loading in the data through ipumsr

```r
ddi <- read_ipums_ddi("meps_00007.xml")
data <- read_ipums_micro(ddi)
```

Creating a new weight variable because I am pooling several years

```r
# Number of distinct survey years in the pooled extract
pooled_years <- length(unique(data$YEAR))
data$normalized_weights <- data$PERWEIGHT / pooled_years
```

Attempting to create survey object

```r
svy <- svydesign(ids = ~PSUANN,
                 strata = ~STRATANN,
                 weights = ~normalized_weights,
                 data = data,
                 nest = TRUE)
```

I believe the issue you are encountering is caused by the specific extract structure you are using; this page provides a useful guide to the types of data structures that are available.

Each IPUMS MEPS variable is assigned to a particular record type (see RECTYPE), such as person, round, event, condition, medication-round, and medication-round fill. For example, demographics such as AGE and SEX are reported on person records, while information on medical visits (e.g., EVENTYPE) is reported on event records. Demographic variables do not appear on event records; on those rows they are reported as N/A instead.

One option for users (the default) is to request data that is rectangular on the person record. In this case, only variables recorded on the person record are included in the extract, and each observation is an individual in a particular year in which they were sampled. However, users who need other types of records, such as event and condition records, can only obtain this data in a hierarchical format. In this format, each observation is not a person but a particular record (e.g., event, condition, medication-round) for that person, and a person record may be followed by multiple event records. Since PSUANN is recorded on the person record, it will be reported as N/A on event records and on all other non-person records (i.e., RECTYPE ≠ P).
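A minimal sketch of the point above, using a small made-up data frame in place of a real extract. All values are hypothetical, as is the `"E"` code used for the event rows; only `"P"` for person records comes from the reply above. Keeping only the person records leaves no missing PSUANN values, so `svydesign()` can be declared on them.

```r
library(survey)

# Toy stand-in for a hierarchical IPUMS MEPS extract (hypothetical values).
# Person rows ("P") carry the design variables; the event rows (coded "E" here,
# a hypothetical code) leave them missing.
toy <- data.frame(
  RECTYPE   = c("P", "E", "E", "P", "E", "P", "P"),
  PSUANN    = c(1, NA, NA, 2, NA, 1, 2),
  STRATANN  = c(10, NA, NA, 10, NA, 20, 20),
  PERWEIGHT = c(1500, NA, NA, 2300, NA, 1800, 2100)
)

# All of the missing PSUANN values come from the non-person rows
missing_psu <- sum(is.na(toy$PSUANN))

# Keep one row per person before declaring the design
persons <- toy[toy$RECTYPE == "P", ]
stopifnot(!anyNA(persons$PSUANN))

svy <- svydesign(ids = ~PSUANN, strata = ~STRATANN,
                 weights = ~PERWEIGHT, data = persons, nest = TRUE)
```

With a real extract, the same filter would be applied to the data returned by `read_ipums_micro()`, assuming RECTYPE is stored with the `"P"` code described above.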


Hey there,

Thank you so much for the detailed reply. I understand now that each row of this dataset corresponds to a record, such as an event, rather than an individual. As I am interested in looking at medication fills, costs, and spending, I believe I need to work with a hierarchical data format. With this in mind, what would be the appropriate id, strata, and weight variables to create the survey object?

Thank you!

The IPUMS MEPS User Guide on Variance Estimation provides specific instructions and code for how to incorporate sample design data into your analysis. Researchers should use PSUANN to represent the impact of clustering (id) and STRATANN to represent the impact of stratification.
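For medication fills or spending, one possible approach is to keep the design on the person records and bring the fill-level information up to the person level before analysis. This is a sketch, not an official IPUMS recipe. It assumes, per the earlier reply, that each person record is immediately followed by that person's other records, so a running count of person rows can serve as a person key; if your extract includes a person identifier, use that instead. The `"F"` record code and the `fill_cost` column are hypothetical placeholders, not IPUMS MEPS names.

```r
library(survey)

# Hypothetical hierarchical data: "F" rows stand in for medication fill records
toy <- data.frame(
  RECTYPE   = c("P", "F", "F", "P", "P", "F", "P"),
  PSUANN    = c(1, NA, NA, 2, 1, NA, 2),
  STRATANN  = c(10, NA, NA, 10, 20, NA, 20),
  PERWEIGHT = c(1500, NA, NA, 2300, 1800, NA, 2100),
  fill_cost = c(NA, 40, 60, NA, NA, 25, NA)  # hypothetical spending variable
)

# Assumes child records follow their person record: number the persons in order
toy$person_key <- cumsum(toy$RECTYPE == "P")

persons <- toy[toy$RECTYPE == "P", c("person_key", "PSUANN", "STRATANN", "PERWEIGHT")]
fills   <- toy[toy$RECTYPE == "F", c("person_key", "fill_cost")]

# Total fill spending per person
spend <- aggregate(fill_cost ~ person_key, data = fills, FUN = sum)

# Left join so people with no fills stay in the design with zero spending
persons <- merge(persons, spend, by = "person_key", all.x = TRUE)
persons$fill_cost[is.na(persons$fill_cost)] <- 0

svy <- svydesign(ids = ~PSUANN, strata = ~STRATANN,
                 weights = ~PERWEIGHT, data = persons, nest = TRUE)
mean_spend <- svymean(~fill_cost, svy)
```

Keeping people with no fills matters here: dropping them would change the population the estimate describes. For subgroup estimates, calling `subset()` on the design object (rather than dropping rows before `svydesign()`) keeps the full design information for variance estimation.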

The correct IPUMS MEPS weight to use depends on the universe of the variables you are analyzing. Most questions in the MEPS are asked of all respondents and PERWEIGHT is the correct weight to use in analyses of these variables.

However, there are a set of questions, part of the Self-Administered Questionnaire (SAQ) and the new Preventive SAQ (PSAQ), that are administered to a subset of MEPS respondents (see SAQELIG for a description of who is sampled). The SAQ contains questions pertaining to satisfaction with health care, health status, non-specific psychological distress, and the Patient Health Questionnaire. The PSAQ, with one version for females and one for males, collects information about preventive care and contains many questions from the omitted Preventive Care section. Researchers should use SAQWEIGHT when analyzing data including SAQ or PSAQ variables (including when SAQ questions are analyzed in conjunction with other questions asked of all respondents).

Additionally, there is a self-administered Diabetes Care Survey (DCS) that is administered to the subset of respondents who reported ever being told by a doctor or other health professional that they have diabetes. Researchers should use DIABWEIGHT for analyses that include these variables.

If you’re unsure of which weight to use, you should consult the weights tab for your variables of interest, which specifies the weight that should be used with that variable in each year (e.g., the weights tab for RXNAME states that users should use PERWEIGHT when analyzing this variable on its own).
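Switching weights only changes the `weights` argument of the design. A minimal sketch, using a hypothetical helper `make_meps_design()` (not part of ipumsr or survey) that takes the weight column name, rescales it by the number of pooled years as in the original post, and declares the design on person-level rows. The toy data values are made up.

```r
library(survey)

# Hypothetical helper: build a design from person-level rows using the weight
# appropriate to the variables analyzed (e.g. "PERWEIGHT", "SAQWEIGHT", or
# "DIABWEIGHT"), divided by the number of pooled years as in the question.
make_meps_design <- function(persons, weight_var = "PERWEIGHT") {
  stopifnot(weight_var %in% names(persons), !anyNA(persons$PSUANN))
  n_years <- length(unique(persons$YEAR))
  persons$pooled_wt <- persons[[weight_var]] / n_years
  svydesign(ids = ~PSUANN, strata = ~STRATANN, weights = ~pooled_wt,
            data = persons, nest = TRUE)
}

# Hypothetical person-level rows pooled over two years
persons <- data.frame(
  YEAR      = c(2018, 2018, 2019, 2019),
  PSUANN    = c(1, 2, 1, 2),
  STRATANN  = c(10, 10, 20, 20),
  PERWEIGHT = c(1500, 2300, 1800, 2100),
  SAQWEIGHT = c(1400, 2500, 1700, 2000)
)

per_svy <- make_meps_design(persons)               # all-respondent variables
saq_svy <- make_meps_design(persons, "SAQWEIGHT")  # SAQ/PSAQ variables
```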