Question

Status: Closed    Asked: Mar 30, 2018 - 08:23 AM

How do I read in select columns from IPUMS data with ipumsr?

I'm trying to read in only select columns from an IPUMS USA dataset with the ipumsr R package. The following code is not working: it reads in all the columns in the dataset. Any ideas?


test <- read_ipums_micro(ipumsi_ddi_file, vars = c(YEAR,MET2013,PERWT), verbose = FALSE,n_max=10)

 
 

Staff Answer

The code you have should work. I would just make sure that "ipumsi_ddi_file" points to the DDI file of your IPUMS USA extract. If the issue persists, you could use the following to create a subset of the full data set with your columns of interest:

data <- subset(test, select = c(YEAR, MET2013, PERWT))

Can you check to see if the following works for reading in a subset of the example extract included in the ipumsr package?

library(ipumsr)
read_ipums_micro(
  ipums_example("cps_00006.xml"),
  vars = c(YEAR, SERIAL),
  n_max = 10
)

If not, you should try reinstalling ipumsr from CRAN using: install.packages('ipumsr')

 

Apr 04, 2018 - 08:34 AM


Answers

Hi Michelle,


Your code works fine for me, so the issue seems to be the particular DDI file that I am using (i.e. test below has 55 columns while test2 has 2 columns). I am trying to limit the number of columns for memory considerations on the initial load; otherwise a select or subset statement like you suggested would work fine.


The other thing that would be helpful (given memory limitations) is if I could read in only a specific selection of rows at a time (such as one metropolitan area or state), but I'm not sure how to do that within ipumsr functions.



test <- read_ipums_micro(
  ipumsi_ddi_file,
  vars = c(YEAR, SERIAL),
  n_max = 10
)

test2 <- read_ipums_micro(
  ipums_example("cps_00006.xml"),
  vars = c(YEAR, SERIAL),
  n_max = 10
)

 

Apr 04, 2018 - 01:40 PM


Hi there,


This appears to be a bug in ipumsr: currently you cannot select columns when the extract is a csv file. As an immediate fix, you could change your extract to be a fixed-width (.dat) file instead of a csv.


Otherwise, I hope to fix this in ipumsr soon. You can track progress here: https://github.com/mnpopcenter/ipumsr...
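
Until that fix lands, one possible stopgap for a csv extract is to skip unwanted columns with base R instead of ipumsr. The sketch below is an assumption-laden illustration, not ipumsr's method: it builds a tiny stand-in csv (the file and its values are hypothetical) and uses the colClasses argument of read.csv to drop every column not listed in keep. The result is a plain data frame, without the variable labels that ipumsr attaches from the DDI.

```r
# Hypothetical stand-in for a csv extract; a real extract would be far larger.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    YEAR    = c(2016, 2016, 2016),
    SERIAL  = c(1, 2, 3),
    MET2013 = c(35620, 31080, 35620),
    PERWT   = c(101, 87, 93)
  ),
  csv_path,
  row.names = FALSE
)

# Columns we actually want in memory.
keep <- c("YEAR", "MET2013", "PERWT")

# Read a single row just to learn the column names.
header <- names(read.csv(csv_path, nrows = 1, check.names = FALSE))

# colClasses = "NULL" tells read.csv to skip a column entirely;
# NA lets it guess the type as usual.
col_classes <- ifelse(header %in% keep, NA_character_, "NULL")

selected <- read.csv(csv_path, colClasses = col_classes)
print(names(selected))
```

Because skipped columns are never stored, this keeps the initial load smaller than reading everything and subsetting afterwards.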


As for subsetting large extracts by row, you can use the Select Cases feature in our extract engine. I would like to add better support into ipumsr, but that is probably further off in the future.

https://cps.ipums.org/cps-action/faq#...
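
If the extract is a csv and requesting separate extracts is not practical, reading the file in chunks and keeping only the matching rows is one way to stay within memory. The following is a minimal base R sketch under stated assumptions, not an ipumsr feature: read_one_msa is a hypothetical helper, the stand-in file and its MET2013 codes are made up for illustration, and it assumes a simple csv with no embedded newlines in any field.

```r
# Hypothetical stand-in for a large csv extract covering several MSAs.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    YEAR    = rep(2016, 7),
    MET2013 = c(35620, 31080, 35620, 16980, 31080, 35620, 16980),
    PERWT   = c(101, 87, 93, 110, 95, 120, 88)
  ),
  csv_path,
  row.names = FALSE
)

# Hypothetical helper: stream the file chunk_size lines at a time and
# keep only the rows whose MET2013 equals msa_code.
read_one_msa <- function(path, msa_code, chunk_size = 3) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The first line holds the column names; strip the quotes write.csv adds.
  col_names <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break  # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
    pieces[[length(pieces) + 1]] <- chunk[chunk$MET2013 == msa_code, , drop = FALSE]
  }
  do.call(rbind, pieces)
}

one_msa <- read_one_msa(csv_path, msa_code = 35620)
print(one_msa)
```

Only one chunk plus the rows kept so far are in memory at any time, so a loop over a vector of MSA codes could process each area in turn. A real extract would want a much larger chunk_size than the 3 used here.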


Thanks for reporting!

Greg

 

Apr 04, 2018 - 02:02 PM


Hi Greg,


Thanks for making a ticket and for the suggestion on using a .dat file. Also, to clarify: I was interested in reading in a selection of data from the extract with ipumsr, rather than requesting a smaller data extract, because I am analyzing ~40 metropolitan statistical areas (MSAs), so it would be a bit of a pain to request them in separate data extracts.


However, if I had the ability to download the entire ~40 MSA dataset but only read in one MSA at a time into memory with ipumsr, then I could easily iterate through this dataset and not run up against memory limitations on a laptop. For the time being I think I will just use a smaller data extract for testing and then run the entire ~40 MSA dataset on a desktop with more memory when I have my code in a good state.


Thanks for your help, Greg and Michelle.


Best,

Jesse

 

Apr 04, 2018 - 02:12 PM
