Status:Closed    Asked:Jun 24, 2018 - 01:02 AM

Data and query provenance -- i.e., reliably recreating data queries for replication

Thanks for the effort in putting IPUMS together. It is a great resource for the research community.

Our current work utilizes data from a variety of IPUMS data sources, including data derived from the full count (100% sample) census datasets.

We would like to provide readers of our paper with a convenient means of recreating our results directly from the raw data.

Currently, we provide a README along with our code that guides the end user through the IPUMS interface to recreate our data extracts. However, this is a somewhat tedious process that introduces the potential for errors and confusion.

Question: Is there a way to recreate a query from a script (for example, by uploading a YAML file) or from a stable URL? Ideally, we would be able to give researchers aiming to replicate our results a single script, control file, or URL that they could then use to download the data from IPUMS.


Staff Answer


Jeff Bloem


This is a neat idea, and it is functionality we could probably think about adding someday. At present, however, the only way to access IPUMS data is through the online data extract system.


Jun 25, 2018 - 09:05 AM
