Status: Closed    Asked: May 05, 2018 - 11:09 PM

Stata version, .do files and file metadata conversion issues

Do the .do files apply equally to all versions of Stata? If not, in what version of Stata are the files supplied?

Also, I am a little confused about the format of the Stata data, which I am converting to R. Because R holds its data in RAM, I need to read larger files in chunks (for transfer to a database). Unfortunately, R's flagship Stata -> R conversion package, haven, does not have a setting for reading less than the entire file (though there are some other conversion packages, supporting files at least through Stata 13, that I have not explored yet). So I was thinking that I would use a line-reading utility to read the first n lines, use haven to convert those lines with all the right formatting and metadata, and then bind subsequent blocks of data to the converted top-of-file, which would carry the full column format, including all the levels information. But then it appeared to me that all this information is actually in the .do file, so that would not work. I'd have to extract it from the .do file somehow.

However, for smaller files, the haven translation seems to pick up everything, including the header info. I'm not quite sure how that can be without some sort of metadata in the file similar to that in the .do file. Is metadata such as the data types and the levels information also encoded in the first however many rows? If so, I'd like any information you have on how the metadata in those rows is formatted, and I especially want to know how the metadata is separated from the data: is it by a fixed number of rows, and if so, how many?


Staff Answer


Jeff Bloem


The .do files should run in any version of Stata. Basically, these command files take the (decompressed) fixed-width .dat files and convert them into a data file, such as a Stata .dta file. As part of this process, the command files label variables and define value labels. Since you are trying to read IPUMS data into R, it sounds like you may benefit from using the ipumsr package. This package reads in data from an IPUMS extract into R along with all of the associated metadata, such as variable labels and value labels.
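To make the split between data and metadata concrete, here is a rough sketch in Python (standard library only) of what a command file does with a fixed-width file: slice each record by column position, then attach value labels that are defined outside the data. The column layout, the value labels, and the sample records below are all made up for illustration; in a real extract the positions and labels come from the .do file or the codebook, not from the first rows of the .dat file. The `read_chunks` helper is likewise hypothetical and only shows one way to process such a file a block at a time.

```python
from itertools import islice
import io

# Hypothetical layout: (name, start, end) with 0-based, end-exclusive offsets.
# In a real extract these positions come from the .do file or the codebook.
COLUMNS = [("YEAR", 0, 4), ("SEX", 4, 5), ("AGE", 5, 8)]

# Hypothetical value labels, analogous to "label define" lines in a .do file.
VALUE_LABELS = {"SEX": {1: "Male", 2: "Female"}}


def parse_line(line):
    """Slice one fixed-width record into a dict of integer fields."""
    return {name: int(line[start:end]) for name, start, end in COLUMNS}


def read_chunks(handle, chunk_size):
    """Yield lists of parsed records, at most chunk_size records at a time."""
    while True:
        lines = list(islice(handle, chunk_size))
        if not lines:
            break
        yield [parse_line(line) for line in lines]


def apply_labels(record):
    """Return a copy of the record with coded values replaced by labels."""
    return {k: VALUE_LABELS.get(k, {}).get(v, v) for k, v in record.items()}


# Made-up sample data standing in for a decompressed .dat file.
sample = io.StringIO("20101035\n20102027\n20111061\n")
chunks = list(read_chunks(sample, chunk_size=2))
```

Because every line of the file is a data record with the same layout, any block of lines can be parsed with the same column definitions, so each chunk could be written to a database as it is read.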


May 07, 2018 - 10:02 AM



Wow! Great answer. I'll check it out immediately.


Jun 13, 2018 - 07:57 PM


