Volume 40 - Article 40 | Pages 1153–1166
IRS county-to-county migration data, 1990–2010
Date received: 24 Aug 2018
Date published: 07 May 2019
Keywords: database, Internal Revenue Service (IRS), migration, R
Updated items: The text of the publication has not changed. The original data file contained two oddities that have been fixed in the csv file now available for download. 1) Some users noticed odd activity in the 1998 inflows for the states of WV and PA. For only these states, only this year, and only for in-migrants, the IRS did not explicitly declare the state and county variables as characters, so leading zeros were dropped on import and the codes were read as 421 (for state = 42, county = 001) instead of 42001. These are now corrected in the csv. 2) 1996 contained missing states. The IRS capitalized the names of some of the raw files (i.e., “CO967DEO.XLS” rather than “co967dco.xls”), and these files were not imported. This affected only 1996 out-migration for a subset of states. The authors have fixed this error in the new csv file.
Additional files: demographic-research.40-40 (zip file, 3 MB)
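The two import problems described under the updated items can be illustrated with a short sketch. It is written in Python for illustration and is not the authors' R script; the helper names `fips5` and `find_file` are hypothetical, and the upper-case file name is generated from the lower-case one purely as an example.

```python
def fips5(state, county):
    """Build a 5-character county FIPS code from its state and county parts."""
    # Zero-pad the state to 2 digits and the county to 3 digits.
    return f"{int(state):02d}{int(county):03d}"


def find_file(expected_name, available_names):
    """Return the available file whose name matches expected_name ignoring case, or None."""
    lookup = {name.lower(): name for name in available_names}
    return lookup.get(expected_name.lower())


# When the codes are read as numbers, county 001 becomes 1, so naive
# concatenation of state 42 and county 1 gives "421" rather than "42001".
naive = str(42) + str(1)
padded = fips5(42, 1)

# A case-sensitive lookup would miss a file whose name was capitalized.
expected = "co967dco.xls"
available = [expected.upper()]  # illustrative capitalized variant
match = find_file(expected, available)

print(naive, padded, match)
```

Storing the codes as zero-padded strings, rather than as integers, keeps state and county boundaries unambiguous when the two parts are joined.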
Background: The Internal Revenue Service’s (IRS) county-to-county migration data is an incredible resource for understanding migration in the United States. Produced annually since 1990 in conjunction with the US Census Bureau, the IRS migration data represents 95% to 98% of the tax-filing universe and their dependents, making it one of the largest sources of migration data. However, any analysis using the IRS migration data must process at least seven legacy formats of this public data across more than 2,000 data files, a serious burden for migration scholars.
Objective: To produce a single, flat data file containing complete county-to-county IRS migration flow data and to make the computer code to process the migration data freely available.
Methods: This paper uses R to process more than 2,000 IRS migration files into a single, flat data file for use in migration research.
Contribution: To encourage and facilitate the use of this data, we provide a single, standardized, flat data file containing county-to-county one-year migration flows for the period 1990–2010 (containing 163,883 dyadic county pairs resulting in 3.2 million county-year observations totaling over 343 million migrants) and provide the full R script to download, process, and flatten the IRS migration data.
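To make the idea of a flat, dyadic file concrete, the sketch below stacks per-year tables into one row per county pair and year, then counts distinct pairs and total migrants. It is a Python illustration rather than the authors' R script. The column names (`origin`, `destination`, `year`, `migrants`) are assumptions, and the values are made up, not IRS figures.

```python
import csv
import io

# Hypothetical per-year extracts: (origin FIPS, destination FIPS, migrants).
# These numbers are invented for illustration only.
yearly = {
    1990: [("42001", "42003", 120), ("42003", "42001", 95)],
    1991: [("42001", "42003", 110), ("42001", "54001", 15)],
}


def flatten(yearly_tables):
    """Stack per-year tables into flat rows, one per county pair and year."""
    rows = []
    for year in sorted(yearly_tables):
        for origin, destination, migrants in yearly_tables[year]:
            rows.append(
                {
                    "origin": origin,
                    "destination": destination,
                    "year": year,
                    "migrants": migrants,
                }
            )
    return rows


flat = flatten(yearly)

# A dyadic pair is directional: (A, B) and (B, A) are counted separately.
pairs = {(row["origin"], row["destination"]) for row in flat}
total_migrants = sum(row["migrants"] for row in flat)

# Write the flat table as CSV text (in memory here, instead of to disk).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["origin", "destination", "year", "migrants"]
)
writer.writeheader()
writer.writerows(flat)
csv_text = buffer.getvalue()

print(len(flat), len(pairs), total_migrants)
```

In this layout, the number of rows corresponds to county-year observations and the number of distinct origin-destination pairs corresponds to dyadic county pairs.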