author: Matt Nelson
title: The decline of patrilineal kin propinquity in the United States, 1790–1940

**********DATA ACCESS OVERVIEW**********

**To Download the Data**
To request restricted versions of the IPUMS data, interested users will need their institution to submit a data license application. Interested users should contact ipumsres@umn.edu for more information. The data is currently free of charge, but this may change in the future. The license requires users to submit a data security plan for protecting the data, a research project application, and a signed researcher agreement form.

County boundaries can be downloaded from NHGIS. The 2000 TIGER/Line files for 1790-1940 (excluding 1890) were downloaded and loaded into ArcGIS. The two sources were combined in ArcGIS using the variable COUNTYNHG (when available) in the IPUMS-USA data and GISJOIN2 in the NHGIS data.

NOTE: The data is continually updated. Any data downloaded after May 2020 may not produce exactly the same results because of updates to the data by IPUMS. For approved restricted data users, the original data used in this analysis will be preserved by IPUMS so that results can be recreated as necessary.
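The county join described under "To Download the Data" was carried out in ArcGIS. As a rough illustration of the matching logic only, here is a minimal Python sketch. It assumes both keys have already been stored as identically formatted strings; the function name `join_counties`, the `name` field, and the sample codes are made up for the example and are not taken from the attached syntax.

```python
def join_counties(census_rows, boundary_rows):
    """Left-join census rows to boundary rows on COUNTYNHG == GISJOIN2.

    Assumption: both keys are strings in the same format.
    """
    by_key = {b["GISJOIN2"]: b for b in boundary_rows}
    joined = []
    for row in census_rows:
        key = row.get("COUNTYNHG")  # not available for every record
        joined.append((row, by_key.get(key) if key else None))
    return joined


# Made-up sample records, for illustration only.
census = [
    {"histid": "A", "COUNTYNHG": "0100010"},
    {"histid": "B", "COUNTYNHG": None},
]
boundaries = [{"GISJOIN2": "0100010", "name": "Example County"}]
result = join_counties(census, boundaries)
```

Records without a usable COUNTYNHG stay unmatched (paired with `None`) rather than being dropped, which makes it easy to see which cases still need a join code.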

**Programs and versions used for analysis**
Stata 16.0 (syntax attached)
ArcMap 10.4.1 (used to create Figures 8-10)
Excel 16.0.4993.1002 (used to create remaining Tables and Figures, attached)

**********FOLDER STRUCTURE**********

dem_res
    data
        1790-Data for 1790 is stored here
        1800-Data for 1800 is stored here
        1810-Data for 1810 is stored here
        1820-Data for 1820 is stored here
        1830-Data for 1830 is stored here
        1840-Data for 1840 is stored here
        1850-Data for 1850 is stored here
        1860-Data for 1860 is stored here
        1870-Data for 1870 is stored here
        1880-Data for 1880 is stored here
        1900-Data for 1900 is stored here
        1910-Data for 1910 is stored here
        1920-Data for 1920 is stored here
        1930-Data for 1930 is stored here
        1940-Data for 1940 is stored here
    logs-Any logs written out are stored here
    shapefiles
        1790-Shapefile for 1790 is stored here
        1800-Shapefile for 1800 is stored here
        1810-Shapefile for 1810 is stored here
        1820-Shapefile for 1820 is stored here
        1830-Shapefile for 1830 is stored here
        1840-Shapefile for 1840 is stored here
        1850-Shapefile for 1850 is stored here
        1860-Shapefile for 1860 is stored here
        1870-Shapefile for 1870 is stored here
        1880-Shapefile for 1880 is stored here
        1900-Shapefile for 1900 is stored here
        1910-Shapefile for 1910 is stored here
        1920-Shapefile for 1920 is stored here
        1930-Shapefile for 1930 is stored here
        1940-Shapefile for 1940 is stored here
    syntax
        1-read_data-Syntax files to read in data stored here
        2-create_surname_links-Syntax files to create kin links stored here
        3-nhgisjoin-Syntax files to create NHGIS join code stored here
        4-living_arrangements-Syntax file to create living arrangements for elderly persons stored here
        5-pop_density-Syntax file to create county population density stored here
        6-standardize-Syntax file to standardize all files stored here
        7-imp_files-Syntax files to create impute files stored here
        8-adj_rates-Syntax files to adjust kin propinquity rates stored here
            states-This folder stores state specific syntax files
            that are read in from the file 8-adj_rates_cs.do
        9-analysis-Syntax files to analyze data to create tables and figures stored here
    tables-Excel file and ArcGIS file for creating tables/figures stored here
    temp_data-Temporary storage for logs, data, etc.

**********DATA**********

**Samples**
1790-419,317 households
1800-542,070 households
1810-829,388 households
1820-1,238,140 households
1830-1,824,028 households
1840-2,585,889 households
1850-19,443,785 persons
1860-26,895,337 persons
1870-37,643,496 persons
1880-49,020,953 persons
1900-73,779,794 persons
1910-89,299,714 persons
1920-101,337,837 persons
1930-118,750,570 persons
1940-128,250,989 persons

**Universe**
All non-GQ households

**Variable Documentation**
Full documentation for the following variables is available on the IPUMS-USA website. These are the variables required to create the kin propinquity measure.

1790-1840
    imageid-Image identifier, used to sort the data, only available in the restricted version
    ycord-Y-coordinate, used to sort the data, only available in the restricted version
    pid-Ancestry identifier, used to sort the data, only available in the restricted version
    fullstate-State string, only available in the restricted version
    county-County string, only available in the restricted version
    township-Township string, only available in the restricted version
    city-City string, only available in the restricted version
    surname-Surname of head of household, only available in the restricted version

1850-1940
    serial-Serial number for each household. While the serial number for each household is unique by year, it can change between versions of the data.
    pernum-Person number within household
    namelast-Surname of individuals. This variable is only available in the restricted IPUMS data.
    race-Race of person
    stateicp-State ICP Codes. Can also use STATEFIP, but the syntax in this file uses STATEICP.
    countyicp-County ICP Codes. Can also use COUNTYFIP, but the syntax in this file uses COUNTYICP.
    enumdist, source variables-Identifies the enumeration district (1880-1940) or the minor civil division (1850-1870). The source variables are only available in the restricted IPUMS data. Source variables are unique to each year. For example, US1850C_0043 is the township string, but US1860C_0043 is not. ENUMDIST is used for 1880-1940, and the source variables are used for 1850-1870.
    age-Age
    sursim-Surname Similarity, indicates individuals who share the same surname within a household.

These variables (and variables above) were used in the analysis of the patterns of kin propinquity.
    urban-Urban status
    bpl-Birthplace
    fbpl-Father's Birthplace
    mbpl-Mother's Birthplace
    occ1950-1950 Occupational Code
    countynhg-Identifier used to merge the Census data to the county-level GIS files.
    histid-Unique identifier, can be used to identify cases should the data ever change.
    relate-Relationship to Head
    momloc-Mother's location within household
    poploc-Father's location within household

The following variables were created by the author (based on the standardized version of the data in file 6_year.dta):
    ed_group-Created from stateicp, county, ed
    ed_name_group_count_f_a-Count of persons within enumeration district that had the same surname by race
    ed_totpop_f_a-Count of persons within enumeration district by race
    kinship_serial_1a-Distance above household between matching surnames for all individuals
    kinship_serial_2a-Distance below household between matching surnames for all individuals
    kinship_serial_1b-Distance above household between matching surnames for all heads of household
    kinship_serial_2b-Distance below household between matching surnames for all heads of household
    ed_name_group_count_f_b-Count of heads of household within enumeration district that had the same surname by race
    ed_totpop_f_b-Count of heads of household within enumeration district by race
    namelast_universe-Variable indicating whether a surname is illegible and should not have a probability calculated
    sursim_head_pernum-Variable indicating the reference person for a family
    sursim_head_age-Variable indicating the age of the reference person for a family
    from_begin-Number of households a family is from the beginning of the enumeration district
    from_end-Number of households a family is from the end of the enumeration district
    age_link_1a-Age of matching reference person above household
    age_link_2a-Age of matching reference person below household
    pop_density-County population density
    urbtype-Recoding of urban, including extra code for farm families living in rural areas
    intergeneration-An indicator for elderly persons living with adult children
    generations-Variable indicating the number of generations in the household an elderly person lives in
    extend-Variable indicating elderly persons living with extended kin
    urban_rate-Percentage of county population living within a 1930 Census Urban Area
    C_a_ed_group-The number of other same surname families an average family had within an enumeration district, based on Smith, 1989.
    common_rate_a_ed_group-% of families within an enumeration district that have the most common surname within said enumeration district
    distribution_a_ed_group-The distribution of surnames for a particular family (Frs/Trs)

**********SYNTAX FILES**********

Because of the size of the files, the author strongly recommends analyzing and running data one year at a time.

A "-" indicates notes on what the syntax file does
A "*" indicates data files that are created from the particular syntax file

run_all.do will run all of the syntax files. Running everything from the beginning can take several weeks.

Stata Syntax Files
1-Read files-Read in files using IPUMS syntax, select universes, and create variables
    -Reads in data from IPUMS.
     Each file is year specific and titled 1-read_data_year.do
    -Reformats data
    -Selects in-universe persons
    -Creates ED-level surname and population counts
    -Creates 5 year age groups
    -Creates urban and family farm indicator
    -Creates foreign-born generation indicator
    -For 1790-1840, uses data created by author (nhgisjoin.dta and year_imp.dta) to correctly map ICPSR geography codes to the data
    *Creates data file 1_year.dta
2-Create surname links-Create Kin Propinquity measure
    -Creates surname links within enumeration district using 1_year data. To speed up processing, the program selects a state, then selects an enumeration district. This program can take anywhere from less than an hour to a couple of days, depending on the year being run and the computing environment the syntax is run in.
    -The two files in this folder run 1790-1840 and 1850-1940 separately.
    *Creates data file 2_year.dta
3-Create NHGIS-Create join code for Census data to NHGIS (specifically 1790-1840, 1900 & 1940; COUNTYNHG can be used for other years) using 2_year data.
    -3-gisjoin2.do creates GISJOIN2 codes for 1850-1940
    -3-icp_codes_1790-1840.do creates GISJOIN2 codes for 1790-1840
    *Creates data file 3_year.dta
4-Living Arrangements-Create indicators for complex families (three generation, extended, and with adult children) using 3_year data.
    *Creates data file 4_year.dta
5-Pop Density-Creates the county population densities for each year
    -Uses NHGIS data from ArcGIS (year_area.xls) to merge county areas to the microdata using 4_year data.
    *Creates data file 5_year.dta
6-Standardize-Standardize 1790-1840 & 1850-1940 files for analysis using 4_year data for 1850-1940 and 3_year data for 1790-1840.
    -For 1850-1940, this is the data file users should use to analyze kin propinquity
    -For 1790-1840, data undergoes additional steps
    *Creates data file 6_year.dta
7-Impute files-Create files used to adjust bad data for 1790-1840 using 6_year data.
    -Creates files for 1790-1840, and for 1850, which is used to impute data for the early years in some cases. Each file is year specific and named 7-imp_files_year.do
    *Creates data file 7_year.dta
8-Adj Rates-Adjust rates for bad data for 1790-1840 using 7_year data.
    -The file 8-adj_rates_ss.do adjusts rates at the state level
    -The file 8-adj_rates_cs.do adjusts rates at the county level
    -The file 8-adj_rates_ss.do MUST be run first
    *8-adj_rates_ss.do creates the file kinship_ss.dta in the dem_res folder
    *8-adj_rates_cs.do creates the file kinship_cs.dta in the dem_res folder
9-Analysis-Syntax for recreating tables and figures using 6_year data for 1850-1940 and 8_year data for 1790-1840.
    -Each .do file creates the figure/table/numbers for the particular table or figure.
    *figure_8-10.do creates data files 9_year.xlsx to merge into ArcGIS for Figures 8-10

**********TABLES**********

Excel File
This Excel file contains the results used to create the tables and figures in the paper. Each tab is named for the specific table/figure in the paper. Figures 8-10 are not included, as they were produced in ArcGIS using the GUI.

To produce Figures 8-10:
1. Load NHGIS shapefiles for 1790, 1820, 1850, 1880, 1910, and 1940 into GIS software.
2. Join by GISJOIN2 in the shapefiles and GISJOIN2 in the Excel county files for each year.
3. Create the figure in Symbology, selecting "Quantities-Graduated Colors". The value should be kinrate, with 5 classes.
4. Of the five bins, the first is limited to 0; the remaining four have maximum values of 0.199999, 0.399999, 0.599999, and 1, respectively.
5. The colors were based on colorbrewer2.org, a sequential 4-class OrRd scheme.
6. Each figure is exported (with a title and legend) as a .pdf and combined to create Figures 8-10.
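
The class breaks in step 4 were set by hand in ArcMap. For users working outside ArcGIS, here is a minimal Python sketch of the same binning. It assumes each break is an inclusive upper bound and that kinrate lies between 0 and 1; the function name `kinrate_class` is hypothetical and does not appear in the attached syntax.

```python
from bisect import bisect_left

# Upper bounds of the five classes from step 4:
# the first class holds exactly 0, followed by maxima of
# 0.199999, 0.399999, 0.599999, and 1.
UPPER_BOUNDS = [0.0, 0.199999, 0.399999, 0.599999, 1.0]


def kinrate_class(kinrate):
    """Return the 0-based class index (0-4) for a county's kinrate.

    Assumption: each break is an inclusive upper bound.
    """
    if not 0.0 <= kinrate <= 1.0:
        raise ValueError("kinrate must be between 0 and 1")
    # bisect_left finds the first bound that is >= kinrate.
    return bisect_left(UPPER_BOUNDS, kinrate)
```

For example, `kinrate_class(0.0)` falls in the first class (index 0), while `kinrate_class(0.25)` falls in the third (index 2).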