Data Standards
File Names
All final .csv datasets are named using the following convention:
[Theme Abbreviation][2-digit number]_[Spatial Scale].csv
For example, the Policy theme dataset on Prison Incarceration Rates (PS01) at the county level is PS01_C.csv. The same dataset at the state level is PS01_S.csv, at the tract level would be PS01_T.csv, and at the zip code level would be PS01_Z.csv.
Theme Abbreviations:
- Policy: PS
- Health: Health*, Access*
- Demographic: DS
- Economic: EC
- Physical Environment: BE
- COVID-19: COVID
* Variables labeled “Health” include: Drug-Related Death rate, Hepatitis C, Physicians. Variables labeled “Access” include: Access to MOUDs, Health Centers, Hospitals, Mental Health Providers, Pharmacies, Substance Use Treatment Facilities, Opioid Treatment Programs.
Spatial Scales:
- Tract: T
- Zip/ZCTA: Z
- County: C
- State: S
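The naming convention can be checked mechanically. The following is a minimal sketch, assuming the theme and scale abbreviations listed above are the complete set; the function name `parse_dataset_name` and the regular expression are illustrative and not part of the OEPS tooling.

```python
import re

# Abbreviations copied from the lists above.
THEMES = ("PS", "Health", "Access", "DS", "EC", "BE", "COVID")
SCALES = {"T": "Tract", "Z": "Zip/ZCTA", "C": "County", "S": "State"}

# [Theme Abbreviation][2-digit number]_[Spatial Scale].csv
_PATTERN = re.compile(
    r"^(?P<theme>" + "|".join(THEMES) + r")"
    r"(?P<number>\d{2})_"
    r"(?P<scale>[" + "".join(SCALES) + r"])\.csv$"
)


def parse_dataset_name(filename):
    """Return (theme, number, scale name), or None if the name does not fit."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return match["theme"], match["number"], SCALES[match["scale"]]
```

For instance, `parse_dataset_name("PS01_C.csv")` returns the theme `PS`, the number `01`, and the scale `County`, while a name such as `PS1_C.csv` is rejected because the number is not two digits.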
Geographic Identifiers
All datasets have geographic identifiers included as a variable. We use the following labeling convention for each spatial scale.
Spatial Scale | Variable ID | Description |
---|---|---|
State | STATEFP | 2-digit State FIPS code |
County | COUNTYFP | 5-digit County FIPS code (state + county) |
ZIP Code/ZCTA | ZCTA | 5-digit assigned ZCTA |
Census Tract | GEOID | 11-digit unique tract ID (state + county + tract) |
Data Formatting
Watch for leading zeros. Some geographic identifiers for states, counties, zip codes, and tracts start with “0” or “00”. Spreadsheet programs and many data-import tools read these identifiers as numbers when they open a .csv or other text file, which drops the leading zeros. A state FIPS code of “02” then becomes “2”, a county code of “02004” becomes “2004”, a zip code of “07436” becomes “7436”, and so on. If you merge the .csv files with other data by geographic identifier, you will need to restore the leading zeros (or, conversely, drop them in the other file) so that the identifiers match. This is particularly important when merging the .csv files with shapefiles, such as the geographic boundary files.
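One way to restore dropped zeros is to left-pad each identifier to its expected width. This is a minimal sketch, assuming the widths given in the Geographic Identifiers table; the helper name `restore_leading_zeros` is hypothetical.

```python
# Expected widths taken from the Geographic Identifiers table.
ID_WIDTHS = {"STATEFP": 2, "COUNTYFP": 5, "ZCTA": 5, "GEOID": 11}


def restore_leading_zeros(value, variable_id):
    """Left-pad an identifier with zeros to the width expected for its type."""
    text = str(value).strip()
    width = ID_WIDTHS[variable_id]
    if not text.isdigit():
        raise ValueError(f"{variable_id} must contain digits only: {text!r}")
    if len(text) > width:
        raise ValueError(f"{variable_id} is longer than {width} digits: {text!r}")
    return text.zfill(width)
```

If you load the files with pandas, reading the identifier columns as text in the first place (for example by passing `dtype={"COUNTYFP": str}` to `pandas.read_csv`) should avoid the problem altogether.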
Most variable names are no more than 10 characters (with some exceptions) for ease of data wrangling with shapefiles and GIS software. Some variable names are therefore shortened or abbreviated from the source data.
Numeric data are rounded to two decimal places.
Missing data are represented as “NA” or empty, depending on the language or platform you are working with. These should not be confused with the numeric “0”.
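When reading values by hand, it helps to keep missing values distinct from zero. The sketch below assumes that “NA” and the empty string are the only missing-value markers; `parse_numeric` is an illustrative name.

```python
def parse_numeric(cell):
    """Convert a CSV cell to a float, or None when the value is missing."""
    text = cell.strip()
    if text in ("", "NA"):
        return None  # missing, which is not the same as 0
    return float(text)
```

A cell containing `0` comes back as the number zero, while `NA` and empty cells come back as `None`, so the two cases cannot be mixed up in later calculations.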
Guidelines for Contributing
If you are interested in contributing to the OEPS, please keep in mind the following:
- Variable names should be no more than 10 characters
- Numeric observations should be rounded to two decimal places
- Remove any index columns
- Remove quotation marks, commas, or other character punctuation
- Code missing or unavailable data as NA or empty
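Some of these guidelines can be checked before submitting a file. The following is a rough sketch, not an official OEPS validator: the function name `check_contribution` is hypothetical, and the index-column test assumes such a column appears with a blank header or with the `Unnamed: 0` header that pandas produces. It covers name length, index columns, and stray quotation marks or commas only.

```python
import csv
import io

MAX_NAME_LENGTH = 10


def check_contribution(csv_text):
    """Return a list of human-readable problems found in a CSV string."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or not rows[0]:
        return ["file has no header row"]
    header, body = rows[0], rows[1:]
    problems = []
    for name in header:
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"variable name too long: {name}")
    # Assumption: an index column has a blank header or pandas' "Unnamed: 0".
    if header[0] in ("", "Unnamed: 0"):
        problems.append("first column looks like an index column")
    for line_number, row in enumerate(body, start=2):
        for cell in row:
            if '"' in cell or "," in cell:
                problems.append(f"line {line_number}: punctuation in value {cell!r}")
    return problems
```

An empty list means none of the checked problems were found; it does not guarantee that the file meets every guideline.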