Created by Eric Englin
This is a web scraper that downloads PDFs on Amtrak usage at stations across the U.S. from 2012-2018. Raw PDFs were downloaded at the station level and the state level (CSV files for both levels are included here). From there, the program scrapes the graph of the last 6 years of ridership from each PDF document. Some stations have discontinued service in the last 6 years, and the program flags these stations. The final data is presented as data visualizations.
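As a rough illustration of the flagging step, here is a minimal sketch. It assumes the scraped ridership is held as a dictionary keyed by station code and year, and it assumes a station counts as discontinued when it reported riders in an earlier year but none in the final year. The function name, the data layout, the rule itself, and the station codes are all hypothetical and are not taken from the notebooks.

```python
from typing import Dict, List, Optional

# Hypothetical layout: {station_code: {year: riders or None}}
Ridership = Dict[str, Dict[int, Optional[int]]]


def flag_discontinued(ridership: Ridership, last_year: int = 2018) -> List[str]:
    """Return station codes with earlier ridership but none in `last_year`.

    Assumed rule: missing, None, or zero riders in the final year,
    combined with a nonzero count in any earlier year.
    """
    flagged = []
    for station, by_year in ridership.items():
        riders_last = by_year.get(last_year)
        had_service = any(v for y, v in by_year.items() if y < last_year)
        if had_service and not riders_last:
            flagged.append(station)
    return sorted(flagged)


# Made-up codes and counts, for illustration only.
sample = {
    "AAA": {2016: 100, 2017: 120, 2018: 130},
    "BBB": {2016: 50, 2017: 10, 2018: None},
}
flagged = flag_discontinued(sample)
```

With this made-up sample, only "BBB" is flagged, because it has riders in 2016 and 2017 but none in 2018.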
Data was collected from railpassengers.org, where fact sheets in a similar format are available for all stations and states. Each fact sheet contains graphs showing passenger arrivals and departures from 2012-2018. The Jupyter notebooks in the data collection folder show the process for downloading all of the PDFs simultaneously and scraping the data from them using the Tabula package. The final products are the state-level and station-level CSV files.
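The simultaneous download step could be sketched as follows, using a thread pool from the Python standard library. This is not the code from the notebooks. The function names, the `urls` mapping from a file name to its PDF URL, and the worker count are assumptions made for illustration, and the actual fact sheet URLs are not reproduced here.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Tuple


def fetch_bytes(url: str) -> bytes:
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def download_all(
    urls: Dict[str, str],
    out_dir: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
    workers: int = 8,
) -> Dict[str, Path]:
    """Save each URL as <out_dir>/<name>.pdf, fetching in parallel threads."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def save(item: Tuple[str, str]) -> Tuple[str, Path]:
        name, url = item
        path = out_dir / f"{name}.pdf"
        path.write_bytes(fetch(url))
        return name, path

    # pool.map runs save() concurrently and yields results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(save, urls.items()))
```

Passing `fetch` as a parameter keeps the sketch testable without network access. The Tabula scraping step is not shown here.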
An example PDF is shown below for Aberdeen, MD. Website with all raw PDFs: https://www.railpassengers.org/all-aboard/tools-info/ridership-statistics/