Amtrak Web Scraper

Created by Eric Englin

This web scraper downloads PDF fact sheets on Amtrak ridership at stations across the U.S. for 2012-2018. Raw PDFs were downloaded at the station level and the state level (the resulting CSVs for both are included here). The program then scrapes the graph of the last 6 years of ridership from each PDF. Some stations discontinued service during those years, and the program flags them. The final data is presented as data visualizations.
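The flagging step could look something like the sketch below. It is only an illustration: the data layout (station code mapped to yearly ridership), the function name `flag_discontinued`, and the rule itself (no riders in the most recent year means discontinued) are assumptions, not the logic used in the notebooks. The station codes and numbers are made up.

```python
from typing import Dict, List, Optional

# Hypothetical layout: station code -> {year: ridership}; None means no value was scraped.
Ridership = Dict[str, Dict[int, Optional[int]]]


def flag_discontinued(ridership: Ridership, years: List[int]) -> Dict[str, bool]:
    """Return {station: True} when the latest year in `years` has no riders.

    Assumed rule: a missing, None, or zero value in the most recent year
    is treated as discontinued service.
    """
    latest = max(years)
    flags = {}
    for station, by_year in ridership.items():
        value = by_year.get(latest)
        flags[station] = value is None or value == 0
    return flags


# Made-up stations and numbers, for illustration only.
example = {
    "AAA": {2017: 1200, 2018: 1350},
    "BBB": {2017: 800, 2018: 0},
    "CCC": {2017: 450},  # no 2018 value scraped
}
flags = flag_discontinued(example, years=[2017, 2018])
```

A stricter rule (for example, requiring several consecutive empty years) may be a better fit, depending on how gaps appear in the scraped graphs.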

Data Visualization

An interactive visual of the final dataset, overlaid on a map of the U.S., is shown below (created in Python using the Bokeh package):



A version created in ArcGIS Pro that overlays the Amtrak rail network: Amtrak Ridership

Data Collection

Data was collected from railpassengers.org, where fact sheets in a common format are available for all stations and states. Each fact sheet contains graphs showing passenger arrivals and departures from 2012-2018. The Jupyter notebooks in the data collection folder show the process for downloading all of the PDFs simultaneously and scraping the data from them using the Tabula package. The final products are the state-level and station-level CSV files.
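As a rough sketch of the download-then-scrape flow described above (not the code in the notebooks), the PDFs can be fetched concurrently with a thread pool and then handed to tabula-py. The helper names `fetch_bytes`, `download_pdfs`, and `extract_tables` are hypothetical, the `{name: url}` input is an assumed layout, and the real fact sheet URLs are not reproduced here. The extraction step assumes tabula-py and Java are installed and that the ridership figures can be read as tables.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_pdfs(
    urls: Dict[str, str],
    out_dir: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
    max_workers: int = 8,
) -> Dict[str, Path]:
    """Download {name: url} concurrently, saving each file as out_dir/<name>.pdf."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def save(item):
        name, url = item
        path = out_dir / f"{name}.pdf"
        path.write_bytes(fetch(url))
        return name, path

    # Threads suit this job because the work is network-bound.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(save, urls.items()))


def extract_tables(pdf_path: Path):
    """Read tables from one PDF with tabula-py (assumes tabula-py and Java are installed)."""
    import tabula  # imported lazily so the download step works without it

    return tabula.read_pdf(str(pdf_path), pages="all")
```

Passing `fetch` as a parameter keeps the download logic testable without network access; in normal use the default `fetch_bytes` is enough.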

Aberdeen Example

An example PDF is shown below for Aberdeen, MD. All of the raw PDFs are available at: https://www.railpassengers.org/all-aboard/tools-info/ridership-statistics/

Screenshot