An Introduction to Web Scraping with Python

Web scraping is a way to collect information from websites using code. It can be especially useful when working with data that isn’t easily downloadable. There are several approaches and tools for web scraping; this workshop focuses on one of them: Python’s Beautiful Soup package. Python is open source, so anyone can use this tool with the right setup.
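As a rough illustration of what scraping with Beautiful Soup can look like, here is a minimal sketch. It assumes the `beautifulsoup4` package is installed, and it parses a small made-up HTML snippet rather than a live website; the page content, class name, and file paths are hypothetical and are not taken from the workshop materials.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page. A real script would
# first fetch this text from a website.
html = """
<html><body>
  <h1>Sample Dataset Listing</h1>
  <ul>
    <li class="dataset"><a href="/data/rainfall.csv">Rainfall</a></li>
    <li class="dataset"><a href="/data/census.csv">Census</a></li>
  </ul>
</body></html>
"""

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# Pull the page heading as plain text.
title = soup.find("h1").get_text(strip=True)

# Collect the link text and target of every list item marked "dataset".
datasets = []
for item in soup.find_all("li", class_="dataset"):
    link = item.find("a")
    datasets.append({"name": link.get_text(strip=True), "url": link["href"]})

print(title)
for entry in datasets:
    print(entry["name"], entry["url"])
```

In practice the HTML would come from a real page, and it is worth checking a site's terms of use before collecting data from it.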

Web scraping can support different stages of the research data lifecycle, including the planning phase (e.g., identifying available online data) and the active data collection phase. This workshop is intended for those who are new to web scraping and want to explore how it can be used in a research context.

The session is hosted by Cornell University Library’s Research Data & Open Scholarship team and is part of the Data Den workshop series.

Access the Web Scraping & API Binder notebook.

Access instructor Jacob Grippin's GitHub repository, which contains all of the web scraping and API materials.

Access additional resources for web scraping and APIs.

This guide was created in 2025 by Gabby Evergreen and Lencia McKee and is shared under a Creative Commons CC BY 4.0 license.
