BeautifulSoup Tutorial 2026: Parse HTML Data With Python

Kael Odin
Last updated on February 28, 2026
12 min read

BeautifulSoup Tutorial: How to Parse Web Data With Python (2026)

HTML pages are everywhere in 2026: product catalogs, job boards, pricing tables, documentation, news sites, and more. If you work with Python, BeautifulSoup is still one of the fastest ways to turn that raw HTML into structured data you can search, analyze, and feed into downstream systems.

This tutorial walks through a complete, copy-paste ready workflow: you’ll start with a small sample HTML file, learn how to parse it with BeautifulSoup, then move on to real HTTP responses, CSS selectors, and exporting data to CSV. Along the way, you’ll see how the same patterns scale to larger web scraping projects powered by managed infrastructure, so you don’t have to maintain brittle scrapers yourself—and how to combine this parser with solid Python basics like those covered in our syntax error and debugging guides.

Key Takeaways:
• Install and configure beautifulsoup4 and requests in a clean Python environment
• Parse a local HTML file and learn the core BeautifulSoup APIs: find, find_all, and select
• Build a practical parser that extracts product-like data and exports it to CSV
• Understand the limits of BeautifulSoup on JavaScript-heavy pages and how to combine it with managed scraping solutions
• Get a quick-reference table of the most common parsing patterns you’ll use daily

1. Setup: Install BeautifulSoup and Requests

We’ll assume you already have Python 3.10+ installed. If you’re on Windows, make sure you checked the “Add Python to PATH” box during installation so commands like python and pip work in your terminal.

Create and activate a virtual environment

python -m venv .venv

# Windows PowerShell
.\.venv\Scripts\Activate.ps1

# macOS / Linux
source .venv/bin/activate

Install BeautifulSoup and Requests

We’ll use beautifulsoup4 for parsing and requests for making HTTP calls. Optionally, you can install lxml for faster parsing:

pip install beautifulsoup4 requests lxml
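
You can confirm the installation worked by importing each package and printing its version:

```python
# Quick sanity check: if these imports succeed, the environment is ready.
import bs4
import requests

print("beautifulsoup4:", bs4.__version__)
print("requests:", requests.__version__)
```

If either import raises ImportError, double-check that your virtual environment is activated before running pip.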

2. Create a Sample HTML File

To understand the basics of BeautifulSoup, we’ll start with a simple, static HTML snippet representing a product list. Save the following content as sample_products.html in your project directory:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Sample Product List</title>
  </head>
  <body>
    <h1>Top Selling Products</h1>

    <ul id="products">
      <li class="product" data-sku="A100">
        <span class="name">Data Center Proxy Plan</span>
        <span class="price">49.00</span>
        <span class="currency">USD</span>
      </li>
      <li class="product" data-sku="A200">
        <span class="name">Residential Proxy Plan</span>
        <span class="price">99.00</span>
        <span class="currency">USD</span>
      </li>
      <li class="product featured" data-sku="A300">
        <span class="name">Web Scraper API Bundle</span>
        <span class="price">199.00</span>
        <span class="currency">USD</span>
      </li>
    </ul>
  </body>
</html>

This HTML is much simpler than a real e-commerce page, but it’s perfect for learning the core BeautifulSoup patterns.

3. First Steps With BeautifulSoup: Load and Inspect

Create a Python file named beautifulsoup_intro.py and paste the following code. This loads your local HTML file and prints out the top-level tags:

from bs4 import BeautifulSoup

HTML_FILE = "sample_products.html"

with open(HTML_FILE, "r", encoding="utf-8") as f:
    html = f.read()

soup = BeautifulSoup(html, "html.parser")

print("Document title:", soup.title.string)
print("Main heading:", soup.h1.string)

print("\nAll direct children of <body>:")
for child in soup.body.children:
    if getattr(child, "name", None):
        print(" -", child.name)

Run it:

python beautifulsoup_intro.py

You should see output similar to:

Document title: Sample Product List
Main heading: Top Selling Products

All direct children of <body>:
 - h1
 - ul

4. Finding Elements: find, find_all, and select

BeautifulSoup provides several powerful methods for locating elements:

Method | Use Case | Example
------ | -------- | -------
find() | First match | soup.find("ul", id="products")
find_all() | All matches | soup.find_all("li", class_="product")
select() | CSS selectors | soup.select("ul#products li.product")
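
The three methods can be compared side by side on a small inline HTML string (a minimal sketch, not the sample file itself):

```python
from bs4 import BeautifulSoup

html = """
<ul id="products">
  <li class="product">A</li>
  <li class="product featured">B</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag (or None if nothing matches).
ul = soup.find("ul", id="products")
print(ul.name)  # ul

# find_all() returns a list of all matching Tags.
items = soup.find_all("li", class_="product")
print(len(items))  # 2

# select() accepts CSS selectors and also returns a list.
featured = soup.select("ul#products li.featured")
print(featured[0].get_text(strip=True))  # B
```

Note that class_="product" also matches the second <li>, because class matching checks membership in the element's class list, not equality.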

Extract product data into Python dicts

Let’s parse all products into a list of dictionaries. Create parse_products.py:

from bs4 import BeautifulSoup
from pathlib import Path

HTML_FILE = "sample_products.html"

def parse_products(html: str):
  soup = BeautifulSoup(html, "html.parser")
  items = []

  for li in soup.select("ul#products li.product"):
    name_el = li.select_one(".name")
    price_el = li.select_one(".price")
    currency_el = li.select_one(".currency")

    items.append(
      {
        "sku": li.get("data-sku"),
        "name": name_el.get_text(strip=True) if name_el else "",
        "price": float(price_el.get_text(strip=True)) if price_el else None,
        "currency": currency_el.get_text(strip=True) if currency_el else "",
        "featured": "featured" in li.get("class", []),
      }
    )

  return items


def main() -> None:
  html = Path(HTML_FILE).read_text(encoding="utf-8")
  products = parse_products(html)

  print(f"Found {len(products)} products:")
  for p in products:
    print(f" - {p['sku']}: {p['name']} ({p['price']} {p['currency']})"
          + (" [FEATURED]" if p["featured"] else ""))


if __name__ == "__main__":
  main()

Run it and you should get a neatly formatted list of products extracted from your HTML file.

5. Parsing Real HTTP Responses With BeautifulSoup

So far we’ve worked with a local file. In real web scraping projects, you’ll usually fetch HTML over HTTP (using requests or a managed scraper) and then pass the response text to BeautifulSoup.

Here’s a minimal example that fetches https://httpbin.org/html and prints the main heading text:

import requests
from bs4 import BeautifulSoup

URL = "https://httpbin.org/html"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")

title = soup.find("h1")
print("Page heading:", title.get_text(strip=True) if title else "(not found)")

Important: Always review a website’s terms of service and robots.txt before scraping it. For production workloads, you should handle rate limits, retries, and IP rotation responsibly, or use a managed scraping platform to minimize operational and legal risk.
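
Python’s standard library includes urllib.robotparser for checking robots.txt rules programmatically. This sketch parses an illustrative ruleset inline; in practice you would point the parser at the site’s real robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Parse an illustrative robots.txt. For a live site you would instead call
# rp.set_url("https://example.com/robots.txt") followed by rp.read().
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.com/public/page"))   # True
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
```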

6. Export Parsed Data to CSV

Once you’ve parsed HTML into Python objects, you’ll often want to export that data into CSV for analysis in tools like Excel, Google Sheets, or a data warehouse.

Let’s extend our product parser to write a CSV file using pandas:

from bs4 import BeautifulSoup
from pathlib import Path
import pandas as pd

HTML_FILE = "sample_products.html"

def parse_products(html: str):
  soup = BeautifulSoup(html, "html.parser")
  items = []

  for li in soup.select("ul#products li.product"):
    name_el = li.select_one(".name")
    price_el = li.select_one(".price")
    currency_el = li.select_one(".currency")

    items.append(
      {
        "sku": li.get("data-sku"),
        "name": name_el.get_text(strip=True) if name_el else "",
        "price": float(price_el.get_text(strip=True)) if price_el else None,
        "currency": currency_el.get_text(strip=True) if currency_el else "",
      }
    )

  return items


def main() -> None:
  html = Path(HTML_FILE).read_text(encoding="utf-8")
  products = parse_products(html)

  df = pd.DataFrame(products)
  df.to_csv("products.csv", index=False, encoding="utf-8")
  print("Exported products.csv with", len(df), "rows")


if __name__ == "__main__":
  main()

After running this script, you should see a new products.csv file in your project directory containing the parsed data.
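
If you would rather not add pandas as a dependency, the standard library’s csv module can write the same file. This is a sketch assuming the list-of-dicts shape that parse_products() returns:

```python
import csv

# Example rows in the same shape parse_products() produces.
products = [
    {"sku": "A100", "name": "Data Center Proxy Plan", "price": 49.0, "currency": "USD"},
    {"sku": "A200", "name": "Residential Proxy Plan", "price": 99.0, "currency": "USD"},
]

# newline="" prevents blank lines between rows on Windows.
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["sku", "name", "price", "currency"])
    writer.writeheader()
    writer.writerows(products)
```

pandas is still the better choice once you need filtering, joins, or other analysis before export.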

7. CSS Selectors and Advanced Queries

BeautifulSoup’s select() and select_one() methods support a useful subset of CSS selectors. Here are a few patterns you’ll use frequently:

Pattern | Selector | Description
------- | -------- | -----------
By ID | soup.select_one("#products") | Element with id="products"
By class | soup.select(".product.featured") | All elements with both classes "product" and "featured"
Tag + class | soup.select("li.product .price") | All elements with class "price" inside <li class="product">
Attribute | soup.select('li[data-sku="A200"]') | Product with SKU A200
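
Each of these patterns can be tried against a trimmed-down, inline copy of the sample product list:

```python
from bs4 import BeautifulSoup

html = """
<ul id="products">
  <li class="product" data-sku="A100"><span class="price">49.00</span></li>
  <li class="product featured" data-sku="A300"><span class="price">199.00</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.select_one("#products").name)                         # ul
print(len(soup.select(".product.featured")))                     # 1
print([p.get_text() for p in soup.select("li.product .price")])  # ['49.00', '199.00']
print(soup.select_one('li[data-sku="A300"]')["class"])           # ['product', 'featured']
```

The last line shows that the class attribute comes back as a list of individual class names, which is why the "featured" check in parse_products() uses membership rather than string equality.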

8. Dynamic Pages and Managed Scraping

BeautifulSoup is perfect for parsing HTML you already have, but it doesn’t execute JavaScript. If your target pages are heavily dynamic (client-side rendering, infinite scroll, complex anti-bot protections), you’ll need an additional layer to render or fetch HTML reliably.

Many teams choose a hybrid approach: use a managed scraping platform to handle JavaScript rendering, IP rotation, and anti-bot logic, then feed the resulting HTML into BeautifulSoup. This separation lets your Python code stay small and focused on parsing and business logic, while the infrastructure concerns are handled elsewhere.
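
One way to structure this hybrid approach is to keep all fetching behind a single function, so swapping requests for a headless browser or a managed API later only touches one place. Here fetch_html is a hypothetical placeholder returning canned HTML, not a real API call:

```python
from bs4 import BeautifulSoup

def fetch_html(url: str) -> str:
    """Placeholder fetch layer: swap in requests, a headless browser,
    or a managed scraping API without touching the parsing code."""
    return '<ul id="products"><li class="product">Demo</li></ul>'

def parse_names(html: str) -> list[str]:
    """Pure parsing logic: takes HTML in, returns structured data out."""
    soup = BeautifulSoup(html, "html.parser")
    return [li.get_text(strip=True) for li in soup.select("ul#products li.product")]

print(parse_names(fetch_html("https://example.com/products")))  # ['Demo']
```

Because parse_names() only depends on an HTML string, it is also trivial to unit test with fixture files, independent of any network access.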

For example, Thordata provides scraping APIs and tools designed to return clean, structured results from complex targets. You can manage your API tokens, monitor usage, and configure scraping jobs in the Thordata Dashboard, while keeping your parsing logic in Python with BeautifulSoup. To see how Thordata’s Python SDK works in practice, check out the open source repository here: Thordata Python SDK.

9. Common BeautifulSoup Mistakes (and How to Avoid Them)

  • Not checking for missing elements: Directly calling .get_text() on None will raise an exception. Always guard with if el or use helper functions.
  • Using the wrong parser: If you see odd parsing behavior, try switching from "html.parser" to "lxml" (after installing lxml).
  • Ignoring encoding: When reading local files or HTTP responses, make sure to use the correct encoding (often UTF-8) to avoid garbled characters.
  • Scraping dynamic content directly: If elements are rendered by JavaScript, the raw HTML fetched with requests may not include them. Use a headless browser or a managed scraper to get the final HTML.
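
The first mistake is worth a small helper. This sketch wraps the None check once so every call site stays short:

```python
from bs4 import BeautifulSoup

def safe_text(parent, selector: str, default: str = "") -> str:
    """Return the stripped text of the first match for the selector,
    or a default value if nothing matches (instead of raising on None)."""
    el = parent.select_one(selector)
    return el.get_text(strip=True) if el else default

soup = BeautifulSoup('<li><span class="name">Widget</span></li>', "html.parser")
print(safe_text(soup, ".name"))                    # Widget
print(safe_text(soup, ".missing", default="n/a"))  # n/a
```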

10. Quick Reference

Goal | BeautifulSoup Pattern
---- | ---------------------
Create soup | soup = BeautifulSoup(html, "html.parser")
Find first tag | soup.find("h1")
Find all tags | soup.find_all("li")
Find by id | soup.find("ul", id="products")
Find by class | soup.find_all("li", class_="product")
CSS selector | soup.select("ul#products li.product .price")
Get text | element.get_text(strip=True)
Get attribute | element["data-sku"] or element.get("href")
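
One subtlety in the last row: element["attr"] raises KeyError when the attribute is missing, while element.get("attr") returns None, so get() is the safer choice for optional attributes:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a href="/docs">Docs</a>', "html.parser")
link = soup.find("a")

print(link["href"])       # /docs
print(link.get("title"))  # None (missing attribute, no exception)

try:
    link["title"]         # missing attribute: raises KeyError
except KeyError:
    print("KeyError for missing attribute")
```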


Frequently asked questions

Is BeautifulSoup easy to learn?

Yes. BeautifulSoup has a relatively low learning curve. If you understand basic Python and HTML, you can start extracting data quickly using methods like find, find_all, and select. The official documentation also includes many examples to help you progress.

Is BeautifulSoup enough for production web scraping?

BeautifulSoup is excellent for parsing HTML and XML, but it doesn’t handle JavaScript rendering, IP rotation, or large-scale crawling. For serious production workloads, you typically combine BeautifulSoup with robust HTTP clients, headless browsers, or managed scraping platforms that provide anti-bot handling and infrastructure.

Should I use BeautifulSoup or Scrapy?

Use BeautifulSoup for smaller tasks and focused HTML parsing when you already have the page content. Scrapy is a full web scraping framework that adds built-in crawling, concurrency, and pipeline features. Many teams start with BeautifulSoup and later adopt Scrapy or managed scraping solutions as their projects grow.

Can I use BeautifulSoup with other data tools?

Absolutely. BeautifulSoup works well with libraries like pandas and SQLAlchemy, and with cloud storage or data warehouses. It’s common to parse HTML with BeautifulSoup, turn the results into a pandas DataFrame, and then export to CSV, Parquet, or a database for downstream analysis.

About the author

Kael is a Senior Technical Copywriter at Thordata. He works closely with data engineers to document best practices for web scraping, HTML parsing, and API integrations. His focus is on creating hands-on tutorials that can be copied, run, and adapted to real-world projects.

The thordata Blog offers all its content in its original form and solely for informational purposes. We make no guarantees regarding the information found on the thordata Blog or any external sites it may direct you to. Always seek legal counsel and thoroughly review the specific terms of service of any website before engaging in any scraping endeavors, or obtain a scraping permit if required.