
How To Parse HTML with PHP



Parsing is the step that comes right after scraping: once you have a page’s raw HTML, you still need to pull out the data you care about. Whether you’re building a web crawler, scraping data, or just extracting elements from a page, PHP offers several good tools for parsing HTML.

In this guide, we’ll walk through parsing HTML with PHP, from installing the tools to extracting table data from a real page and saving it to a CSV file.

Parsing Methods in PHP

Before we go deeper, let’s outline the primary ways HTML can be parsed using PHP:

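  1. DOMDocument (built-in)
  2. Simple HTML DOM Parser and similar third-party libraries, such as paquettg/php-html-parser (the one used in this guide)
  3. Goutte (a Symfony-based web scraper)
  4. cURL + regular expressions (not recommended, but still used)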
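DOMDocument ships with PHP’s DOM extension. The sketch below illustrates method 1 so you can compare it with the library used later in this guide. It assumes PHP 8 with the DOM extension enabled. The sample table and the extractRows() helper are hypothetical and exist only for illustration. DOMXPath selects every <tr> in a table whose class list includes table, and the loop reads the text of each row’s <td> cells.

```php
<?php
// Hypothetical helper: return the trimmed text of each row's <td> cells
// for every <table> whose class list includes "table".
function extractRows(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid: keep libxml warnings out of the output
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//table[contains(concat(" ", normalize-space(@class), " "), " table ")]//tr';

    $rows = [];
    foreach ($xpath->query($query) as $tr) {
        $cells = [];
        // Only the direct <td> children of this row
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Header rows use <th> instead of <td>, so they produce no cells
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }

    return $rows;
}

$html = <<<HTML
<table class="table">
  <tr><th>Team</th><th>Year</th></tr>
  <tr><td> Team A </td><td>2001</td></tr>
  <tr><td>Team B</td><td>2002</td></tr>
</table>
HTML;

print_r(extractRows($html));
```

To parse a saved page instead of the inline sample, you could pass file_get_contents('raw.html') to the same function.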

Installing PHP and Required Libraries

Before you begin scraping or parsing, ensure you have PHP installed on your system.

Install PHP

If you’re on macOS:

				
					brew install php
				
			

For Ubuntu/Debian:

				
					sudo apt update
sudo apt install php php-cli php-curl php-mbstring php-xml
				
			

For Windows:

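  • Download a Windows build of PHP from php.net (the Windows downloads are hosted on windows.php.net).
  • Extract the archive and add the folder containing php.exe to your system’s PATH environment variable.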
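After installing PHP, you can confirm which version you have and check whether the extensions mentioned in this guide are loaded. This is a minimal sketch. You can save it under any name, such as the hypothetical check.php, and run it with the php command. The names dom, curl and mbstring are PHP’s extension names for DOMDocument support, cURL and the multibyte string functions. On Ubuntu/Debian, dom is provided by the php-xml package.

```php
<?php
// Report the PHP version and whether the extensions mentioned in this guide are loaded
$required = ['dom', 'curl', 'mbstring'];

echo 'PHP version: ' . PHP_VERSION . "\n";

$missing = [];
foreach ($required as $extension) {
    $loaded = extension_loaded($extension);
    echo str_pad($extension, 10) . ($loaded ? 'loaded' : 'MISSING') . "\n";
    if (!$loaded) {
        $missing[] = $extension;
    }
}

if ($missing !== []) {
    echo 'Missing extensions: ' . implode(', ', $missing) . "\n";
}
```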

 

Install Composer (PHP Dependency Manager)

				
					php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
php composer-setup.php
php -r "unlink('composer-setup.php');"
				
			

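Then, move composer.phar into a directory on your PATH so you can run it as composer (macOS/Linux; if this fails with a permissions error, run it again with sudo):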

				
					mv composer.phar /usr/local/bin/composer
				
			

Install paquettg/php-html-parser

This is the parsing library used in this guide. It lets you select elements with CSS selectors through its find() method, much like jQuery.

								
				
					composer require paquettg/php-html-parser
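To confirm the installation worked, the sketch below checks that Composer’s autoloader exists in your project and can load the library’s PHPHtmlParser\Dom class, which is the same class used in the parsing step later. This is a hypothetical helper script, not part of the library. It assumes PHP 8 and that you save it in your project folder, next to the vendor/ directory.

```php
<?php
// Hypothetical sanity check: save in your project folder (next to vendor/) and run with php.
$autoload = __DIR__ . '/vendor/autoload.php';

if (!is_file($autoload)) {
    echo "vendor/autoload.php not found. Run composer require paquettg/php-html-parser in this folder first.\n";
} else {
    require $autoload;

    // class_exists() triggers Composer's autoloader for the class name
    if (class_exists(\PHPHtmlParser\Dom::class)) {
        echo "PHPHtmlParser\\Dom is available.\n";
    } else {
        echo "Autoloader found, but PHPHtmlParser\\Dom could not be loaded.\n";
    }
}
```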
				
			


Scraping with PHP

For this tutorial, we’ll scrape and parse the forms page on scrapethissite.com, whose URL appears in the code below. Create a PHP file with any name you like; I’m naming mine scraper.php.


				
					<?php
$url = 'https://www.scrapethissite.com/pages/forms/?per_page=100';
$html = file_get_contents($url);

if ($html === false) {
    die("Failed to fetch page");
}

file_put_contents('raw.html', $html);
echo "HTML content saved successfully.\n";
?>
				
			

The code is simple, but let me explain it step by step.

  • We define a variable, $url, that holds the URL of the page we want to scrape.
  • PHP’s built-in file_get_contents() function sends a GET request to that URL and returns the response body, which we store in $html. It’s a simple way to fetch raw HTML from a web page.
  • If the request fails, file_get_contents() returns false. The if check catches this and stops the script with the message “Failed to fetch page”.
  • file_put_contents() writes the fetched HTML to a file called raw.html. Because the path is relative, the file is created in the current working directory (the script’s folder if you run it from there) and is overwritten if it already exists.
  • Finally, a success message confirms that the file has been saved.
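The plain file_get_contents() call relies on PHP’s defaults. The request timeout falls back to the default_socket_timeout setting, which is 60 seconds unless you change it. PHP may also send no User-Agent header at all unless the user_agent setting is configured, and some sites reject requests like that. The sketch below shows one way to set both through a stream context. It assumes PHP 8. fetchHtml() is a hypothetical helper, and the User-Agent string is only a placeholder.

```php
<?php
// Hypothetical helper: GET a URL with an explicit User-Agent and timeout.
// Returns the body, or false on failure (network errors and HTTP 4xx/5xx responses).
function fetchHtml(string $url, float $timeoutSeconds = 15.0): string|false
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            // Placeholder User-Agent; replace it with one that identifies your scraper
            'header'  => "User-Agent: Mozilla/5.0 (compatible; ExampleScraper/1.0)\r\n",
            'timeout' => $timeoutSeconds,
        ],
    ]);

    // @ hides PHP's warning on failure; the message is logged below instead
    $html = @file_get_contents($url, false, $context);

    if ($html === false) {
        $error = error_get_last();
        // In the CLI, error_log() writes to stderr when no log file is configured
        error_log('Request failed: ' . ($error['message'] ?? 'unknown error'));
    }

    return $html;
}
```

In scraper.php, you could then write $html = fetchHtml($url); instead of calling file_get_contents($url) directly. The existing if ($html === false) check still applies.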

Now, let’s parse it.

Parsing with PHP

Now, let’s parse the raw HTML and extract the team name, year, wins, and losses.

				
					<?php
require 'vendor/autoload.php';

use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->loadFromFile('raw.html');

$data = [];
$rows = $dom->find('.table tbody tr');

foreach ($rows as $row) {
    // All <td> cells in this row, as a collection
    $cells = $row->find('td');

    // Skip rows without at least four <td> cells (e.g. a header row of <th> cells),
    // which would otherwise cause an error when reading ->text from a missing cell
    if (count($cells) < 4) {
        continue;
    }

    $data[] = [
        'Team'   => trim($cells[0]->text),
        'Year'   => trim($cells[1]->text),
        'Wins'   => trim($cells[2]->text),
        'Losses' => trim($cells[3]->text),
    ];
}

print_r($data);
?>
				
			
  • require 'vendor/autoload.php' loads Composer’s autoloader, which makes every Composer-installed library available. This assumes you’ve installed paquettg/php-html-parser via Composer.
  • use PHPHtmlParser\Dom; imports the library’s Dom class, which we use to parse and query the HTML.
  • new Dom creates a parser instance, and loadFromFile() loads raw.html (the local copy saved in the previous step).
  • $data starts as an empty array that will hold the parsed results.
  • The CSS selector .table tbody tr finds every <tr> (table row) inside the <tbody> of the table with the class table.
  • The foreach loop goes through each row, reads the text of its first four <td> cells (team name, year, wins, and losses), and adds them to $data as a new entry.
  • trim() removes leading and trailing whitespace from the extracted text.
  • Finally, print_r() outputs the structured array in a human-readable format.

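Once you run this code, you should get the rows as a structured array like the one below. The exact teams and numbers depend on what the page contains when you scrape it.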

				
					Array
(
    [0] => Array
        (
            [Team] => Boston Celtics
            [Year] => 2013
            [Wins] => 41
            [Losses] => 40
        )

    [1] => Array
        (
            [Team] => Brooklyn Nets
            [Year] => 2013
            [Wins] => 49
            [Losses] => 33
        )

    [2] => Array
        (
            [Team] => New York Knicks
            [Year] => 2013
            [Wins] => 37
            [Losses] => 45
        )

    [3] => Array
        (
            [Team] => Philadelphia 76ers
            [Year] => 2013
            [Wins] => 19
            [Losses] => 63
        )

    [4] => Array
        (
            [Team] => Toronto Raptors
            [Year] => 2013
            [Wins] => 48
            [Losses] => 34
        )
)
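Every value extracted from HTML comes back as a string, including Year, Wins and Losses. If you plan to sort or calculate with those columns, you may want to convert them to integers first. This is a minimal sketch, assuming PHP 8. normalizeRows() is a hypothetical helper, and the sample rows are made up but have the same shape as $data.

```php
<?php
// Hypothetical helper: convert numeric columns of the parsed rows to integers.
// Values that are empty or not whole numbers become null rather than 0.
function normalizeRows(array $rows, array $intColumns = ['Year', 'Wins', 'Losses']): array
{
    foreach ($rows as $i => $row) {
        foreach ($intColumns as $column) {
            $rows[$i][$column] = filter_var(
                $row[$column] ?? '',
                FILTER_VALIDATE_INT,
                FILTER_NULL_ON_FAILURE
            );
        }
    }

    return $rows;
}

// Made-up sample rows in the same shape as $data
$sample = [
    ['Team' => 'Team A', 'Year' => '2001', 'Wins' => '41', 'Losses' => '40'],
    ['Team' => 'Team B', 'Year' => '2001', 'Wins' => 'n/a', 'Losses' => ''],
];

$normalized = normalizeRows($sample);
var_dump($normalized[0]['Wins'], $normalized[1]['Wins']);
```

A plain (int) cast would turn 'n/a' into 0. Using null for cells that cannot be parsed keeps them distinguishable from a real zero.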
				
			

Storing the Data in a CSV File

Let’s export this parsed data into a CSV file.

				
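					<?php
// Assumes $data was built by the parsing step above, in the same script
$csvFile = fopen('teams.csv', 'w');
if ($csvFile === false) {
    die("Could not open teams.csv for writing.");
}

// Headers. The escape character is passed explicitly ('' turns off PHP's
// non-standard escaping); PHP 8.4+ deprecates relying on its default.
fputcsv($csvFile, ['Team', 'Year', 'Wins', 'Losses'], ',', '"', '');

// Data: each associative array is written as one row, in key order
foreach ($data as $line) {
    fputcsv($csvFile, $line, ',', '"', '');
}

fclose($csvFile);
echo "Data written to teams.csv successfully.\n";
?>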
				
			
  • fopen('teams.csv', 'w') opens teams.csv in write mode, creating it if it doesn’t exist and truncating it if it does. $csvFile is the file handle used for writing.
  • The first fputcsv() call writes the column headers as the first row. fputcsv() formats an array as one comma-separated line and adds quotes around fields that need them.
  • Then we loop through $data, the array of associative arrays built in the parsing step, and write each one as a row. Only the values are written, in key order, so they line up with the headers.
  • Finally, fclose() closes the file once writing is complete.
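To check what fputcsv() produced, you can read the file back with fgetcsv() and rebuild each row from the header line with array_combine(). The sketch below round-trips made-up rows through a temporary file and assumes PHP 8. In your own script you would open teams.csv instead. Like the write step, it passes the escape character explicitly as an empty string.

```php
<?php
// Made-up rows; the second one contains quotes and a comma on purpose
$rows = [
    ['Team' => 'Team A', 'Year' => '2001', 'Wins' => '41', 'Losses' => '40'],
    ['Team' => 'Team "B", Inc.', 'Year' => '2001', 'Wins' => '30', 'Losses' => '51'],
];

$path = tempnam(sys_get_temp_dir(), 'teams_');

// Write a header row followed by the data rows
$out = fopen($path, 'w');
fputcsv($out, array_keys($rows[0]), ',', '"', '');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '');
}
fclose($out);

// Read it back: the first line is the header, each later line becomes an associative array
$in = fopen($path, 'r');
$header = fgetcsv($in, null, ',', '"', '');
$readBack = [];
while (($fields = fgetcsv($in, null, ',', '"', '')) !== false) {
    $readBack[] = array_combine($header, $fields);
}
fclose($in);
unlink($path);

print_r($readBack);
```

The second sample row shows that fputcsv() wraps a field containing quotes and commas in quotes, and fgetcsv() restores the original text.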

 

Complete Code

				
					<?php
// Load Composer's autoloader and import the parser class
require 'vendor/autoload.php';

use PHPHtmlParser\Dom;

// Step 1: Fetch HTML from target URL
$url = 'https://www.scrapethissite.com/pages/forms/?per_page=100';
$html = file_get_contents($url);

if ($html === false) {
    die("Failed to fetch the page.");
}

// Save the raw HTML locally; the parser below reads from this file
file_put_contents('raw.html', $html);
echo "HTML content saved to raw.html\n";

// Step 2: Load and parse the saved HTML using PHPHtmlParser
$dom = new Dom;
$dom->loadFromFile('raw.html');

// Step 3: Extract table rows
$data = [];
$rows = $dom->find('.table tbody tr');

foreach ($rows as $row) {
    $cells = $row->find('td');

    // Skip rows without at least four <td> cells (e.g. a header row of <th> cells)
    if (count($cells) < 4) {
        continue;
    }

    $data[] = [
        'Team'   => trim($cells[0]->text),
        'Year'   => trim($cells[1]->text),
        'Wins'   => trim($cells[2]->text),
        'Losses' => trim($cells[3]->text),
    ];
}

// Step 4: Write the data to a CSV file
$csvFile = fopen('teams.csv', 'w');
if ($csvFile === false) {
    die("Could not open teams.csv for writing.");
}

// Add header row (explicit '' escape avoids the PHP 8.4+ deprecation notice)
fputcsv($csvFile, ['Team', 'Year', 'Wins', 'Losses'], ',', '"', '');

// Write each data row
foreach ($data as $line) {
    fputcsv($csvFile, $line, ',', '"', '');
}

fclose($csvFile);
echo "Data written to teams.csv successfully.\n";
?>
				
			

Conclusion

You just learned how to scrape and parse data from a real-world HTML page using PHP. From installing dependencies to writing your first HTML parsing logic and exporting the results, this guide covers the end-to-end workflow.

Whether you’re scraping for personal projects or commercial tools, PHP offers powerful solutions when used with the right libraries.


My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.
