🚀 New: ChatGPT Scraper API — Run prompts & get structured JSON responses

Add Your Heading Text Here

How to Scrape Walmart Product Data with Python (2026)

Scrape Walmart

Table of Contents

TL;DR

  • Python guide to collect Walmart product name, price, rating, description; bypass CAPTCHAs with browser-like headers and detect “Robot or human.”
  • Parse with BeautifulSoup; pull description from Next.js __NEXT_DATA__ JSON.
  • Full script included; fine for small runs—expect blocks at scale.
  • For production, use Scrapingdog’s Walmart scraper (proxies / headers / retries / headless handled); 1,000 free credits and a dashboard tester.

Walmart is the largest retailer in the US with over 400 million product listings, making it one of the richest sources of public e-commerce data on the internet.

Businesses scrape Walmart to monitor competitor pricing in real time, track inventory changes, analyze customer reviews at scale, and build price comparison tools. Whether you’re doing market research or building a data pipeline, Python gives you the fastest path to structured Walmart data.

In this guide, you’ll learn how to scrape Walmart product pages using Python and later with Scrapingdog’s Walmart Scraper API.

Why Scrape Walmart? Key Use Cases

Before we write a single line of code, it’s worth understanding what you can actually do with Walmart data:

  • Price monitoring — Track price changes on specific products over time and build alerts when prices drop
  • Competitor analysis — If you sell on Walmart Marketplace, scraping competitor listings helps you adjust pricing and positioning
  • Market research — Analyze which products are trending, which have the most reviews, and how ratings shift
  • Price comparison tools — Build apps that compare Walmart prices against Amazon or eBay in real time
  • Inventory tracking — Monitor stock availability for high-demand products

How to Scrape Walmart Product Data with Python

To begin with, we will create a folder and install all the libraries we might need during the course of this tutorial.

For now, we will install two libraries

  1. Requests will help us to make an HTTP connection with Walmart.
  2. BeautifulSoup will help us to create an HTML tree for smooth data extraction.
				
					>> mkdir walmart
>> pip install requests beautifulsoup4 pandas
				
			

Inside this folder, you can create a python file where we will write our code. We will scrape this Walmart page. Our data of interest will be:

  1. Name
  2. Price
  3. Rating
  4. Product Details

First of all, we will find the locations of these elements in the HTML code by inspecting them.

As you can see, the product name is stored in an h1 tag with itemprop="name". Now, let’s inspect where Walmart stores the price.

The price is stored inside a span tag with itemprop="price". Now let’s find where Walmart stores the rating.

The rating is inside a span tag with the class rating-number. Finally, let’s locate the product description.

The product description sits inside a div tag with the class dangerous-html.

Now that we’ve mapped out all the elements we need, let’s start by making a basic GET request to the product page and see how Walmart responds.

We’ll start with just the requests library, no headers, no extras, just to see what Walmart returns to a raw, unauthenticated request.

				
					import requests
from bs4 import BeautifulSoup

target_url = "https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173"

response = requests.get(target_url)
print(response.text)
				
			

As expected, Walmart returns a CAPTCHA instead of product data. This happens because Walmart’s servers inspect incoming requests and flag anything that doesn’t look like a real browser.

The fix is straightforward. We need to send HTTP headers that mimic a genuine browser request. The most important ones are:

  • User-Agent — identifies the browser and OS making the request
  • Accept — tells the server what content types the client can handle
  • Accept-Language — specifies the preferred language
  • Accept-Encoding — indicates supported compression formats
  • Referer — simulates arriving from another page (e.g. Google)
  • Host — the target domain
  • Connection — keeps the socket open between requests

Together, these headers make your request look indistinguishable from one coming out of a real browser.

				
					import requests
from bs4 import BeautifulSoup

target_url = "https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173"

accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"

headers = {
    "Accept": accept,
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "Referer": "https://www.google.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}

response = requests.get(target_url, headers=headers)
print(response.text)
				
			

Walmart returns a 200 OK status code even when it serves a CAPTCHA page. You can’t rely on the status code alone to know if your request succeeded. Instead, check the response body for a telltale string from the CAPTCHA page.

				
					if "Robot or human" in response.text:
    print("Blocked — CAPTCHA detected")
else:
    print("Success — page loaded correctly")
				
			

If the check prints ‘Blocked’, you’ll need to rotate your headers or try a different User-Agent. If it prints ‘Success’, you’re clear to proceed with extraction.

Now let’s parse the response with BeautifulSoup and extract each data point we mapped earlier.

				
					soup = BeautifulSoup(response.text, "html.parser")

results = []
product = {}

try:
    product["price"] = soup.find("span", {"itemprop": "price"}).text.replace("Now ", "").strip()
except AttributeError:
    product["price"] = None
				
			

We strip out the 'Now ' prefix that Walmart sometimes prepends to prices, leaving just the numeric value. The try/except block handles cases where the element isn’t found on the page. For example, if a product is out of stock and the price isn’t rendered.

Next, let’s extract the product name and rating the same way.

				
					try:
    product["name"] = soup.find("h1", {"itemprop": "name"}).text.strip()
except AttributeError:
    product["name"] = None

try:
    product["rating"] = soup.find("span", {"class": "rating-number"}).text.strip("()")
except AttributeError:
    product["rating"] = None

results.append(product)
print(results)
				
			

Notice that we have everything except the product description. If you search through the raw HTML returned by your script, you won’t find the dangerous-html class anywhere.

This is because Walmart is built on Next.js, which embeds its data as a JSON blob inside a <script> tag with id="__NEXT_DATA__" rather than rendering it directly in the HTML. Since this data is injected client-side after the page loads, a basic HTTP request captures it before JavaScript has a chance to run, but the JSON blob itself is still present in the raw HTML, which means we can extract it without a headless browser.

Every Next.js site follows this same pattern, so this technique works broadly beyond just Walmart.

The __NEXT_DATA__ script tag contains all the product data we need in a single structured JSON object — including fields we haven’t even explicitly scraped yet like brand, availability, and images.

				
					import json

next_data = soup.find("script", {"id": "__NEXT_DATA__"})
json_data = json.loads(next_data.string)

print(json_data)
				
			

This is a huge JSON data which might be a little intimidating. You can use tools like JSON viewer to figure out the exact location of your desired object.

				
					try:
    obj["detail"] = jsonData['props']['pageProps']['initialData']['data']['product']['shortDescription']
except:
    obj["detail"]=None
				
			

At this point we have all four data points — name, price, rating, and description, all successfully extracted.

One thing worth noting: the __NEXT_DATA__ JSON blob actually contains all the product data we scraped earlier via BeautifulSoup as well. If you explore the JSON structure using a tool like JSONViewer, you’ll find price, name, and rating all nested inside it, meaning you could skip the BeautifulSoup selectors entirely and pull everything from the JSON in one shot.

For a deeper dive into headers, sessions, and other Python scraping fundamentals, check out our Web Scraping with Python guide.

Complete Code

				
					import json
import requests
from bs4 import BeautifulSoup

TARGET_URL = "https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173"

accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"

headers = {
    "Accept": accept,
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "Referer": "https://www.google.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}

response = requests.get(TARGET_URL, headers=headers)

if "Robot or human" in response.text:
    raise SystemExit("Blocked by Walmart — try rotating your User-Agent or headers.")

soup = BeautifulSoup(response.text, "html.parser")

product = {}

try:
    product["name"] = soup.find("h1", {"itemprop": "name"}).text.strip()
except AttributeError:
    product["name"] = None

try:
    product["price"] = soup.find("span", {"itemprop": "price"}).text.replace("Now ", "").strip()
except AttributeError:
    product["price"] = None

try:
    product["rating"] = soup.find("span", {"class": "rating-number"}).text.strip("()")
except AttributeError:
    product["rating"] = None

try:
    next_data = soup.find("script", {"id": "__NEXT_DATA__"})
    json_data = json.loads(next_data.string)
    product["description"] = json_data["props"]["pageProps"]["initialData"]["data"]["product"]["shortDescription"]
except (AttributeError, KeyError):
    product["description"] = None

print(product)
				
			

Limitation

The approach above works well for small-scale scraping. A few dozen or hundred pages. But once you start hitting Walmart at any meaningful volume, you’ll run into IP bans, rate limiting, and increasingly aggressive CAPTCHA challenges that headers alone won’t solve.

At scale, you need proxy rotation, automatic header cycling, and retry logic built into your pipeline. Building and maintaining that infrastructure yourself is time-consuming and brittle. Walmart actively updates its bot detection, so what works today may break tomorrow.

That’s where Scrapingdog’s Walmart Scraper API comes in. It handles proxy rotation, header management, retries, and JavaScript rendering automatically, so you can focus on the data rather than the infrastructure.

Scraping Walmart with Scrapingdog

Scrapingdog’s Walmart Scraper API returns clean, structured JSON from any Walmart product page. No proxy management, no header tuning, no CAPTCHA handling on your end.

Once you sign up, you can test any Walmart URL directly from the dashboard before writing a single line of code.

The dashboard also generates ready-to-use Python code for any request, which you can copy directly into your project.

				
					import requests

api_key = "Your-API-key"
url = "https://api.scrapingdog.com/walmart/product"
params = {
    "api_key": api_key,
    "url": "http://www.walmart.com/ip/46480251"
}
response = requests.get(url, params=params)
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")
				
			

Remember to use your own API key in the above Python code. Once you run this code you will get beautiful parsed JSON data with all the valuable data from the Walmart product page.

I have also created a video tutorial on scraping Walmart at scale with Scrapingdog. Do check it out.

Here are Some Key Takeaways:

  • Walmart product pages contain valuable data like price, ratings, availability, and reviews.
  • Direct scraping often triggers anti-bot protections like CAPTCHA and IP blocks.
  • Python (requests + BeautifulSoup) can extract basic product details.
  • Walmart pages include dynamic and embedded JSON data that requires careful parsing.
  • Using a scraper API simplifies scaling, proxy handling, and structured data extraction.

Conclusion

As we know Python is too great when it comes to web scraping. We just used two basic libraries to scrape Walmart product details and print results in JSON. But this process has certain limits and as we discussed above, Walmart will block you if you do not rotate proxies or change headers timely.

If you want to scrape thousands and millions of pages then using Scrapingdog will be the best approach.

I hope you like this little tutorial and if you do then please do not forget to share it with your friends and on social media.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked
My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scraper and seamless data pipelines.
Manthan Koolwal
Manthan Koolwal

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

Recent Blogs

Scrape Google Maps using Python

How to Scrape Google Maps Data using Python (2026 Guide)

Google Maps are mostly scraped to extract potential lead's credentials. In this blog, we have used Python to scrape phone numbers and other details from Google Maps Listing.
Scraping Google Images using Python

How to Scrape Google Images with Python (2026 Guide)

We have scraped google images using Python in this blog.