Add Your Heading Text Here

Web Scraping Walmart Data using Python for Product Information (Name, Price & Rating)

Scrape Walmart

Table of Contents

Scraping Walmart can have many use cases. It is the leading retailer in the USA & has enormous public product data available.

When web scraping prices for Walmart products you can have a close look at any price change.

If the price is lower then you can prefer to buy it. You create a Walmart price scraper that can show trends in price change over a year or so.

In this article, we will learn how we can web scrape Walmart data & create a Walmart price scraper, we will be using Python for this tutorial.

But, Why Scrape Walmart with Python?

Python is a widely used & simple language with built-in mathematical functions. It is also flexible and easy to understand even if you are a beginner. The Python community is too big and it helps when you face any error while coding.

Many forums like StackOverflowGitHub, etc already have the answers to the errors that you might face while coding when you do Walmart scraping.

You can do countless things with Python but for the sake of this article we will be extracting product details from Walmart.

Let’s Begin Web Scraping Walmart with Python

To begin with, we will create a folder and install all the libraries we might need during the course of this tutorial.

For now, we will install two libraries

  1. Requests will help us to make an HTTP connection with Walmart.
  2. BeautifulSoup will help us to create an HTML tree for smooth data extraction.
				
					>> mkdir walmart
>> pip install requests
>> pip install beautifulsoup4
				
			

Inside this folder, you can create a python file where we will write our code. We will scrape this Walmart page. Our data of interest will be:

  1. Name
  2. Price
  3. Rating
  4. Product Details

First of all, we will find the locations of these elements in the HTML code by inspecting them.

We can see the name is stored under tag h1 with attribute itemprop. Now, let’s see where the price is stored.

Price is stored under span tag with attribute itemprop whose value is price.

The rating is stored under span tag with class rating-number.

Product detail is stored inside div tag with class dangerous-html.

Let’s start with making a normal GET request to the target webpage and see what happens.

				
					import requests
from bs4 import BeautifulSoup

target_url="https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173"

resp = requests.get(target_url).text

print(resp)
				
			

Oops!! We got a captcha.

Walmart loves throwing captchas when they think the request is coming from a script/crawler and not from a browser. To remove this barrier from our way we can simply send some metadata/headers which will make Walmart consider our request as a legit request from a legit browser.

Now, if you are new to web scraping then I would advise you to read more about headers and their importance in Python. For now, we will use seven different headers to bypass Walmart’s security wall.

  1. Accept
  2. Accept-Encoding
  3. Accept-Language
  4. User-Agent
  5. Referer
  6. Host
  7. Connection
				
					import requests
from bs4 import BeautifulSoup

ac="text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
target_url="https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173"
headers={"Referer":"https://www.google.com","Connection":"Keep-Alive","Accept-Language":"en-US,en;q=0.9","Accept-Encoding":"gzip, deflate, br","Accept":ac,"User-Agent":"Mozilla/5.0 (iPad; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13G36 Safari/601.1"}

resp = requests.get(target_url, headers=headers)
print(resp.text)
				
			

One thing you might have noticed is Walmart sends a 200 status code even when it returns a captcha. So, to tackle this problem you can use the if/else statement.

				
					if("Robot or human" in resp.text):
    print(True)
else:
    print(False)
				
			

If it is True then Walmart has thrown a captcha otherwise our request was successful. Nest step is to extract our data of interest.

Here we will use BS4 and will scrape every value step by step. We have already determined the exact location of each of these data elements.

				
					soup = BeautifulSoup(resp.text,'html.parser')

l=[]
obj={}

try:
    obj["price"] = soup.find("span",{"itemprop":"price"}).text.replace("Now ","")
except:
    obj["price"]=None
				
			

Here we have extracted the price and then replace “Now ” (garbage string) with an empty string. You can use try/except statements to catch any errors.

Now similarly, let’s get the name and the rating of the product. We will come to the product description later.

				
					try:
    obj["name"] = soup.find("h1",{"itemprop":"name"}).text
except:
    obj["name"]=None

try:
    obj["rating"] = soup.find("span",{"class":"rating-number"}).text.replace("(","").replace(")","")
except:
    obj["rating"]=None

l.append(obj)

print(l)
				
			

We got all the data except “product detail”. Once you scroll down to your HTML data returned by your python script you will nowhere find this dangerous-html class. 

The reason behind this is the framework used by the Walmart website. It uses the Nextjs framework which sends JSON data once the page has been rendered completely. So, when the socket connection was broken product description part of the website was not loaded.

But the solution to this problem is very easy and it can be scraped in just two steps. Every Nextjs-backed website has a script tag with id as __NEXT_DATA__.

This script will return all the JSON data that we need. Since this is done through Javascript we could not have scraped it with a simple HTTP GET request. So, first of all, you have to find it using BS4 and then load it using the JSON library.

				
					import json

nextTag = soup.find("script",{"id":"__NEXT_DATA__"})
jsonData = json.loads(nextTag.text)

print(jsonData)
				
			

This is a huge JSON data which might be a little intimidating. You can use tools like JSON viewer to figure out the exact location of your desired object.

				
					try:
    obj["detail"] = jsonData['props']['pageProps']['initialData']['data']['product']['shortDescription']
except:
    obj["detail"]=None
				
			

We got all the data we were hunting for. By the way, this huge JSON data also contains the data we scraped earlier. You just have to figure out the exact object where it is stored. I leave that part to you.

If you want to learn more about headers, requests, and other libraries of Python then I would advise you to read this web scraping with Python tutorial.

Complete Code

				
					import requests
from bs4 import BeautifulSoup

ac="text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"

target_url="https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173"

headers={"Referer":"https://www.google.com","Connection":"Keep-Alive","Accept-Language":"en-US,en;q=0.9","Accept-Encoding":"gzip, deflate, br","Accept":ac,"User-Agent":"Mozilla/5.0 (iPad; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13G36 Safari/601.1"}

resp = requests.get(target_url)
# print(resp.text)
# if("Robot or human" in resp.text):
#     print(True)
# else:
#     print(False)

soup = BeautifulSoup(resp.text,'html.parser')
l=[]
obj={}

try:
    obj["price"] = soup.find("span",{"itemprop":"price"}).text.replace("Now ","")
except:
    obj["price"]=None

try:
    obj["name"] = soup.find("h1",{"itemprop":"name"}).text
except:
    obj["name"]=None

try:
    obj["rating"] = soup.find("span",{"class":"rating-number"}).text.replace("(","").replace(")","")
except:
    obj["rating"]=None

import json

nextTag = soup.find("script",{"id":"__NEXT_DATA__"})
jsonData = json.loads(nextTag.text)

Detail = jsonData['props']['pageProps']['initialData']['data']['product']['shortDescription']

try:
    obj["detail"] = Detail
except:
    obj["detail"]=None

l.append(obj)
print(l)
				
			

Limitation

The above approach is fine until you want to scrape a few hundred pages. However, this approach will not work if you’re going to scrape millions of pages because Walmart will block your IP, ultimately blocking your data pipeline.

To avoid this situation you must use web scraping APIs like Scrapingdog. Scrapingdog will handle proxy rotation, headers, retries, and headless browsers for you so that you can focus on data collection.

Scraping Walmart with Scrapingdog

Scrapingdog provides Walmart Scraper API that can help you create a seamless data pipeline in no time. The first 1000 credits are absolutely FREE and you can sign up for the free trial from here.

You can test the API directly from the dashboard.

You can even copy the Python code from the right.

				
					import requests
  
api_key = "Your-API-key"
url = "https://api.scrapingdog.com/walmart/product"
  
params = {
    "api_key": api_key,
    "url": "http://www.walmart.com/ip/46480251"
}
  
response = requests.get(url, params=params)
  
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")
				
			

Remember to use your own API key in the above Python code. Once you run this code you will get beautiful parsed JSON data with all the valuable data from the Walmart product page.

 

 

I have also created a video tutorial on scraping Walmart at scale with Scrapingdog. Do check it out

Conclusion

As we know Python is too great when it comes to web scraping. We just used two basic libraries to scrape Walmart product details and print results in JSON. But this process has certain limits and as we discussed above, Walmart will block you if you do not rotate proxies or change headers timely.

If you want to scrape thousands and millions of pages then using Scrapingdog will be the best approach.

I hope you like this little tutorial and if you do then please do not forget to share it with your friends and on social media.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked
My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scraper and seamless data pipelines.
Manthan Koolwal

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

Recent Blogs

web scraping for lead generation

Web Scraping For Lead Generation in 2025

Web scraping with many use cases have one in generating leads. We have explained here, how you can use it to generate leads.
best rank tracking apis

6 Best Rank Tracking APIs for Accurate Keyword Tracking

If you want to build a rank tracking mechanism, you will surely need a rank tracking API. In this read, we have taken the best APIs you can use to build a rank tracking system.