Understanding customer sentiment through product reviews has become a critical factor for success. Among the various platforms, Amazon stands out for its extensive range of products and the large number of reviews that accompany each listing.
Web scraping Amazon reviews can give valuable feedback, enabling detailed sentiment analysis of products within the same category. By extracting and analyzing these insights, businesses and developers can shape their offerings to better meet consumer needs and preferences.
In this blog, we will scrape the reviews of a single product. We will collect details such as the review text, the rating, the reviewer's user name, etc.
Limitations of scraping Amazon reviews simply with Python
If you request an Amazon reviews page with plain Python, you will land on the login screen instead of the reviews.
To overcome this, we have to use an Amazon reviews scraping API. Scrapingdog can help you scrape millions of such pages without getting blocked. It handles IP rotation and retries for you so that you can focus on data collection.
Let’s see how you can scrape reviews from Amazon using Scrapingdog with ease.
Requirements
Amazon recently made a change on its end, and as a result you can no longer scrape product reviews unless you are logged in. To bypass that login wall, we are going to use the Amazon Review Scraper API offered by Scrapingdog to scrape reviews of any product at scale. You can sign up for the trial, which provides 1000 free credits, enough for testing the service.
Other than that, you need Python 3.x. If it is not already installed on your computer, you can install it from here.
Now, create a folder with any name you like, and then install the requests library using pip.
mkdir amazon-reviews
pip install requests
Create a Python file with any name. I am naming the file reviews.py. We will write our script in this file.
Scraping Amazon Reviews
For this tutorial, we are going to scrape reviews of this product. You can find all the reviews of this product here. If you read the documentation of the Amazon Review Scraper API, you will find that you can either pass the complete reviews URL to the API or pass the ASIN of the product along with the Amazon domain.
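The two calling styles can be sketched as a small helper that builds the query parameters. The parameter names url, asin, and domain are the ones used in the code in this post; the function name build_review_params is hypothetical, and the rule that the two styles should not be mixed is an assumption made for this sketch, not something taken from the API documentation.

```python
def build_review_params(api_key, url=None, asin=None, domain=None):
    """Build query parameters for one of the two calling styles.

    Pass either `url` (the full reviews page URL) or `asin` together
    with `domain`. Mixing both styles is rejected here; that rule is an
    assumption of this sketch.
    """
    if url is not None and (asin is not None or domain is not None):
        raise ValueError("pass either url, or asin with domain, not both")
    if url is None and (asin is None or domain is None):
        raise ValueError("pass url, or both asin and domain")

    params = {"api_key": api_key}
    if url is not None:
        params["url"] = url
    else:
        params["asin"] = asin
        params["domain"] = domain
    return params
```

The returned dictionary can be passed as params to requests.get, for example build_review_params("your-api-key", asin="B0CTKXMQXK", domain="com").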
Let’s first take a look at the dashboard.
If you enter the product's ASIN, B0CTKXMQXK, and .com as the domain in the form, you will get this data.
You got a beautiful JSON response with details like title, user name, rating, etc. Now, let’s implement the same thing in the Python code with the review URL.
You can copy the Python code directly from the dashboard itself.
import requests

api_key = "Your-api-key"
url = "https://api.scrapingdog.com/amazon/reviews"

params = {
    "api_key": api_key,
    "url": "https://www.amazon.com/Echo-Charlie-Slim-Minimalist-Wallet/product-reviews/B0CTKXMQXK/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"
}

# Send the request to the Scrapingdog endpoint with our parameters
response = requests.get(url, params=params)

if response.status_code == 200:
    # Parse the JSON body and print the reviews
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")
Do not forget to add your own API key in the above code. This code makes a GET request to the /amazon/reviews endpoint of Scrapingdog and extracts JSON data from the response. If the status code of the response is 200, it prints the reviews; otherwise, it prints a custom error message. Let’s run the code.
I got the JSON response with the reviews on the first page. You can run a for loop in the above code that changes the page number on each request to extract all the reviews of any product with the help of the API.
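As a minimal sketch of such a loop, the code below walks through the pages and gathers the reviews into one list. Rather than editing the URL, it uses the asin, domain, and page parameters and the customer_reviews key that appear in the CSV example in this post. The function names, the max_pages limit, the 60-second timeout, and the rule "stop at the first empty page" are assumptions made for illustration.

```python
API_URL = "https://api.scrapingdog.com/amazon/reviews"


def fetch_page(api_key, asin, domain, page):
    """Request one page of reviews and return the parsed JSON."""
    # Imported here so collect_reviews can be exercised with a stub
    # fetcher on a machine where requests is not installed.
    import requests

    params = {"api_key": api_key, "asin": asin, "domain": domain, "page": str(page)}
    response = requests.get(API_URL, params=params, timeout=60)
    response.raise_for_status()  # raise on any non-2xx status
    return response.json()


def collect_reviews(api_key, asin, domain, max_pages=5, fetch=fetch_page):
    """Walk pages 1..max_pages and gather every review into one list."""
    all_reviews = []
    for page in range(1, max_pages + 1):
        data = fetch(api_key, asin, domain, page)
        reviews = data.get("customer_reviews", [])
        if not reviews:
            # Assumption: an empty page means there are no more reviews.
            break
        all_reviews.extend(reviews)
    return all_reviews
```

Calling collect_reviews("your-api-key", "B0CTKXMQXK", "com", max_pages=3) would make up to three requests. Keeping max_pages small while testing helps you stay within the trial credits.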
Here’s a video tutorial on how you can use Amazon Review API from Scrapingdog.
Storing data to CSV
To store the data in a CSV file, we will use the Pandas library. You can install this library inside your folder.
pip install pandas
Once done, you can use this library and loop over the customer_reviews array to reach each individual review object.
import requests
import pandas as pd

api_key = "your-api-key"
url = "https://api.scrapingdog.com/amazon/reviews"

params = {
    "api_key": api_key,
    "asin": "B0CTKXMQXK",
    "domain": "com",
    "page": "1"
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    rows = []
    # Build one flat row per review
    for review in data['customer_reviews']:
        rows.append({
            'Name': review['user'],
            'Title': review['title'],
            'Verified': review['extension'],
            'Rating': review['rating'],
            'Review': review['review'],
        })
    # Write all rows to reviews.csv without the index column
    df = pd.DataFrame(rows)
    df.to_csv('reviews.csv', index=False, encoding='utf-8')
else:
    print(f"Request failed with status code: {response.status_code}")
Once you run this code, a file named reviews.csv will be created inside your folder.
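As a quick sanity check on the saved file, the sketch below reads reviews.csv back with Python's built-in csv module and counts how many reviews carry each rating. It assumes the Rating column name used in the script above and treats each rating as an opaque string, since the exact format the API returns for ratings is not shown here. The function name rating_counts is hypothetical.

```python
import csv
from collections import Counter


def rating_counts(csv_path):
    """Count how many reviews in the CSV carry each rating value."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Ratings are kept as strings exactly as they appear in the file.
        return Counter(row["Rating"] for row in csv.DictReader(f))
```

Running print(rating_counts("reviews.csv")) after the script finishes gives a rough picture of how the ratings are distributed, which is a simple starting point for the sentiment analysis mentioned at the beginning of this post.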
Conclusion
We saw how you can use Python and the Amazon Review Scraper API to scrape reviews of any product at scale. Because Amazon now shows reviews only to logged-in users, a plain Python request is no longer enough. So, if you are scraping reviews for commercial purposes at scale, using a scraper API is a must.
If you liked this article, do not forget to share it with your friends and followers on social media.