How to Scrape Data from Google Maps using Python

In today’s digital age, online reviews have become an integral part of our decision-making process. Whether we’re searching for a cozy restaurant, a reputable doctor, or a five-star hotel, we often turn to platforms like Google Maps to read user reviews and gauge the quality of services.

For businesses, these reviews are not just feedback but a vital aspect of their online presence. So, what if you could harness the power of Python to extract and analyze these valuable insights from Google Maps? In this article, we’ll explore how to scrape Google Maps reviews using Python, opening up a world of possibilities for businesses, researchers, and data enthusiasts alike.

Scraping Google Maps reviews can offer a wealth of information. You can uncover trends, sentiments, and preferences of customers, providing businesses with actionable insights to enhance their services.

Whether you’re looking to gather competitive intelligence, track your own business’s performance, or conduct market research, Python offers a versatile toolkit to automate the extraction of Google Maps reviews efficiently. Join us on this journey as we delve into the fascinating world of web scraping, data extraction, and analysis to unlock the hidden treasures of Google Maps reviews.

Web scraping Google Maps reviews can be achieved with two Python libraries: Playwright and Beautiful Soup. The first is a relatively new browser automation library that can drive headless browsers, and the second is a widely recognized HTML parsing library that offers extensive documentation.

Playwright and Beautiful Soup: Why choose this team?

Playwright is a library developed by Microsoft, initially intended for JavaScript applications, but it has since been extended to support Python, serving as a good alternative to Selenium when it comes to headless browser automation.

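Playwright allows you to control a browser for testing, web scraping, and other automation tasks. To install Playwright in your virtual environment, run the following commands. The first installs the Python package (the plain playwright package is enough if you do not need the pytest plugin), and the second downloads the browsers that Playwright drives.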

pip install pytest-playwright
playwright install

It can easily be paired with web scraping libraries such as Beautiful Soup, a well-known library that parses data from HTML and XML files. To install it, run the following pip command.

pip install beautifulsoup4

How to automate Google?

To scrape reviews from Google Maps, a set of automation tasks has to be performed beforehand, such as clicks, scrolls, and page changes. Take a look at the required imports.

import time
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

Now let’s specify three variables: one for the category for which we want reviews, another for the location, and finally Google’s main URL.

# the category for which we seek reviews
CATEGORY = "vegan restaurants"
# the location
LOCATION = "Lisbon, Portugal"
# google's main URL
URL = "https://www.google.com/"

All set to start our Playwright instance.

with sync_playwright() as pw:
    # creates an instance of the Chromium browser and launches it
    browser = pw.chromium.launch(headless=False)
    # creates a new browser page (tab) within the browser instance
    page = browser.new_page()

Playwright supports both synchronous and asynchronous APIs. In this case, we are using the synchronous one to make each step easier to follow, since in this mode each command is executed one after the other.

In addition, Playwright is compatible with all modern rendering engines: Chromium, WebKit, and Firefox. In this case, we’ll be using Chromium, which is the most widely used.

A new instance of the Chromium browser is created with headless mode set to False, allowing the user to watch the automation live in a GUI (Graphical User Interface). Finally, a new browser page is created; this instance will be responsible for most of the actions.

# go to url with Playwright page element
page.goto(URL)
# deal with cookies
page.click('.QS5gu.sy4vM')
# write what you're looking for
page.fill("textarea", f"{CATEGORY} near {LOCATION}")
# press enter
page.keyboard.press('Enter')
# change to english
page.locator("text='Change to English'").click()
time.sleep(4)
# click in the "Maps" HTML element
page.click('.GKS7s')
time.sleep(4)

Above, we can see several automation actions applied with the page instance. The first task, page.goto(URL), moves the browser’s tab to the Google main URL. Then, in some cases, Google might display a cookies window, depending on your location or proxy.

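In that case, you can use the .click() function on the HTML class (‘.QS5gu.sy4vM’) of the button that lets you continue. Class names like this one are generated by Google and may change over time, so check them in your browser’s developer tools if the click fails.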
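
Because the cookies window only appears for some visitors, an unconditional click can fail when the button is absent. A minimal sketch of one way to handle this is shown below. The helper name click_if_present is hypothetical and not part of Playwright; it relies only on page.locator(), Locator.count(), Locator.first, and Locator.click(). Note that count() does not wait for elements, so this assumes the page has already finished loading.

```python
def click_if_present(page, selector: str) -> bool:
    """Click the first element matching `selector` if one exists.

    `page` is expected to be a Playwright Page. Returns True when a click
    happened and False when nothing matched (hypothetical helper).
    """
    target = page.locator(selector)
    # count() does not wait, so call this after the page has loaded
    if target.count() == 0:
        return False
    target.first.click()
    return True
```

With this in place, the cookies step could be written as click_if_present(page, '.QS5gu.sy4vM'), and the script continues whether or not the window was shown.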


At this point, we have reached Google’s main page, and we can write what we are looking for. The variables CATEGORY and LOCATION were introduced before, and they can be used in the .fill() function. Writing is not enough, which is why the .keyboard.press() function follows just below to press Enter.

If you’re running the script from a non-English country without a proxy, and you want the reviews in English, you might need to click on some HTML element that changes the language. In this case, this was achieved by using the .locator() function to track the text Change to English and click on it.

The time.sleep() calls are important to add loading time just after the actions. Sometimes the actions take more time than expected and the following steps do not occur, resulting in an error.
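
A fixed pause is either longer than needed or, on a slow connection, too short. As a hedged alternative, Playwright’s page.wait_for_selector() blocks until an element appears or a timeout (in milliseconds) elapses. The sketch below assumes the result items carry the ‘.hfpxzc’ class used by the scrolling code in this article; the helper name wait_for_results and the 10-second default are illustrative choices, not part of the original script.

```python
def wait_for_results(page, selector: str = ".hfpxzc", timeout_ms: int = 10_000):
    """Block until `selector` appears on the page instead of sleeping blindly.

    `page` is expected to be a Playwright Page. wait_for_selector raises a
    timeout error if nothing matches within `timeout_ms` milliseconds.
    """
    return page.wait_for_selector(selector, timeout=timeout_ms)
```

Calling wait_for_results(page) after the click on the Maps element would take the place of the time.sleep(4) that follows it.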

Finally, we can head to the Google Maps page, by clicking on the respective HTML class (‘.GKS7s’).
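
As an alternative to typing the query on Google’s main page and clicking through, the documented Google Maps URLs scheme accepts a search query directly. The sketch below builds such a URL from the same category and location; the function name build_maps_search_url is hypothetical. Whether the cookies window still appears, and whether the resulting page uses the same class names as in this article, are assumptions you would need to verify.

```python
from urllib.parse import urlencode

# base address of the Google Maps URLs "search" action
MAPS_SEARCH_URL = "https://www.google.com/maps/search/"


def build_maps_search_url(category: str, location: str) -> str:
    """Return a Google Maps search URL for `category` near `location`."""
    query = f"{category} near {location}"
    # urlencode escapes spaces and commas so the query is URL-safe
    return f"{MAPS_SEARCH_URL}?{urlencode({'api': 1, 'query': query})}"


print(build_maps_search_url("vegan restaurants", "Lisbon, Portugal"))
# https://www.google.com/maps/search/?api=1&query=vegan+restaurants+near+Lisbon%2C+Portugal
```

The returned string could then be passed to page.goto() in place of the search-and-click steps.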

The Google Maps page shows the different vegan restaurants in Lisbon, but only a few are presented. To see more, we need to start scrolling. This is an infinite-scroll situation, meaning that not all restaurants are loaded at the same time.

# scrolling
for i in range(4):
    # tackle the body element
    html = page.inner_html('body')
    # create beautiful soup element
    soup = BeautifulSoup(html, 'html.parser')
    # select items
    categories = soup.select('.hfpxzc')
    last_category_in_page = categories[-1].get('aria-label')
    # scroll to the last item
    last_category_location = page.locator(
        f"text={last_category_in_page}")
    last_category_location.scroll_into_view_if_needed()
    # wait to load contents
    time.sleep(4)
# re-parse the page so items loaded by the last scroll are included
soup = BeautifulSoup(page.inner_html('body'), 'html.parser')
# get links of all categories after scroll
links = [item.get('href') for item in soup.select('.hfpxzc')]

The code snippet shows a loop to scroll the page 4 times. The higher the number, the more restaurants we have.

This is where we start using Beautiful Soup, not to scrape reviews just yet, but to grab a string that is needed to apply scrolling. The html instance contains the HTML information, and the soup element is created to be able to parse it.

Playwright has other functions for scrolling, such as .mouse.wheel(), but in this case the results list sits in a panel on the left of the page, so another strategy had to be applied: the .scroll_into_view_if_needed() function. This takes a locator element and scrolls to it. In this case, the element is the last restaurant title available on the page. This triggers the loading of more restaurants. The step is repeated until the desired number of restaurants is reached.

After the loop, we can obtain all the restaurant URLs (links) by selecting the same HTML element as before (‘.hfpxzc’) and getting the href of each.
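
The loop in this article scrolls a fixed number of times. A hedged variation is to keep scrolling until a target number of items is loaded or until a scroll brings nothing new. The sketch below uses only Playwright locators (Locator.count(), Locator.last, and Locator.scroll_into_view_if_needed()); the function name scroll_results and its default values are assumptions made for illustration.

```python
import time


def scroll_results(page, selector: str = ".hfpxzc", target: int = 40,
                   max_rounds: int = 20, pause_s: float = 2.0) -> int:
    """Scroll the result list until `target` items are loaded or growth stops.

    `page` is expected to be a Playwright Page. Returns the number of items
    matching `selector` when scrolling ends.
    """
    items = page.locator(selector)
    loaded = items.count()
    for _ in range(max_rounds):
        if loaded == 0 or loaded >= target:
            break
        # scrolling the last loaded item into view triggers the next batch
        items.last.scroll_into_view_if_needed()
        time.sleep(pause_s)
        new_loaded = items.count()
        if new_loaded == loaded:
            break  # nothing new appeared, so the list is probably exhausted
        loaded = new_loaded
    return loaded
```

Because locators are evaluated again on every call, count() reflects the items added by each scroll without re-parsing the page.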

See the code below.

for link in links:
    # go to subject link
    page.goto(link)
    time.sleep(4)
    # load all reviews
    page.locator("text='Reviews'").first.click()
    time.sleep(4)
    # create new soup
    html = page.inner_html('body')
    # create beautiful soup element
    soup = BeautifulSoup(html, 'html.parser')
    # scrape reviews
    reviews = soup.select('.MyEned')
    reviews = [review.find('span').text for review in reviews]
    # print reviews
    for review in reviews:
        print(review)
        print('\n')

Another loop is needed to extract the reviews from each restaurant. This time we navigate to each link. Then we locate the ‘Reviews’ tab and click on it. We need to make another soup instance; otherwise, we would be reading the HTML information from the previous page.

The first reviews of each restaurant are found in elements with the ‘.MyEned’ class. From each of these we take the text of its first span element, which holds the review.
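
If an element with the ‘.MyEned’ class has no span inside it, find('span') returns None and reading .text raises an error. The sketch below shows one defensive way to do the same extraction. The function name extract_reviews is hypothetical, and the sample HTML is made up for illustration rather than copied from Google Maps.

```python
from bs4 import BeautifulSoup


def extract_reviews(html: str, selector: str = ".MyEned") -> list[str]:
    """Return the text of the first span in every element matching `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for block in soup.select(selector):
        span = block.find("span")
        # skip blocks that have no span at all
        if span is None:
            continue
        text = span.get_text(strip=True)
        # skip spans that contain only whitespace
        if text:
            reviews.append(text)
    return reviews


# made-up markup that mimics the structure described above
sample = """
<div class="MyEned"><span>Good solid vegan food.</span></div>
<div class="MyEned"></div>
<div class="other"><span>not a review</span></div>
"""
print(extract_reviews(sample))
# ['Good solid vegan food.']
```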

See the output below:

As a tourist I really recommend this place a super nice family business with delicious vegan food and a mix of different cultures as well. We had the Brazilian dish (Feijoada) and the mushrooms 🍄 calzone with salad as well and the apple 🍏 ...
A perk in Lisboa, where is a bit hard to find vegan food. This restaurant is managed by very lovely people, the owner is so kind and her wife too. Quality is top of the edge, they do not use much spices neither much salt or sugar, but yet ...
Good solid vegan food. Not inventive just very good.  Very nice out of the way location.
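
Printing is fine for a first look, but for analysis you will probably want the reviews in a file. A minimal sketch using only the standard library is shown below. It assumes you collect (place URL, review text) pairs while looping; the function name reviews_to_csv and the column names are illustrative and not part of the original script.

```python
import csv
import io


def reviews_to_csv(rows) -> str:
    """Serialize (place_url, review_text) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["place_url", "review"])
    for place_url, review in rows:
        # the csv module quotes fields containing commas or line breaks
        writer.writerow([place_url, review])
    return buffer.getvalue()


# example.com stands in for a real place link
rows = [("https://example.com/a", "Good, solid food")]
print(reviews_to_csv(rows))
```

The returned text can be written to disk with open('reviews.csv', 'w', newline='', encoding='utf-8').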

Complete Code

Of course, you can scrape more valuable data from the page but for the current scenario, the code will look like this.

import time
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
# rich is a separate package (pip install rich); its print adds formatting
from rich import print
# the category for which we seek reviews
CATEGORY = "vegan restaurants"
# the location
LOCATION = "Lisbon, Portugal"
# google's main URL
URL = "https://www.google.com/"
if __name__ == '__main__':
    with sync_playwright() as pw:
        # creates an instance of the Chromium browser and launches it
        browser = pw.chromium.launch(headless=False)
        # creates a new browser page (tab) within the browser instance
        page = browser.new_page()
        # go to url with Playwright page element
        page.goto(URL)
        # deal with cookies page
        page.click('.QS5gu.sy4vM')
        # write what you're looking for
        page.fill("textarea", f"{CATEGORY} near {LOCATION}")
        # press enter
        page.keyboard.press('Enter')
        # change to english
        page.locator("text='Change to English'").click()
        time.sleep(4)
        # click in the "Maps" HTML element
        page.click('.GKS7s')
        time.sleep(4)
        # scrolling
        for i in range(2):
            # tackle the body element
            html = page.inner_html('body')
            # create beautiful soup element
            soup = BeautifulSoup(html, 'html.parser')
            # select items
            categories = soup.select('.hfpxzc')
            last_category_in_page = categories[-1].get('aria-label')
            # scroll to the last item
            last_category_location = page.locator(
                f"text={last_category_in_page}")
            last_category_location.scroll_into_view_if_needed()
            # wait to load contents
            time.sleep(4)
        # re-parse the page so items loaded by the last scroll are included
        soup = BeautifulSoup(page.inner_html('body'), 'html.parser')
        # get links of all categories after scroll
        links = [item.get('href') for item in soup.select('.hfpxzc')]
        for link in links:
            # go to subject link
            page.goto(link)
            time.sleep(4)
            # load all reviews
            page.locator("text='Reviews'").first.click()
            time.sleep(4)
            # create new soup
            html = page.inner_html('body')
            # create beautiful soup element
            soup = BeautifulSoup(html, 'html.parser')
            # scrape reviews
            reviews = soup.select('.MyEned')
            reviews = [review.find('span').text for review in reviews]
            # print reviews
            for review in reviews:
                print(review)
                print('\n')

Once you run the code, a Chromium window opens and walks through the steps above while the reviews are printed in your terminal.

Conclusion

In the age of information, data is power, and Python equips us with the tools to access that power. With the knowledge you’ve gained from this article, you’re now equipped to scrape Google Maps reviews with ease, transforming raw data into actionable insights. Whether you’re a business owner aiming to monitor your online reputation, a researcher seeking to analyze customer sentiments, or simply a Python enthusiast looking for a practical project, the ability to extract and analyze Google Maps reviews is a valuable skill.

You can also use Selenium, but to be honest, I was getting bored with Selenium (of course, it’s a great library). Playwright brings flexibility and consumes far fewer resources than Selenium does.

I hope you like this tutorial, and if you do, please do not forget to share it with your friends and on your social media.


My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.

Manthan Koolwal