
How to Scrape Data from Google Maps using Python

19-09-2023

In today’s digital age, online reviews have become an integral part of our decision-making process. Whether we’re searching for a cozy restaurant, a reputable doctor, or a five-star hotel, we often turn to platforms like Google Maps to read user reviews and gauge the quality of services.

For businesses, these reviews are not just feedback but a vital aspect of their online presence. So, what if you could harness the power of Python to extract and analyze these valuable insights from Google Maps? In this article, we’ll explore how to scrape Google Maps reviews using Python, opening up a world of possibilities for businesses, researchers, and data enthusiasts alike.

Scraping Google Maps Reviews using Python

Scraping Google Maps reviews can offer a wealth of information. You can uncover trends, sentiments, and preferences of customers, providing businesses with actionable insights to enhance their services.

Whether you’re looking to gather competitive intelligence, track your own business’s performance, or conduct market research, Python offers a versatile toolkit to automate the extraction of Google Maps reviews efficiently. Join us on this journey as we delve into the fascinating world of web scraping, data extraction, and analysis to unlock the hidden treasures of Google Maps reviews.

Web scraping Google Maps reviews can be achieved with the Playwright and Beautiful Soup Python libraries. The first is a modern browser automation library, and the second is a widely recognized web scraping library with extensive documentation.

Playwright and Beautiful Soup: Why choose this team?

Playwright is a library developed by Microsoft, initially intended for JavaScript applications, but it has since been extended to support Python, serving as a good alternative to Selenium when it comes to headless browser automation.

Playwright allows you to control browser behavior for testing, web scraping, and other automation tasks. To install Playwright in your virtual environment, run the following commands.

pip install pytest-playwright
playwright install

It pairs easily with web scraping libraries such as Beautiful Soup, a well-known library that parses data from HTML and XML files. To install it, run the following pip command.

pip install beautifulsoup4
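As a quick sanity check that the two libraries work together, here is a minimal sketch (using example.com as a stand-in target) that renders a page with Playwright and hands its HTML over to Beautiful Soup:

from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/")
    # hand the rendered HTML over to Beautiful Soup for parsing
    soup = BeautifulSoup(page.inner_html('body'), 'html.parser')
    print(soup.find('h1').text)
    browser.close()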

How to automate Google?

To scrape reviews from Google Maps, a set of automation steps needs to be performed first, such as clicks, scrolls, and page changes. Take a look at the required imports.

import time
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

Now let’s define three variables: one for the category we want reviews for, another for the location, and finally Google’s main URL.

# the category for which we seek reviews
CATEGORY = "vegan restaurants"

# the location
LOCATION = "Lisbon, Portugal"

# google's main URL
URL = "https://www.google.com/"

All set to start our Playwright instance.

with sync_playwright() as pw:
    # creates an instance of the Chromium browser and launches it
    browser = pw.chromium.launch(headless=False)

    # creates a new browser page (tab) within the browser instance
    page = browser.new_page()

Playwright supports both synchronous and asynchronous APIs. In this case, we are using the synchronous one for the sake of better understanding each step, since in this mode each command is executed one after the other.
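For reference, the same setup with the asynchronous API would look roughly like this sketch, where every Playwright call is awaited:

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as pw:
        # same steps as before, but awaited
        browser = await pw.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto("https://www.google.com/")
        await browser.close()

asyncio.run(main())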

In addition, Playwright is compatible with all modern rendering engines: Chromium, WebKit, and Firefox. In this case, we’ll be using Chromium, which is the most widely used.

A new instance of the Chromium browser is created with headless mode set to False, allowing you to watch the automation live in a GUI (Graphical User Interface). Finally, the new browser page is created; this instance will be responsible for most of the actions.
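Once the script is stable, you can switch headless back to True for faster, windowless runs. Playwright’s launch() also accepts a slow_mo option (in milliseconds) that slows every action down, which is handy for watching the automation while debugging:

# watch the automation at human speed while debugging
browser = pw.chromium.launch(headless=False, slow_mo=200)

# run windowless once everything works
browser = pw.chromium.launch(headless=True)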

# go to url with Playwright page element
page.goto(URL)

# deal with cookies
page.click('.QS5gu.sy4vM')

# write what you're looking for
page.fill("textarea", f"{CATEGORY} near {LOCATION}")

# press enter
page.keyboard.press('Enter')

# change to english
page.locator("text='Change to English'").click()
time.sleep(4)

# click on the "Maps" HTML element
page.click('.GKS7s')
time.sleep(4)

Above, we can see several automation actions applied with the page instance. The first one, page.goto(URL), moves the browser’s tab to Google’s main URL. Then, depending on your location or proxy, Google might display a cookie consent window.

In that case, you can call .click() on the HTML class (‘.QS5gu.sy4vM’) that contains the button to continue. Note that Google’s class names are auto-generated and may change over time, so verify them in your browser’s developer tools if the click fails.
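Since the dialog does not always appear, a slightly more defensive sketch checks whether the element exists before clicking; the same pattern also works for the Change to English link further below:

# the consent button may or may not be present
consent = page.locator('.QS5gu.sy4vM')
if consent.count() > 0:
    consent.first.click()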

At this point, we have reached Google’s main page, and we can type what we are looking for. The variables CATEGORY and LOCATION introduced earlier are used in the .fill() function. Typing alone is not enough, which is why the .keyboard.press() call just below presses Enter.

If you’re running the script from a non-English country without a proxy and want the reviews in English, you might need to click the HTML element that changes the language. Here, this was achieved by using the .locator() function to find the text Change to English and click on it.

The time.sleep() calls add loading time right after the actions. Pages sometimes take longer than expected to load, and without the pause the following steps fail.
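A more robust alternative to fixed sleeps is Playwright’s built-in waiting: page.wait_for_selector() blocks until an element actually appears or a timeout expires, so the script waits exactly as long as needed. A sketch, using the restaurant-link class from a later step as an example:

# wait up to 10 seconds for results to show up instead of sleeping blindly
page.wait_for_selector('.hfpxzc', timeout=10000)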

Finally, we head to the Google Maps page by clicking on the respective HTML class (‘.GKS7s’).

The Google Maps page shows the different vegan restaurants in Lisbon, but only a few are presented at first. To see more, we need to start scrolling, and it is an infinite-scroll situation, meaning that not all restaurants are loaded at the same time.

# scrolling
for i in range(4):
    # grab the body element's HTML
    html = page.inner_html('body')

    # create beautiful soup element
    soup = BeautifulSoup(html, 'html.parser')

    # select the restaurant items
    categories = soup.select('.hfpxzc')
    last_category_in_page = categories[-1].get('aria-label')

    # scroll to the last item
    last_category_location = page.locator(
        f"text={last_category_in_page}")
    last_category_location.scroll_into_view_if_needed()

    # wait for the new results to load
    time.sleep(4)


# refresh the soup after the final scroll, then collect the links
html = page.inner_html('body')
soup = BeautifulSoup(html, 'html.parser')
links = [item.get('href') for item in soup.select('.hfpxzc')]

The code snippet shows a loop that scrolls the page 4 times; the higher the number, the more restaurants are loaded.

This is where we start using Beautiful Soup, not to scrape reviews just yet, but to grab the string we need for scrolling: the aria-label of the last restaurant currently on the page. The html variable contains the page’s HTML, and the soup object is created to parse it.

Playwright offers other scrolling functions, such as .mouse.wheel(), but here the list of results sits in a panel on the left of the page, so another strategy had to be applied: .scroll_into_view_if_needed(). This function takes a locator element and scrolls until it is visible. In this case, the element is the last restaurant title available on the page, which triggers the loading of more restaurants. The step is repeated until the desired number of restaurants is reached.
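If you do want to try .mouse.wheel(), one possible workaround (a sketch, not used in this tutorial) is to hover over an element inside the results panel first, so the wheel events are dispatched over the list:

# move the virtual mouse over a result, then scroll the panel
page.hover('.hfpxzc')
page.mouse.wheel(0, 2000)
time.sleep(2)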

After the loop, we refresh the soup with the newly loaded HTML and obtain all the restaurant URLs (links) by selecting the same HTML class as before (‘.hfpxzc’) and getting the href of each.

See the code below.

for link in links:
    # go to the restaurant's page
    page.goto(link)
    time.sleep(4)

    # load all reviews
    page.locator("text='Reviews'").first.click()
    time.sleep(4)

    # grab the new page's HTML
    html = page.inner_html('body')

    # create beautiful soup element
    soup = BeautifulSoup(html, 'html.parser')

    # scrape reviews
    reviews = soup.select('.MyEned')
    reviews = [review.find('span').text for review in reviews]

    # print reviews
    for review in reviews:
        print(review)
        print('\n')

Another loop is needed to extract the reviews from each restaurant. This time we navigate to each link, then locate the ‘Reviews’ tab and click on it. We also need to create a new soup instance; otherwise, we would be reading the HTML of the previous page.

The first reviews of each restaurant are presented in the ‘.MyEned’ class. From here we take the text of all span elements (reviews).
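Note that .find('span') returns None for a review block without a span, which would crash the list comprehension with an AttributeError; a slightly defensive variant simply skips those entries:

# keep only the reviews whose text span was actually found
reviews = [r.find('span').text for r in soup.select('.MyEned') if r.find('span')]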

See the output below:

As a tourist I really recommend this place a super nice family business with delicious vegan food and a mix of different 
cultures as well. We had the Brazilian dish (Feijoada) and the mushrooms 🍄 calzone with salad as well and the apple 🍏 ...


A perk in Lisboa, where is a bit hard to find vegan food. This restaurant is managed by very lovely people, the owner is so 
kind and her wife too. Quality is top of the edge, they do not use much spices neither much salt or sugar, but yet ...

Good solid vegan food. Not inventive just very good.  Very nice out of the way location.
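Printing is fine for a demo, but you will usually want to persist the results. A small sketch with Python’s csv module, assuming you collected (link, review) pairs into a hypothetical all_reviews list inside the loop instead of printing them:

import csv

# all_reviews is assumed to be a list of (link, review_text) tuples
with open('reviews.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['link', 'review'])
    writer.writerows(all_reviews)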

Complete Code

Of course, you can scrape more valuable data from the page, but for the current scenario the complete code looks like this.

import time
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
from rich import print


# the category for which we seek reviews
CATEGORY = "vegan restaurants"
# the location
LOCATION = "Lisbon, Portugal"
# google's main URL
URL = "https://www.google.com/"


if __name__ == '__main__':
    with sync_playwright() as pw:
        # creates an instance of the Chromium browser and launches it
        browser = pw.chromium.launch(headless=False)
        # creates a new browser page (tab) within the browser instance
        page = browser.new_page()
        # go to url with Playwright page element
        page.goto(URL)
        # deal with cookies page
        page.click('.QS5gu.sy4vM')
        # write what you're looking for
        page.fill("textarea", f"{CATEGORY} near {LOCATION}")
        # press enter
        page.keyboard.press('Enter')
        # change to english
        page.locator("text='Change to English'").click()
        time.sleep(4)
        # click in the "Maps" HTML element
        page.click('.GKS7s')
        time.sleep(4)
        # scrolling
        for i in range(2):
            # tackle the body element
            html = page.inner_html('body')
            # create beautiful soup element
            soup = BeautifulSoup(html, 'html.parser')


            # select items
            categories = soup.select('.hfpxzc')
            last_category_in_page = categories[-1].get('aria-label')
            # scroll to the last item
            last_category_location = page.locator(
                f"text={last_category_in_page}")
            last_category_location.scroll_into_view_if_needed()
            # wait to load contents
            time.sleep(4)


        # refresh the HTML after the final scroll, then get all links
        html = page.inner_html('body')
        soup = BeautifulSoup(html, 'html.parser')
        links = [item.get('href') for item in soup.select('.hfpxzc')]


        for link in links:
            # go to subject link
            page.goto(link)
            time.sleep(4)
            # load all reviews
            page.locator("text='Reviews'").first.click()
            time.sleep(4)
            # create new soup
            html = page.inner_html('body')
            # create beautiful soup element
            soup = BeautifulSoup(html, 'html.parser')
            # scrape reviews
            reviews = soup.select('.MyEned')
            reviews = [review.find('span').text for review in reviews]
            # print reviews
            for review in reviews:
                print(review)
                print('\n')

Once you run the code, the scraped reviews will be printed to your screen.

Conclusion

In the age of information, data is power, and Python equips us with the tools to access that power. With the knowledge you’ve gained from this article, you’re now equipped to scrape Google Maps reviews with ease, transforming raw data into actionable insights. Whether you’re a business owner aiming to monitor your online reputation, a researcher seeking to analyze customer sentiments, or simply a Python enthusiast looking for a practical project, the ability to extract and analyze Google Maps reviews is a valuable skill.

You can also use Selenium, but to be honest, I was getting bored with Selenium (of course, it’s a great library). Playwright brings flexibility and consumes far fewer resources than Selenium does.

I hope you like this tutorial and if you do then please do not forget to share it with your friends and on your social media.


Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.