
How To Web Scrape Google Search Results using Python Beautifulsoup

27-04-2024

In today’s blog, we’ll dive into web scraping Google Search results using Python and BeautifulSoup to extract valuable information. We will build a Google Search scraper of our own that automates the process of pulling organic data from search results. I have also made a dedicated tutorial on BeautifulSoup for web scraping; do check it out after completing this article.

As we move forward, you will learn how to scrape data from Google search results effectively, gaining the ability to gather large amounts of data quickly. Get ready as we unfold the steps to extract data from Google search results, transforming the vast ocean of information available into a structured database for your use. Additionally, if you are scraping Google search results for SEO, it helps to have a website optimization checklist ready so you can act on the data you collect.

Use Cases of Scraping Google Search Results

  1. Google scraping can be used to analyze Google’s algorithm and identify its main trends.
  2. It can provide insights for search engine optimization (SEO), such as monitoring how your website performs in Google for specific queries over time.
  3. It can analyze ad rankings for a given set of keywords.
  4. SEO tools scrape Google search results with their own Google search scrapers to give you the average volume of keywords, their difficulty score, and other metrics.

Also, if you are in a hurry and want to extract data from Google Search results straight away, I would suggest using a Google Search Scraper API. The output you get is in JSON format.

Read More: What Is Search Engine Result Scraping?

Scraping Google Search Results using Python

Why Python for Scraping Google Search Results?

Python is a widely used, simple language with built-in mathematical functions. Python for data science is one of the most in-demand skills today. It is also flexible and easy to understand even if you are a beginner. The Python community is very large, which helps whenever you face an error while coding.

Many forums like StackOverflow, GitHub, etc. already have answers to the errors you might face while coding when you scrape Google search results.

You can do countless things with Python but for now, we will learn web scraping Google search results with it.

Read More: Web scraping 101 with Python (A beginner-friendly tutorial)

Let’s Start Scraping Google Search Results with Python

In this section, we will scrape Google search results using Python. Let’s focus on writing a basic Python script that can extract data from the first 10 Google results.

What are we going to scrape?

Google Search Result Page

For this tutorial, we are going to scrape these four things:

  • Position of the result
  • Link
  • Title
  • Description

It is good practice to decide this in advance.
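Writing the target fields down as a record shape before scraping keeps the later parsing code focused. A minimal illustration (the values here are placeholders, not real output):

```python
# Illustrative shape of one scraped record; the values are placeholders.
result = {
    "position": 1,
    "link": "https://en.wikipedia.org/wiki/Pizza",
    "title": "Pizza - Wikipedia",
    "description": "Pizza is an Italian dish consisting of a flat base of dough...",
}
print(list(result.keys()))
```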

Prerequisite to scrape Google search results

Generally, web scraping with Python is divided into two parts:

  1. Fetching/downloading data by making an HTTP request.
  2. Extracting essential data by parsing the HTML DOM.

Libraries & Tools

  1. Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  2. Requests allow you to send HTTP requests very easily.

Setup

Our setup is pretty simple. Just create a folder and install Beautiful Soup and Requests. To create the folder and install the libraries, type the commands below in your terminal. I am assuming that you have already installed Python 3.x.

mkdir scraper
pip install beautifulsoup4
pip install requests

Now, create a file inside that folder by any name you like. I am using google.py.

Import the libraries we just installed in that file.

from bs4 import BeautifulSoup
import requests

Preparing the Food

Now, since we have all the ingredients to prepare the scraper, we should make a GET request to the target URL to get the raw HTML data. We will scrape Google Search results using the requests library as shown below.

We will first try to extract data from the first 10 search results and then we will focus on how we can scrape country-specific results.

headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
url='https://www.google.com/search?q=pizza&ie=utf-8&oe=utf-8&num=10'
html = requests.get(url,headers=headers)
print(html.status_code)

This will provide you with the raw HTML of the target Google page. If you get a 200 status code after running this code, the request succeeded and you have scraped Google. With this, our first step of downloading the raw data from Google is complete.

Our second step is to parse this raw HTML and extract the data as discussed before. For this step, we are going to use BeautifulSoup(BS4).

soup = BeautifulSoup(html.text, 'html.parser')

When you inspect the Google page you will find that all the results come under the class g. Of course, this class name will change after some time because Google doesn’t like scrapers, so you have to keep checking it.
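Because the class name can change, a quick sanity check right after parsing helps catch a silent selector break. A minimal sketch, using a tiny hard-coded snippet in place of the live page’s HTML:

```python
from bs4 import BeautifulSoup

# Stand-in for html.text from the live request; the real page is much larger.
sample = '<div class="g">result 1</div><div class="g">result 2</div>'
soup = BeautifulSoup(sample, "html.parser")

allData = soup.find_all("div", {"class": "g"})
if not allData:
    # An empty list usually means Google renamed the class; re-inspect the page.
    raise RuntimeError("Selector div.g matched nothing; the markup may have changed")
print(len(allData))
```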

With BS4 we are going to find all these classes using its find_all() function.

allData = soup.find_all("div", {"class": "g"})

Now, we will run a for loop to reach every item in the allData list. But before we code let’s find the location of the link, title, and description of each search result.

As you can see in the above image, the link is located inside the a tag with attribute href.

The title is located inside the h3 tag with the class DKV0Md.

The description is stored inside the div tag with the class VwiC3b.

Now, we have the location of each element. We can use the find() function of BS4 to find each of these elements. Let’s run the for loop and extract each of these details.

g = 0
Data = []
l = {}
for i in range(0, len(allData)):
    link = allData[i].find('a').get('href')

    if link is not None:
        # Keep only organic results: absolute http(s) links that are not ads (aclk)
        if link.find('https') != -1 and link.find('http') == 0 and link.find('aclk') == -1:
            g = g + 1
            l["link"] = link
            try:
                l["title"] = allData[i].find('h3', {"class": "DKV0Md"}).text
            except AttributeError:
                l["title"] = None

            try:
                l["description"] = allData[i].find("div", {"class": "VwiC3b"}).text
            except AttributeError:
                l["description"] = None

            l["position"] = g
            Data.append(l)
            l = {}

The code is pretty simple but let me explain each step.

  • After running the for loop, we extract the link, title, and description of each result.
  • We are storing each result inside the object l.
  • Then finally we store the object l inside the list Data.
  • Once the loop ends you can access the results by printing the list Data.

On printing the list Data the output will look like this.
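The exact text varies by query and location, but each entry in Data is a dictionary holding the four fields we decided on. An illustration of the shape (these values are made up, not real output):

```python
# Illustrative shape of Data after the loop; the values are placeholders.
Data = [
    {
        "link": "https://en.wikipedia.org/wiki/Pizza",
        "title": "Pizza - Wikipedia",
        "description": "Pizza is an Italian dish consisting of a flat base of leavened dough...",
        "position": 1,
    },
]
print(Data[0]["title"])
```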

Finally, we were able to extract Google search results.

Now let’s see how we can save this data to a CSV file.

Storing data to a CSV file

We are going to use the pandas library to save the search results to a CSV file.

The first step would be to import this library at the top of the script.

import pandas as pd

Now we will create a pandas data frame using list Data.

df = pd.DataFrame(Data)
df.to_csv('google.csv', index=False, encoding='utf-8')

Again once you run the code you will find a CSV file inside your working directory.

Complete Code

You can surely scrape many more things from this target page, but currently, the code will look like this.

from bs4 import BeautifulSoup
import requests
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}

url = 'https://www.google.com/search?q=pizza&ie=utf-8&oe=utf-8&num=10'
html = requests.get(url, headers=headers)

soup = BeautifulSoup(html.text, 'html.parser')

allData = soup.find_all("div", {"class": "g"})

g = 0
Data = []
l = {}
for i in range(0, len(allData)):
    link = allData[i].find('a').get('href')

    if link is not None:
        # Keep only organic results: absolute http(s) links that are not ads (aclk)
        if link.find('https') != -1 and link.find('http') == 0 and link.find('aclk') == -1:
            g = g + 1
            l["link"] = link
            try:
                l["title"] = allData[i].find('h3', {"class": "DKV0Md"}).text
            except AttributeError:
                l["title"] = None

            try:
                l["description"] = allData[i].find("div", {"class": "VwiC3b"}).text
            except AttributeError:
                l["description"] = None

            l["position"] = g
            Data.append(l)
            l = {}

print(Data)
df = pd.DataFrame(Data)
df.to_csv('google.csv', index=False, encoding='utf-8')

Well, this approach is not scalable because Google will start blocking requests after a certain number of connections. We need more advanced scraping tools to overcome this problem.
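Before reaching for a dedicated tool, two common mitigations are to throttle requests and to vary the User-Agent header between them. A minimal sketch; the header strings and delay range here are arbitrary examples, not guaranteed to avoid blocks:

```python
import random
import time

import requests

# A small pool of desktop User-Agent strings to rotate through (examples only).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15",
]

def polite_get(url):
    """Fetch a URL with a random User-Agent and a pause to reduce block risk."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    time.sleep(random.uniform(2, 5))  # wait between requests
    return requests.get(url, headers=headers)
```

Delays like this slow a crawl down considerably, which is why proxy-rotation services are the usual answer at scale.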

Limitations of scraping Google search results with Python

Although Python is an excellent language for web scraping Google search results, it has some limitations. Since it is a dynamic language, it can lead to runtime errors, and it does not handle multiple threads as well as some other languages.

Further, you may observe slow response rates while using Python for scraping Google search results.

Beyond that, you cannot mass-scrape Google with the above code because Google will ultimately block your script once it sees a large amount of traffic coming from a single IP.

With Scrapingdog’s Google Scraper API, you don’t have to worry about proxy rotations or retries. Scrapingdog will handle all the hassle and seamlessly deliver the data.

Let’s see how we can use Scrapingdog to scrape Google at scale.

Scraping Google Search Results without getting blocked

Now that we know how to scrape Google search results using Python and BeautifulSoup, let’s look at a solution that can scrape millions of Google pages without getting blocked.

We will use Scrapingdog’s Google Search Result Scraper API for this task. This API handles everything from proxy rotation to headers. You just have to send a GET request and in return, you will get parsed JSON data.

This API offers a free trial, which you can register for here. After registering for a free account, you should read the docs to get a complete idea of this API.

import requests
api_key = "Paste-your-own-API-key"
url = "https://api.scrapingdog.com/google/"
params = {
    "api_key": api_key,
    "query": "football",
    "results": 10,
    "country": "us",
    "page": 0
}
response = requests.get(url, params=params)
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")

The code is simple. We are sending a GET request to https://api.scrapingdog.com/google/ along with some parameters. For more information on these parameters, you can again refer to the documentation.

Once you run this code you will get a beautiful JSON response.

What if I need results from a different country? As you might know, Google shows different results in different countries for the same query. Well, you just have to change the country parameter in the above code.

Let’s say you need results from the United Kingdom. For this, just change the value of the country parameter to gb (the ISO code of the UK).

You can even extract 100 search results instead of 10 by just changing the value of the results parameter.
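Concretely, only the params dictionary from the earlier snippet changes; everything else stays the same:

```python
api_key = "Paste-your-own-API-key"

params = {
    "api_key": api_key,
    "query": "football",
    "results": 100,   # 100 results instead of 10
    "country": "gb",  # United Kingdom instead of the US
    "page": 0
}
print(params["country"], params["results"])
```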

Using Google’s API to Scrape Google Search Results

Google offers its own API to extract data from its search engine. It is available at this link for anyone who wants to use it. However, usage of this API is minimal due to the following reasons:

  • The API is very costly: every 1,000 requests will cost you around $5, which doesn’t make sense when you can do it for free with web scraping tools.
  • The API has limited functionality: it is designed to search only a small group of websites, and although you can reconfigure it to cover the whole web, doing so costs you time.
  • Limited information: the API returns only a small amount of data, so what you extract may not be useful.
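For completeness, a request to Google’s Custom Search JSON API looks roughly like this. You need an API key and a Programmable Search Engine ID (cx) from the Google Cloud console; the placeholder values below are assumptions, so check Google’s documentation for current details. The sketch builds the request without sending it, so it runs offline:

```python
import requests

# Build (but do not send) the request, so the sketch runs without credentials.
req = requests.Request(
    "GET",
    "https://www.googleapis.com/customsearch/v1",
    params={
        "key": "YOUR_GOOGLE_API_KEY",   # placeholder: API key from Google Cloud
        "cx": "YOUR_SEARCH_ENGINE_ID",  # placeholder: Programmable Search Engine ID
        "q": "pizza",
    },
).prepare()
print(req.url)
```

Sending it with `requests.Session().send(req)` and valid credentials would return JSON with the results under an "items" key.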

Conclusion

In this article, we saw how to scrape Google results with Python and BS4. Then we used a web scraping API to scrape Google at scale without getting blocked.

Google has a sophisticated anti-scraping wall that can prevent mass scraping, but Scrapingdog can help by providing a seamless data pipeline that doesn’t get blocked.

If you like this article please do share it on your social media accounts. If you have any questions, please feel free to reach out to me.

Frequently Asked Questions

How do I use the Google Scraper API?

It is easy to use the Google Scraper API. For step-by-step instructions, you can check out the documentation.


Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.
