
How to Web Scrape Google Search Results Using Python and BeautifulSoup

17-03-2023

In today’s blog, we’ll dive into web scraping Google Search results using Python and BeautifulSoup to extract valuable information. We will build a Google Search scraper of our own that automates the process of pulling URLs, data, and insights.

As we move forward, you will learn how to effectively scrape URLs from Google search results, gaining the ability to gather large amounts of data quickly and efficiently. Get ready as we unfold the steps to extract data from Google search results, transforming the vast ocean of information available into a structured database for your use.

There are several use cases for scraping Google search results. Here are some of them:

  1. Analyzing Google’s algorithm and identifying its main trends.
  2. Gaining insights for search engine optimization (SEO): monitoring how your website performs in Google for specific queries over time.
  3. Analyzing ad rankings for a given set of keywords.
  4. Building SEO tools: such tools scrape Google search results to report the average volume of keywords, their difficulty scores, and other metrics.

In this blog post, we’ll look at the Python libraries that make this process simple, and then we will scrape Google search results step by step.


Why Python for Scraping Google Search Results?

Python is a widely used, simple language with a rich ecosystem of libraries, and it remains one of the most in-demand skills in data science. It is also flexible and easy to understand, even if you are a beginner. The Python community is very large, which helps whenever you face an error while coding.

Forums like StackOverflow and GitHub already have answers to most of the errors you might face while coding your Google search scraper.

You can do countless things with Python but for now, we will learn web scraping Google search results with it.

Read More: Web scraping 101 with Python (A beginner-friendly tutorial)

Let’s Start Scraping Google Search Results with Python

In this section, we will web scrape Google search results for any specific country using Python and a free residential proxy. But first, we will focus on creating a basic Python script: a Google search result scraper that can extract data from the first 10 Google results.

The end result will be JSON data consisting of each result’s link, title, description, and position.

Prerequisites for Scraping Google Search Results Using Python

Generally, web scraping with Python is divided into two parts:

  1. Fetching data by making an HTTP request.
  2. Extracting essential data by parsing the HTML DOM.

Libraries & Tools

  1. Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  2. Requests allow you to send HTTP requests very easily.
  3. A residential proxy to fetch the HTML of the target URL from different locations.

Setup

Our setup is pretty simple. Just create a folder and install Beautiful Soup and Requests. To create the folder and install the libraries, run the commands below. I am assuming that you have already installed Python 3.x.

mkdir scraper
pip install beautifulsoup4
pip install requests

Now, create a file inside that folder by any name you like. I am using google.py.

Import the libraries we just installed in that file.

from bs4 import BeautifulSoup
import requests

Preparing the Food

Now that we have all the ingredients to prepare the scraper, we will make a GET request to the target URL to get the raw HTML data. We will fetch the Google Search results using the requests library as shown below.

We will first try to extract data from the first 10 search results and then we will focus on country-specific results.

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}

url = 'https://www.google.com/search?q=pizza&ie=utf-8&oe=utf-8&num=10'
html = requests.get(url, headers=headers)

This will give you the raw HTML of the target URL. Now, you have to use BeautifulSoup to parse that HTML.

soup = BeautifulSoup(html.text, 'html.parser')

When you inspect the Google page, you will find that all the results come under a class “g”. Of course, this class name will change after some time because Google doesn’t like scrapers, so you have to keep it in check.


We will extract all the classes with the name “g”.

allData = soup.find_all("div",{"class":"g"})
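
Because that class name changes periodically, a quick sanity check right after the find_all call can save you debugging time. A minimal sketch (the error message wording is my own):

if not allData:
    # An empty result set usually means Google changed its markup
    # (or served a block page); re-inspect the HTML before parsing.
    raise RuntimeError("No 'div.g' results found in the page")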

Now, we will run a for loop over every item in the allData list.

g = 0
Data = []

for result in allData:
    anchor = result.find('a')
    link = anchor.get('href') if anchor else None

    # Keep only organic links: they start with https, while ad links
    # contain 'aclk' somewhere in the URL string.
    if link is None or not link.startswith('https') or link.find('aclk') != -1:
        continue

    g = g + 1
    l = {}
    l["link"] = link

    try:
        l["title"] = result.find('h3').text
    except AttributeError:
        l["title"] = None

    try:
        l["description"] = result.find("span", {"class": "aCOpRe"}).text
    except AttributeError:
        l["description"] = None

    l["position"] = g
    Data.append(l)

print(Data)

Inside the for loop, we extract the website link, title, and description. The link lives in the href attribute of the a tag, the title in the h3 tag, and the description in a span tag with the class aCOpRe.


We have to filter the legitimate Google links out of the raw data, so we check that each URL starts with https and discard the garbage. Ad links can be filtered out simply by checking whether the URL string contains ‘aclk’. We then collect all the fields in a dictionary l and append it to the list Data.

On printing the list Data, you get a list of dictionaries, one per search result.
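A representative shape of the output (the values below are made up for illustration):

[
  {
    'link': 'https://www.example.com/pizza',
    'title': 'Example Pizza Page',
    'description': 'Example snippet text shown by Google...',
    'position': 1
  },
  ...
]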

This method is not reliable, though, because Google will block you after a certain number of requests. We need more advanced tools to overcome this problem.
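
Until then, one simple mitigation is to pause between requests so the traffic pattern looks less robotic. A minimal sketch; the query list and the 2-6 second delay range are arbitrary illustrations, not tested thresholds:

import time
import requests
from random import uniform

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}

# Hypothetical batch of queries, used only to illustrate the pacing.
for query in ['pizza', 'pasta', 'burger']:
    url = 'https://www.google.com/search?q=' + query + '&num=10'
    html = requests.get(url, headers=headers)
    # Sleep a random 2-6 seconds between requests.
    time.sleep(uniform(2, 6))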

Know more: 10 tips to avoid getting blocked while scraping the web!!

Scraping Google Search Results from Different Countries

Now that we know how to web scrape Google search results using Python and BeautifulSoup (as we did in the previous section), we can move on to more advanced techniques, because Google shows different results in different countries for the same keyword.

So, we will now scrape the Google results according to the country of origin. We will use a residential proxy to achieve this. There are plenty of web scraping tools out there that you can use for this task.

First, we will create a list of user agents so that we can rotate them on every request. For this tutorial, we will create a list of 10 user agents. If you want more, then you can find them here.

userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.83 Safari/537.1',
]

Now, we need a residential proxy provider through which we can rotate proxies and change the origin of each request. When you sign up to Scrapingdog, you get 1,000 free requests. You can find the proxy documentation here.

You will find your proxy URL on the dashboard. We will create a proxy object to pass it on to the requests method.

# Replace YOUR_API_KEY with the key from your Scrapingdog dashboard;
# the exact proxy URL and credential format are shown there as well.
http_proxy  = "http://scrapingdog:YOUR_API_KEY-country=us@proxy.scrapingdog.com:8081"
https_proxy = "http://scrapingdog:YOUR_API_KEY-country=us@proxy.scrapingdog.com:8081"

proxyDict = {"http": http_proxy, "https": https_proxy}

We have used -country=us as a parameter in the proxy credentials to route requests through USA proxies. Similarly, you can use ‘ca’ for Canada, ‘gb’ for the United Kingdom, ‘in’ for India, etc.
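
If you plan to query several countries, it helps to build the proxy dictionary from a country code instead of hand-editing the URL. A small sketch, assuming the same credential format as above (build_proxies and the YOUR_API_KEY placeholder are my own names, not part of Scrapingdog’s API):

def build_proxies(country_code):
    # Inject the two-letter country code (e.g. 'us', 'ca', 'gb', 'in')
    # into the proxy credentials shown above.
    proxy_url = ("http://scrapingdog:YOUR_API_KEY-country="
                 + country_code + "@proxy.scrapingdog.com:8081")
    return {"http": proxy_url, "https": proxy_url}

proxyDict = build_proxies("gb")  # route the request through the UK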

We will use the random library to rotate user agents.

from random import randrange

headers = {'User-Agent': userAgents[randrange(len(userAgents))]}
html = requests.get(url, proxies=proxyDict, headers=headers)

And that’s it. All the rest of the code will remain the same as earlier.

As earlier, we will create a BeautifulSoup object and extract the same classes. But this time, Google is far less likely to block you, because every request arrives from a new IP.
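
Putting the pieces together, the full country-aware scraper can look roughly like this. It is a sketch under the assumptions above: it reuses the userAgents list defined earlier, and scrape_google and the YOUR_API_KEY placeholder are my own names rather than anything from Scrapingdog:

import requests
from bs4 import BeautifulSoup
from random import randrange

def scrape_google(query, country_code):
    # Rotate user agents from the userAgents list defined earlier.
    headers = {'User-Agent': userAgents[randrange(len(userAgents))]}
    proxy_url = ("http://scrapingdog:YOUR_API_KEY-country="
                 + country_code + "@proxy.scrapingdog.com:8081")
    proxies = {"http": proxy_url, "https": proxy_url}

    url = 'https://www.google.com/search?q=' + query + '&num=10'
    html = requests.get(url, headers=headers, proxies=proxies)
    soup = BeautifulSoup(html.text, 'html.parser')

    data, position = [], 0
    for result in soup.find_all("div", {"class": "g"}):
        anchor = result.find('a')
        link = anchor.get('href') if anchor else None
        # Same organic-link filter as in the basic scraper above.
        if link is None or not link.startswith('https') or 'aclk' in link:
            continue
        position += 1
        title = result.find('h3')
        description = result.find("span", {"class": "aCOpRe"})
        data.append({
            "link": link,
            "title": title.text if title else None,
            "description": description.text if description else None,
            "position": position,
        })
    return data

print(scrape_google('pizza', 'us'))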

For the USA, you will see the results American users get; run the same script with ‘gb’ and the United Kingdom returns a noticeably different ranking for the same keyword. Similarly, you can check for other countries.

Limitations of scraping Google search results with Python

Although Python is an excellent language for web scraping Google search results, it has some limitations. Because it is dynamically typed, many mistakes only surface as runtime errors, and because of the Global Interpreter Lock it cannot run CPU-bound threads in parallel the way some other languages can.

Further, a single-threaded Python scraper tends to be slow when you need results at high volume.

Beyond that, you cannot keep scraping Google at a large scale with just Python, because Google will ultimately block your script when it sees a large amount of traffic coming from one single IP.

With Scrapingdog’s Google Scraper API, you don’t have to maintain a web scraping script: Scrapingdog handles all the hassle and seamlessly delivers the data. You can take a trial, where the first 1,000 requests are on us.


Using Google’s API to Scrape Google Search Results

Google offers its own API to extract data from its search engine, and anyone can sign up to use it. However, its usefulness is very limited, for the following reasons:

  • The API is very costly: every 1,000 requests cost around $5, which doesn’t make sense when web scraping tools can do the same job for free.
  • The API has limited functionality: it is designed to search only a small group of websites, and although you can reconfigure it to cover the whole web, doing so costs you time.
  • Limited information: the API returns only a small slice of the data shown on a results page, so what you extract may not be useful.

Conclusion

In this article, we learned how to web scrape data from Google search results using Python and BeautifulSoup, and further, we used Scrapingdog’s residential proxy to get country-specific results.

Thanks for reading!!

Frequently Asked Questions

How do I use the Google Scraper API?

It is easy to use the Google Scraper API. For step-by-step instructions, you can check out this documentation.


Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.