
Scrape Google Search Results for any Country

03-05-2021

In this post, we will learn to scrape Google search results for any specific country using Python and a free residential proxy. First, we will build a basic Python script that scrapes the first 10 results. The end result will be JSON data consisting of the link, title, description, and position of each result. You can use this data for SEO, product verification, and more.

Requirements

Generally, web scraping is divided into two parts:

  1. Fetching data by making an HTTP request.
  2. Extracting essential data by parsing the HTML DOM.

Libraries & Tools

  1. Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  2. Requests allows you to send HTTP requests very easily.
  3. A residential proxy to fetch the HTML of the target URL without getting blocked.

Setup

Our setup is pretty simple. Just create a folder and install Beautiful Soup and requests. Type the commands below to create the folder and install the libraries. I am assuming that you have already installed Python 3.x.

mkdir scraper 
pip install beautifulsoup4
pip install requests

Now, create a file inside that folder with any name you like. I am using google.py.

Then import the libraries we just installed in that file.

from bs4 import BeautifulSoup
import requests

Preparing the Food

Now that we have all the ingredients to prepare the scraper, we can make a GET request to the target URL to get the raw HTML data. We will scrape Google search results using the requests library, as shown below. First we will scrape 10 search results, and then we will focus on country-specific results.

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
url = 'https://www.google.com/search?q=pizza&ie=utf-8&oe=utf-8&num=10'
html = requests.get(url, headers=headers)
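
Before parsing, it is worth checking that Google actually returned the page. Here is a minimal sketch (an assumption on my part, not part of the original script; Google typically answers with a non-200 status code, such as 429, when it blocks a request):

# quick sanity check; a non-200 status code usually means Google blocked the request
if html.status_code != 200:
    print("Request failed with status code", html.status_code)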

This will return the HTML code of the target URL. Now, use BeautifulSoup to parse the HTML.

soup = BeautifulSoup(html.text, 'html.parser')

When you inspect the Google page, you will find that all the results fall under a class “g”. Of course, this name will change over time because Google doesn’t like scrapers, so you have to keep it in check.

We will extract all the divs with the class “g”.

allData = soup.find_all("div",{"class":"g"})
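
Because that class name changes from time to time, a quick check that the selector still matches something will tell you when Google has changed its markup. A minimal sketch:

# if Google has renamed the "g" class, allData will come back empty
if not allData:
    print("No results parsed - Google may have changed its class names.")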

Now, we will run a for loop to visit each item in the allData list.

g = 0
Data = []
l = {}
for i in range(0, len(allData)):
    link = allData[i].find('a').get('href')
    if link is not None:
        # keep only organic results: absolute http(s) links that are not ads ('aclk')
        if link.find('https') != -1 and link.find('http') == 0 and link.find('aclk') == -1:
            g = g + 1
            l["link"] = link
            try:
                l["title"] = allData[i].find('h3').text
            except:
                l["title"] = None
            try:
                l["description"] = allData[i].find("span", {"class": "aCOpRe"}).text
            except:
                l["description"] = None
            l["position"] = g
            Data.append(l)
            l = {}
        else:
            continue
    else:
        continue
print(Data)

Inside the for loop, we find the website link, title, and description. The link lives in the href of the anchor tag, the title in the h3 tag, and the description in a span tag with the class aCOpRe.

We have to filter the legitimate Google links out of the raw data, so we use the find() method to discard garbage and ad links. Ad links can be filtered out simply by checking whether they contain ‘aclk’ in the URL string. Then we add all the data to a dictionary l and append it to the list Data.

On printing the list Data, the output will look like this.
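
The original screenshot is not reproduced here, but each entry has the same shape. The following is an illustrative sketch only (the actual links, titles, and descriptions depend on what Google returns for your query):

[
    {
        "link": "https://en.wikipedia.org/wiki/Pizza",
        "title": "Pizza - Wikipedia",
        "description": "Pizza is a dish of Italian origin ...",
        "position": 1
    },
    {
        "link": "https://...",
        "title": "...",
        "description": "...",
        "position": 2
    }
]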

This method on its own is not reliable, because Google will block you after a certain number of requests. We need more advanced tools to overcome this problem.

Google results from different countries

Now that we have learned to scrape Google search results, we can move on to more advanced techniques. Google shows different results in different countries for the same keyword, so we will now scrape the results by country of origin. We will use a residential proxy to achieve this.

First, we will create a list of user agents so that we can rotate them on every request. I will create a list of 10 user agents. If you want more, you can find them here.

userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.83 Safari/537.1'
]

Now, we need a residential proxy provider through which we can rotate proxies and change the origin of the request. When you sign up to Scrapingdog, you get 1000 free requests. You can find the proxy documentation here.

You will find your exact proxy URL on the dashboard. We will create a proxy object to pass to the requests method (replace YOUR-API-KEY below with your own key).

http_proxy  = "http://scrapingdog:YOUR-API-KEY-country=us@proxy.scrapingdog.com:8081"
https_proxy = "http://scrapingdog:YOUR-API-KEY-country=us@proxy.scrapingdog.com:8081"
proxyDict = {"http": http_proxy, "https": https_proxy}

We have used -country=us as a parameter in our proxy URL to route requests through USA proxies. Similarly, you can use ‘ca’ for Canada, ‘gb’ for the United Kingdom, ‘in’ for India, etc.
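
If you want to switch countries programmatically, a small helper keeps things tidy. This is a hypothetical sketch: build_proxies is my own name, and it simply reuses the proxy URL format shown above.

def build_proxies(api_key, country="us"):
    # hypothetical helper: returns a proxy dict for a two-letter country code
    # ('us', 'ca', 'gb', 'in', ...), reusing the URL format shown above
    proxy = "http://scrapingdog:" + api_key + "-country=" + country + "@proxy.scrapingdog.com:8081"
    return {"http": proxy, "https": proxy}

proxyDict = build_proxies("YOUR-API-KEY", "gb")  # United Kingdom results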

We will use the random library to rotate user agents.

from random import randrange

# pick a random user agent for each request
headers = {'User-Agent': userAgents[randrange(10)]}
html = requests.get(url, proxies=proxyDict, headers=headers)

And that’s it. The rest of the code remains the same as before: we create a BeautifulSoup object and extract the same classes. But this time Google won’t be able to block you, since you are using a new IP on every request.
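
Putting it all together, here is a minimal end-to-end sketch. The scrape_google function name is my own, it assumes proxyDict and userAgents are defined as above, and the 'g' and 'aCOpRe' class names may have changed by the time you run it:

from random import randrange
import requests
from bs4 import BeautifulSoup

def scrape_google(query, proxies, user_agents):
    # hypothetical helper tying the pieces together
    url = 'https://www.google.com/search?q=' + query + '&ie=utf-8&oe=utf-8&num=10'
    headers = {'User-Agent': user_agents[randrange(len(user_agents))]}
    html = requests.get(url, proxies=proxies, headers=headers)
    soup = BeautifulSoup(html.text, 'html.parser')

    data = []
    position = 0
    for result in soup.find_all("div", {"class": "g"}):
        anchor = result.find('a')
        link = anchor.get('href') if anchor else None
        # keep only organic https links, mirroring the filter used earlier
        if link is None or link.find('https') == -1 or link.find('http') != 0 or link.find('aclk') != -1:
            continue
        position += 1
        title = result.find('h3')
        description = result.find("span", {"class": "aCOpRe"})
        data.append({
            "link": link,
            "title": title.text if title else None,
            "description": description.text if description else None,
            "position": position,
        })
    return data

print(scrape_google("pizza", proxyDict, userAgents))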

For the USA, the results will look like this.

For the United Kingdom, the results will look like this.

Similarly, you can check for other countries.

But if you want to avoid all this hassle, you can use our Google Search API to scrape Google search results with a single GET request.

Conclusion

In this article, we learned how to scrape Google search results from any country using Python and a residential proxy. Feel free to comment and ask me anything. You can follow me on Twitter. Thanks for reading, and please hit the like button! 👍

