
How to Scrape Google Finance with Python

07-09-2023

If you are an investor, a trader, an analyst, or just curious about the overall stock market, you have probably already stumbled upon Google Finance. It provides up-to-date stock quotes from major indexes, historical financial data, news, and currency conversion rates.

Web Scraping Google Finance

Knowing how to scrape the Google Finance website can be advantageous when it comes to:

  • Data Aggregation: Google Finance hosts data from different sources, minimizing the need to look for data elsewhere.
  • Sentiment Analysis: The website displays news from several sources. These can be scraped to gather insights about the market’s sentiment.
  • Market Predictions: It provides historical data and real-time information from several stock market indexes, making it a very effective source for price predictions.
  • Risk Management: Google Finance minimizes arbitrage, thanks to its accurate and up-to-date data, which is crucial for assessing the risk associated with specific investment strategies.

Web scraping Google Finance can be achieved using Python's Beautiful Soup and Requests libraries.

Why Beautiful Soup as the scraping tool?

Beautiful Soup is one of the most widely used web scraping libraries in Python. It has extensive documentation and is easy to implement and to integrate with other libraries. To use it, first set up a Python virtual environment, then install it with the following command.

pip install beautifulsoup4

It is usually used side by side with the Requests library, the de facto standard package for making HTTP requests in Python. Requests fetches the page's HTML, which Beautiful Soup then parses to grab the required information. This library can also be installed via pip.

pip install requests

How to extract information from stocks?

To extract stock information from Google Finance, we first need to understand how the website's URL is structured so we can target the desired stock. Let's take, for instance, the NASDAQ index, which hosts several stocks we can grab information from. To find the symbol of each stock, we can use NASDAQ's stock screener. Now let's take META as our target stock. With both the index and the stock symbol, we can build the first code snippet of our script.

import requests
from bs4 import BeautifulSoup


BASE_URL = "https://www.google.com/finance"
INDEX = "NASDAQ"
SYMBOL = "META"
LANGUAGE = "en"
TARGET_URL = f"{BASE_URL}/quote/{SYMBOL}:{INDEX}?hl={LANGUAGE}"

Now we can use the Requests library to make an HTTP request on the TARGET_URL and create a Beautiful Soup instance to crawl the HTML content.

# make an HTTP request
page = requests.get(TARGET_URL)


# use an HTML parser to grab the content from "page"
soup = BeautifulSoup(page.content, "html.parser")
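In practice, Google sometimes throttles or serves a consent page to bare requests. A common precaution, shown here as an optional sketch (the exact header strings are illustrative assumptions, not something the page documents), is to send browser-like headers and fail fast on non-200 responses:

```python
import requests

# a browser-like User-Agent; the exact string is an arbitrary example
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch(url: str) -> str:
    """Fetch a page with browser-like headers, raising on HTTP errors."""
    page = requests.get(url, headers=HEADERS, timeout=10)
    page.raise_for_status()  # fail fast instead of silently parsing an error page
    return page.text
```

Swapping `requests.get(TARGET_URL)` for `fetch(TARGET_URL)` keeps the rest of the script unchanged.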

Before getting into scraping, we first need to tackle the HTML elements by inspecting the web page (TARGET_URL).

The items that describe the stock are represented by the class gyFHrc. Inside each of these elements, one child element holds the title of the item (Previous close, for instance) and another holds its value ($295.89). The first can be grabbed from the mfs7Fc class and the second from the P6K39c class, respectively. The complete list of items to be scraped is the following:

  • Previous Close
  • Day Range
  • Year Range
  • Market Cap
  • AVG Volume
  • P/E Ratio
  • Dividend Yield
  • Primary Exchange
  • CEO
  • Founded
  • Website
  • Employees

Let’s now see how we can crawl these items with Python code.

# get the items that describe the stock
items = soup.find_all("div", {"class": "gyFHrc"})


# create a dictionary to store the stock description
stock_description = {}

# iterate over the items and append them to the dictionary
for item in items:
    item_description = item.find("div", {"class": "mfs7Fc"}).text
    item_value = item.find("div", {"class": "P6K39c"}).text
    stock_description[item_description] = item_value


print(stock_description)

The function .find_all() was used to target all the elements containing the class gyFHrc. Unlike .find_all(), the function .find() only retrieves the first matching element; that's why it is used inside the for loop, since we know there's only one mfs7Fc and one P6K39c per iterable item. The .text attribute concatenates all the pieces of text inside each element, which is the information displayed on the webpage.

The loop in the code snippet above builds a dictionary of items that describe the stock. This is good practice because a dictionary can easily be converted to other file formats, such as a .json file or a .csv file, depending on the use case.
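Since the result is a plain dictionary, persisting it takes only the standard library. A small sketch (file names are arbitrary) that writes a description dictionary of the same shape to both JSON and CSV:

```python
import csv
import json

# example payload in the same shape as stock_description
stock_description = {"Previous close": "$295.89", "Market cap": "762.63B USD"}

# JSON: dump the dictionary as-is
with open("meta_stock.json", "w") as f:
    json.dump(stock_description, f, indent=2)

# CSV: one row per (field, value) pair, with a header row
with open("meta_stock.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["field", "value"])
    writer.writerows(stock_description.items())
```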

The output:

{'Previous close': '$295.89', 'Day range': '$294.47 - $301.74', 'Year range': '$88.09 - $326.20', 'Market cap': '762.63B USD', 'Avg Volume': '22.93M', 'P/E ratio': '35.49', 'Dividend yield': '-', 'Primary exchange': 'NASDAQ', 'CEO': 'Mark Zuckerberg', 'Founded': 'Feb 2004', 'Website': 'investor.fb.com', 'Employees': '71,469'}

This is just a simple example script that can be integrated into a trading bot, an application, or a simple dashboard to keep track of your favorite stocks.
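Note that the scraped values are display strings ("$295.89", "22.93M"), so before feeding them into a bot or dashboard you will likely want numbers. A hedged sketch of a converter; the suffix handling below only covers the formats visible in the output above:

```python
def parse_value(text: str):
    """Convert a Google Finance display string to a float, or None if not numeric."""
    multipliers = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}
    cleaned = text.replace("$", "").replace(",", "").replace(" USD", "").strip()
    if not cleaned or cleaned == "-":
        return None  # e.g. a missing dividend yield
    suffix = cleaned[-1]
    if suffix in multipliers:
        return float(cleaned[:-1]) * multipliers[suffix]
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. day ranges like "$294.47 - $301.74"
```

Ranges such as "Day range" would need to be split into low/high values separately; this sketch simply skips them.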

Complete Code

You can certainly scrape many more data attributes from the page, but for now, the complete code looks like this.

import requests
from bs4 import BeautifulSoup


BASE_URL = "https://www.google.com/finance"
INDEX = "NASDAQ"
SYMBOL = "META"
LANGUAGE = "en"
TARGET_URL = f"{BASE_URL}/quote/{SYMBOL}:{INDEX}?hl={LANGUAGE}"

# make an HTTP request
page = requests.get(TARGET_URL)


# use an HTML parser to grab the content from "page"
soup = BeautifulSoup(page.content, "html.parser")


# get the items that describe the stock
items = soup.find_all("div", {"class": "gyFHrc"})


# create a dictionary to store the stock description
stock_description = {}


# iterate over the items and append them to the dictionary
for item in items:
    item_description = item.find("div", {"class": "mfs7Fc"}).text
    item_value = item.find("div", {"class": "P6K39c"}).text
    stock_description[item_description] = item_value


print(stock_description)

Limitations while scraping Google Finance

Using the above method you can certainly create a small scraper, but it will not keep supplying you with data if you attempt mass scraping. Google is very sensitive to automated crawling and will ultimately block your IP.

Once your IP is blocked, you will not be able to scrape anything and your data pipeline will eventually break. Now, how do you overcome this issue? Well, there is a very easy solution: use a Google Scraping API.
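For modest volumes, simple politeness measures can at least delay a block: spacing out requests and retrying transient failures with exponential backoff. A minimal sketch (the retry counts and delays are arbitrary), written so the HTTP call is injectable rather than hard-coded:

```python
import time


def fetch_with_retries(url, getter, max_retries=3, base_delay=1.0):
    """Call getter(url); on failure, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries):
        try:
            return getter(url)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```

In the scraper above, `getter` would simply be `requests.get`; injecting it also keeps the helper easy to test without a network.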

Let's see how we can use this API to crawl Google Finance data at scale.

Using Scrapingdog for scraping Google Finance

Once you sign up for this web scraping API, you will get your own API key (available on the dashboard). Now, just copy that API key into the code below.

import requests
from bs4 import BeautifulSoup

BASE_URL = "http://api.scrapingdog.com/google/?api_key=YOUR-API-KEY&query=https://www.google.com/finance"
INDEX = "NASDAQ"
SYMBOL = "META"
LANGUAGE = "en"
TARGET_URL = f"{BASE_URL}/quote/{SYMBOL}:{INDEX}?hl={LANGUAGE}"

# make an HTTP request
page = requests.get(TARGET_URL)

# use an HTML parser to grab the content from "page"
soup = BeautifulSoup(page.content, "html.parser")

# get the items that describe the stock
items = soup.find_all("div", {"class": "gyFHrc"})

# create a dictionary to store the stock description
stock_description = {}

# iterate over the items and append them to the dictionary
for item in items:
    item_description = item.find("div", {"class": "mfs7Fc"}).text
    item_value = item.find("div", {"class": "P6K39c"}).text
    stock_description[item_description] = item_value

print(stock_description)

In place of YOUR-API-KEY, paste your own API key. One thing you might have noticed is that apart from BASE_URL, nothing in the code has changed. This is the beauty of using web scraping APIs.

With this code you can scrape a large number of Google Finance pages. If you want to crawl at scale, I would advise you to read about web crawling with Python.
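The same pattern extends to many tickers by parameterizing the quote URL. A sketch (the symbol list is an arbitrary example) that builds one URL per stock and reuses the parsing loop from above:

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.google.com/finance"


def build_quote_url(symbol, index, language="en"):
    """Assemble a Google Finance quote URL for one stock."""
    return f"{BASE_URL}/quote/{symbol}:{index}?hl={language}"


def scrape_stock(symbol, index):
    """Fetch one quote page and return its description items as a dict."""
    page = requests.get(build_quote_url(symbol, index))
    soup = BeautifulSoup(page.content, "html.parser")
    return {
        item.find("div", {"class": "mfs7Fc"}).text:
            item.find("div", {"class": "P6K39c"}).text
        for item in soup.find_all("div", {"class": "gyFHrc"})
    }


# example usage (uncomment to fetch live data):
# portfolio = {s: scrape_stock(s, "NASDAQ") for s in ["META", "AAPL", "MSFT"]}
```

To route these requests through the scraping API instead, only BASE_URL would change, as in the snippet above.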

Conclusion

With the combination of Requests and Beautiful Soup, we were able to scrape Google Finance. Of course, if the scraper needs to survive at scale, you have to use a proxy-based scraping API.

We have explored the fascinating world of web scraping Google Finance using Python. Throughout this article, we have learned how to harness the power of various Python libraries, such as BeautifulSoup and Requests, to extract valuable financial data from one of the most trusted sources on the internet.

Scraping financial data from Google Finance can be a valuable skill for investors, data analysts, and financial professionals alike. It allows us to access real-time and historical information about stocks, indices, currencies, and more, enabling us to make informed decisions in the world of finance.

I hope you like this tutorial and if you do then please do not forget to share it with your friends and on your social media.

Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.

Try Scrapingdog for Free!

1,000 free API calls for testing.

No credit card required!
