Knowing how to scrape the Google Finance website can be advantageous when it comes to:
- Data Aggregation: Google Finance gathers data from different sources, minimizing the need to look for it elsewhere.
- Sentiment Analysis: The website displays news from several sources, which can be scraped to gather insights about market sentiment.
- Market Predictions: It provides historical data and real-time information from several stock market indexes, making it a very effective source for price predictions.
- Risk Management: Google Finance offers accurate, up-to-date data, which is crucial for assessing the risk associated with specific investment strategies.
Web scraping Google Finance can be achieved with the Beautiful Soup and Requests Python libraries.
Why Beautiful Soup as the scraping tool?
Beautiful Soup is a lightweight, well-documented library that parses HTML and makes it easy to navigate and search the resulting tree, which is exactly what extracting values from a Google Finance page requires. Both libraries can be installed with pip:
```shell
pip install beautifulsoup4
pip install requests
```
How to extract information from stocks?
```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.google.com/finance"
INDEX = "NASDAQ"
SYMBOL = "META"
LANGUAGE = "en"
TARGET_URL = f"{BASE_URL}/quote/{SYMBOL}:{INDEX}?hl={LANGUAGE}"

# make an HTTP request
page = requests.get(TARGET_URL)

# use an HTML parser to grab the content from "page"
soup = BeautifulSoup(page.content, "html.parser")
```
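The URL construction above can be generalized so several tickers can be fetched with the same code. The helper below is an illustrative sketch, not part of the original snippet; `build_quote_url` is a name introduced here for convenience.

```python
BASE_URL = "https://www.google.com/finance"

def build_quote_url(symbol: str, index: str, language: str = "en") -> str:
    """Return the Google Finance quote URL for a given ticker and exchange."""
    return f"{BASE_URL}/quote/{symbol}:{index}?hl={language}"

print(build_quote_url("META", "NASDAQ"))
# https://www.google.com/finance/quote/META:NASDAQ?hl=en
print(build_quote_url("GOOGL", "NASDAQ"))
# https://www.google.com/finance/quote/GOOGL:NASDAQ?hl=en
```

Keeping the URL logic in one function makes it trivial to loop over a watchlist of symbols later on.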
The items that describe the stock are represented by the class gyFHrc. Inside each of these elements, there is one element that holds the title of the item (Previous close, for instance) and one that holds its value ($295.89). The former can be grabbed from the mfs7Fc class and the latter from the P6K39c class. The complete list of items to be scraped is the following:
- Previous Close
- Day Range
- Year Range
- Market Cap
- AVG Volume
- P/E Ratio
- Dividend Yield
- Primary Exchange
- CEO
- Founded
- Website
- Employees
Let’s now see how we can crawl these items with Python code.
```python
# get the items that describe the stock
items = soup.find_all("div", {"class": "gyFHrc"})

# create a dictionary to store the stock description
stock_description = {}

# iterate over the items and append them to the dictionary
for item in items:
    item_description = item.find("div", {"class": "mfs7Fc"}).text
    item_value = item.find("div", {"class": "P6K39c"}).text
    stock_description[item_description] = item_value

print(stock_description)
```
The .find_all() function was used to target all the elements with the class gyFHrc. Unlike .find_all(), the .find() function retrieves only the first matching element. That is why it is used inside the for loop: each iterable item contains exactly one mfs7Fc and one P6K39c element. The .text attribute concatenates all the pieces of text inside each element, which is the information displayed on the webpage.
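The difference between .find_all() and .find() can be seen on a tiny self-contained fragment. The HTML below is a hypothetical snippet that mimics the structure described above (the class names are the ones observed on the page at the time of writing):

```python
from bs4 import BeautifulSoup

# a small fragment with two "stock description" rows
html = """
<div class="gyFHrc"><div class="mfs7Fc">Previous close</div><div class="P6K39c">$295.89</div></div>
<div class="gyFHrc"><div class="mfs7Fc">Market cap</div><div class="P6K39c">762.63B USD</div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# .find_all() returns every matching element as a list
rows = soup.find_all("div", {"class": "gyFHrc"})
print(len(rows))  # 2

# .find() returns only the first match inside a given row
first = rows[0]
print(first.find("div", {"class": "mfs7Fc"}).text)  # Previous close
print(first.find("div", {"class": "P6K39c"}).text)  # $295.89
```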
The loop in the code snippet above serves to build a dictionary of items that represent the stock. This is a good practice because the dictionary structure can easily be converted to other file formats such as a .json file or a .csv file, depending on the use case.
The output:
{'Previous close': '$295.89', 'Day range': '$294.47 - $301.74', 'Year range': '$88.09 - $326.20', 'Market cap': '762.63B USD', 'Avg Volume': '22.93M', 'P/E ratio': '35.49', 'Dividend yield': '-', 'Primary exchange': 'NASDAQ', 'CEO': 'Mark Zuckerberg', 'Founded': 'Feb 2004', 'Website': 'investor.fb.com', 'Employees': '71,469'}
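Since the result is a plain dictionary, converting it to the formats mentioned above takes only the standard library. The sketch below reuses two of the sample values from the output; no extra dependencies are assumed:

```python
import csv
import io
import json

# a subset of the scraped dictionary, taken from the sample output above
stock_description = {"Previous close": "$295.89", "Market cap": "762.63B USD"}

# JSON: json.dumps serializes the dictionary directly
json_text = json.dumps(stock_description, indent=2)
print(json_text)

# CSV: one header row (the item names) and one value row
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(stock_description.keys())
writer.writerow(stock_description.values())
print(buffer.getvalue())
```

Writing to a real file instead of `io.StringIO` only requires swapping the buffer for `open("stock.csv", "w", newline="")`.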
Complete Code
```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.google.com/finance"
INDEX = "NASDAQ"
SYMBOL = "META"
LANGUAGE = "en"
TARGET_URL = f"{BASE_URL}/quote/{SYMBOL}:{INDEX}?hl={LANGUAGE}"

# make an HTTP request
page = requests.get(TARGET_URL)

# use an HTML parser to grab the content from "page"
soup = BeautifulSoup(page.content, "html.parser")

# get the items that describe the stock
items = soup.find_all("div", {"class": "gyFHrc"})

# create a dictionary to store the stock description
stock_description = {}

# iterate over the items and append them to the dictionary
for item in items:
    item_description = item.find("div", {"class": "mfs7Fc"}).text
    item_value = item.find("div", {"class": "P6K39c"}).text
    stock_description[item_description] = item_value

print(stock_description)
```
Limitations while scraping Google Finance
With the above method you can build a small scraper, but it will not keep supplying data if you attempt mass scraping. Google is very sensitive to automated crawling and will ultimately block your IP.
Once your IP is blocked you will not be able to scrape anything, and your data pipeline will break. How do you overcome this issue? There is a very easy solution: use a Google scraping API.
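Before reaching for an API, transient failures can be partially mitigated by retrying with a delay. The sketch below is a generic retry wrapper, not part of the original tutorial; the fetch function is injected so the example stays self-contained (in practice it would wrap `requests.get` and raise on a bad status code):

```python
import time

def fetch_with_retries(fetch, url, retries=3, base_delay=1.0):
    """Call fetch(url); on failure, wait with exponential backoff and retry.

    fetch is any callable that raises an exception on a blocked or
    failed request.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: propagate the last error
            time.sleep(base_delay * 2 ** attempt)  # back off before retrying

# demo with a fake fetcher that fails twice and then succeeds
calls = {"n": 0}
def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("blocked")
    return "page content"

result = fetch_with_retries(flaky_fetch, "https://example.com", base_delay=0.01)
print(result)  # page content
```

Note that retries only help with temporary throttling; once an IP is hard-blocked, the approach below is needed.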
Let’s see how we can use this API to crawl limitless data from Google Finance.
Using Scrapingdog for scraping Google Finance
```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "http://api.scrapingdog.com/google/?api_key=YOUR-API-KEY&query="
INDEX = "NASDAQ"
SYMBOL = "META"
LANGUAGE = "en"
# the Google Finance URL goes into the API's query parameter
TARGET_URL = BASE_URL + f"https://www.google.com/finance/quote/{SYMBOL}:{INDEX}?hl={LANGUAGE}"

# make an HTTP request through the API
page = requests.get(TARGET_URL)

# use an HTML parser to grab the content from "page"
soup = BeautifulSoup(page.content, "html.parser")

# get the items that describe the stock
items = soup.find_all("div", {"class": "gyFHrc"})

# create a dictionary to store the stock description
stock_description = {}

# iterate over the items and append them to the dictionary
for item in items:
    item_description = item.find("div", {"class": "mfs7Fc"}).text
    item_value = item.find("div", {"class": "P6K39c"}).text
    stock_description[item_description] = item_value

print(stock_description)
```
In place of YOUR-API-KEY you have to paste your own API key. You might have noticed that apart from the BASE_URL, nothing has changed in the code. That is the beauty of using a web scraping API.
With this code you can scrape endless Google Finance pages. If you want to crawl at scale, I would advise you to read web crawling with Python.
Conclusion
With the combination of Requests and Beautiful Soup, we were able to scrape Google Finance. Of course, if the scraper needs to survive at scale, you have to use a proxy-based scraping API.
We have explored the fascinating world of web scraping Google Finance using Python. Throughout this article, we have learned how to harness the power of various Python libraries, such as BeautifulSoup and Requests, to extract valuable financial data from one of the most trusted sources on the internet.
Scraping financial data from Google Finance can be a valuable skill for investors, data analysts, and financial professionals alike. It allows us to access real-time and historical information about stocks, indices, currencies, and more, enabling us to make informed decisions in the world of finance.
I hope you like this tutorial and if you do then please do not forget to share it with your friends and on your social media.