In an ideal world, you would have access to all the data in an easy-to-import format from the internet and use them for your convenience. However, that’s not the situation for now, and you will need to learn complex coding to get your desired data.
Now, is there any way to scrape data from a website without learning to program?
You can conveniently scrape data with Google Sheet without learning to program. In this article, we will discuss the process of automated Google Sheet Web Scraping. Stay tuned.
Build a Web Scraper Using ImportXML in Google Spreadsheets
To start scraping with Google Sheet, you will need to start with ImportXML. Let’s discuss it in detail:
Enter Google Sheets. Meet the IMPORTXML function.
According to Google’s support page, IMPORTXML “imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.”
IMPORTXML is a function that allows you to scrape structured data from web pages without any coding knowledge.
It’s quick and easy to extract data such as page titles, descriptions, or links and can also be used for more complex information.
How to Work with ImportXML
The function is quite simple and needs two values:
- The URL of the webpage that intends to extract or scrape the information.
- XPath of the element in which the data is contained.
XPath stands for XML Path Language and can be used to navigate through elements and attributes in an XML document.
Here is a shortlist of some of the most common and useful XPath queries:
- Page title: //title
- Page meta description: //meta[@name=’description’]/@content
- Page H1: //h1
- Page links: //@href
How does ImportXML Works?
Since discovering IMPORTXML in Google Sheets, it has truly become one of the secret weapons in automation, from campaign and ad creation to content research and more.
Moreover, the function combined with other formulas and add-ons can be used for more advanced tasks. Otherwise, it may require sophisticated solutions and development, such as tools built in Python.
But in this instance, let’s look at IMPORTXML in its most basic form: scraping data from a web page.
One way to do this would be to open two browser windows — one with the website and the other with Google Sheets or Excel. Then start copying and pasting the information over, link by link and article by article.
But using IMPORTXML in Google Sheets can achieve the same output with almost no risk of making mistakes within less time.
- Step 1: Open a blank new Google Sheet document
- Step 2: Add the pages you want to scrape data
- Step 3: Find the XPath of the element from where the content needs to be imported into the data spreadsheet.
- Open the page(s) in chrome
- Hover over the title of the articles
- Right-click on one of the articles and select inspect
- A Chrome Dev Tools Window will open
- Ensure the article title is selected and highlighted
- Right-click and select Copy > Copy XPath
- Step 4: Go back in the Google Sheed and type “=IMPORTXML(stored space in sheet,”//*[starts-with(@id, ‘title’)]”)” without the inverted commas before and after
- You will need to replace “stored space in sheet” with the place where you have put your URL (for example, if you have put the URL in the C2 block, put C2 instead of stored space in sheet)
- When copying the XPath from Chrome, enclose it in double-quotes all the time (for example: add XPath in the following way (//*[@id=”title_1″]))
- If the formula breaks down, change the double quote sign into a single quote sign (for example: add XPath like (//*[@id=’title_1’]); if the formula breaks down
- As the page ID title changes for each article, you will need to modify the query slightly.
Within a few moments, the query will be loaded and the data will be shown in the spreadsheet. The list will feature all articles featured on the pages you want to scrape.
You can apply this scraping process to get any data you want to set up your campaign.
If you want to scrape data from landing page URLs, tweak the query to specify that you want the HERF elements. In those cases, the query will be:
=IMPORTXML(stored space in spreadsheet,”//*[starts-with(@id, ‘title’)]/@href”)
With the added @herf you will gent the landing page URLs. The same thing can be done for author names and featured snippets.
Troubleshooting Scraping Data with Google Sheet
You will need to have enough free cells to fill and fully expand with all the data in your spreadsheet that was returned from your query. If the column is filled with any other data, you may not get the full result in your spreadsheet.
Similar to ARRAYFORMULA, avoid storing different data in the same column.
Why Consider Google Automation for Web Scraping?
So, here is how to scrape data with Google Sheet from any webpage. The whole process is automated and has less chance of error. You can use this data to generate website content, set product descriptions, or get eCommerce data like product pricing and shipping costs.
With these data in your hand, you can generate better campaigns and gain better results. Also, IMPORTXML reduces the execution time and chance of error. Overall, Google sheet scraping is convenient and allows you to scrape data without learning to code.
In conclusion, automated Google Sheet web scraping can save you a lot of time and effort when compared to manual web scraping. Not only is it more accurate and faster, but it can also be easily customized to your specific needs.
If you need to gather data from the web on a regular basis, automated Google Sheet web scraping is definitely the way to go.