
The Ultimate Guide to Web Scraping for Everyone

2020-04-26
In this post, we’ll learn how to use web scraping tools and APIs to scrape single-page applications quickly and effectively. This helps us gather and use valuable data that isn’t always available via APIs. Let’s dive in.

What is web scraping?

Web scraping is a technique for extracting data from websites using dedicated tools or APIs. We extract data either for business purposes or for data analysis. Here we focus on tools that both developers and non-developers can use.
We usually turn to web scraping when the target website does not expose an API. Here are some common web scraping scenarios:
  1. Scraping e-commerce websites for product data.
  2. Scraping hotel booking websites to collect hotel reviews, ratings, and pricing.
  3. Scraping email addresses to target customers.
  4. Scraping financial websites for data analysis or to prepare a machine learning model.

Requirements

Getting started with web scraping is easy, and it breaks down into two simple parts:
  1. Using a web scraping tool to make an HTTP request that fetches the page.
  2. Parsing the scraped HTML to extract the data you need as JSON.
For the web scraping tool, we are going to use Scrapingdog. They offer 1000 free credits, and the service can be used either from their web tool or through their API. So first, register with them and follow along with me.
After you register, you are redirected to a dashboard that contains the scraping tool.
If you are a developer and would rather not use this tool, just go to their API documentation and start scraping.
First, let me explain what each of the 8 input boxes means.
  1. You have to paste the URL of the website you are going to scrape.
  2. Paste the key of your account, which is available right above this tool.
  3. Next, you can either render JavaScript or leave the option off. Rendering JavaScript means the website is opened in headless Chrome so that all the dynamic data on the target page is extracted. If you think the target website is static, leave it off.
  4. Then there is a premium proxy option, which routes the request through premium proxies for websites that are harder to scrape.
  5. Then there is a geographical proxy option, which helps you get the local version of the data for any country.
  6. The last three boxes let you specify an HTML tag, an attribute, and the attribute’s value. Scrapingdog uses them to return JSON data directly from the scraped HTML.
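For developers, options like these usually end up as query parameters on an HTTP request. The sketch below only shows how such a request URL could be assembled. The endpoint `https://api.example.com/scrape` and the parameter names `api_key`, `url`, `render_js`, `premium`, and `country` are hypothetical placeholders, not Scrapingdog’s documented API, so check their API documentation for the real names.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
EXAMPLE_ENDPOINT = "https://api.example.com/scrape"


def build_scrape_url(target_url, api_key, render_js=False, premium=False, country=None):
    """Assemble a GET URL that carries the scraping options as query parameters."""
    # All parameter names below are assumptions made for this sketch.
    params = {
        "api_key": api_key,
        "url": target_url,
        "render_js": "true" if render_js else "false",
        "premium": "true" if premium else "false",
    }
    if country is not None:
        params["country"] = country
    # urlencode percent-encodes the target URL so it travels as a single value.
    return EXAMPLE_ENDPOINT + "?" + urlencode(params)
```

Calling `build_scrape_url("https://news.ycombinator.com/", "YOUR_API_KEY", render_js=True)` returns one URL string with every option encoded, which can then be passed to any HTTP client.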

Make the First Request

We now have all the ingredients we need to scrape a website, so let’s start scraping. We will scrape data from the Hacker News website, which means making an HTTP request to get the site’s content. That’s where Scrapingdog comes in. Just paste the link into the first input box and your API key into the second.
Web Scraping tool of Scrapingdog
Then just click Scrape and voila: your HTML data appears in the other box. Isn’t it amazing how fast we can scrape data today? In just 2 seconds, without any setup, you have managed to scrape a dynamic website. You can also copy the data directly with the “Copy Data” button.
Data extracted from Scrapingdog
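For comparison, the same kind of raw HTML can be fetched from code with a plain HTTP request. The sketch below uses only Python’s standard library. The function name `fetch_html` and the user-agent string are illustrative choices, and unlike the JavaScript rendering option it does not execute any scripts, so it only suits static pages.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Return the body of a GET request as text, without running any JavaScript."""
    # Some sites reject Python's default user agent, so send an explicit one.
    # The value here is an arbitrary example.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `fetch_html("https://news.ycombinator.com/")` would return the page source as a single string, ready to be searched or parsed.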
The HTML we get back is similar to what Chrome or any other browser receives when it requests the page. Now we need Chrome Developer Tools to search through the page’s HTML and select the data we want. You can learn more about Chrome DevTools in the official documentation.
We want to scrape the news headings in JSON format. You can view the HTML of the webpage by right-clicking anywhere on it and selecting “Inspect”.
Chrome dev tools for inspecting HTML
Now here comes another great feature of Scrapingdog. We can specify a tag and an attribute to get a JSON response containing all the news headings. Here the tag is “td”, the attribute is “class”, and its value is “title”. Just enter this information into the last three input boxes of the tool.
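If you would rather do this selection step in your own code, the same idea can be sketched with Python’s built-in HTML parser. This is a minimal illustration and not how Scrapingdog works internally: the class `TagTextExtractor`, the function `extract_as_json`, and the sample HTML are all made up for the example.

```python
import json
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text of every <tag> whose attribute contains a given value."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.depth = 0      # greater than 0 while inside a matching element
        self.buffer = []    # text pieces of the element being read
        self.results = []   # text of all finished matches

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Count nested tags of the same name so we stop at the right close tag.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag and self.value in (dict(attrs).get(self.attr) or "").split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace into single spaces.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.results.append(text)

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_as_json(html, tag="td", attr="class", value="title"):
    """Return the text of all matching elements as a JSON array string."""
    parser = TagTextExtractor(tag, attr, value)
    parser.feed(html)
    parser.close()
    return json.dumps(parser.results)


# Made-up sample markup; real pages are larger and messier.
SAMPLE_HTML = """
<table>
  <tr><td class="title"><a href="https://example.com/a">First story</a></td></tr>
  <tr><td class="subtext">10 points</td></tr>
  <tr><td class="title"><a href="https://example.com/b">Second   story</a></td></tr>
</table>
"""

print(extract_as_json(SAMPLE_HTML))  # ["First story", "Second story"]
```

Only the two `td` cells with class `title` are kept, while the `subtext` cell is skipped. For production use, a dedicated HTML parsing library is usually more robust against malformed markup than this sketch.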
Then click Scrape again to get the JSON data. The response contains the text of every element that matched.
JSON received from Scrapingdog
Fantastic! This is what we were looking for: a JSON response containing all the news headings on ycombinator.com. This is the fastest I have been able to scrape any website. You can copy the result and share it with anyone, or use it in your own project. All of this can also be done through their API. In the same way, we can scrape data from a large number of websites, including Google, Facebook, and Instagram. So, our food is prepared and looks delicious too.
Scrapingdog also offers an extension that you can use if you don’t want to open the dashboard.

Conclusion

In this article, we looked at what web scraping is and how it can automate collecting data from various websites.
Many websites use a Single Page Application (SPA) architecture to generate content dynamically with JavaScript. So, in the next tutorial, we will learn how to scrape dynamic websites without getting blocked. I will release it in the coming week, with much more adventure. So, stay tuned.
Feel free to comment and ask me anything. You can follow us on Twitter and Medium. Thanks for reading! 👍