Web Scraping Indeed Jobs using Python

16-02-2024

Indeed is one of the largest job listing platforms, and it claims around 300M visitors every month. As a data engineer, you may want to identify which jobs are in high demand. To do that, you have to collect data from websites like Indeed and analyze it.

Building an Indeed Scraper using Python

In this article, we are going to build an Indeed scraper using Python 3.x. We will scrape Python jobs from Indeed in New York.

At the end of this tutorial, we will have the New York job listings that ask for Python as a skill.

Why Scrape Indeed Jobs?

Scraping Indeed jobs can help you in multiple ways. Some of the use cases for extracting data from it are:

  • With this much data, you can train an AI model to predict salaries in the future for any given skill.
  • Companies can use this data to analyze what salaries their rival companies are offering for a particular skill set. This will help them improve their recruitment strategy.
  • You can also analyze what jobs are in high demand and what kind of skill set one needs to qualify for jobs in the future.

Setting up the prerequisites

We will need Python 3.x for this project. Our target page is the Indeed search results page for Python jobs in New York.

Page We Are Going To Scrape From Indeed

I am assuming that you have already installed Python on your machine. So, let’s move forward with the rest of the installation.

We will need two libraries to extract the data. We will install them with pip.

  1. Requests — Using this library we are going to make a GET request to the target URL.
  2. BeautifulSoup — Using this library we are going to parse HTML and extract all the crucial data that we need from the page. It is also known as BS4.

Installation

pip install requests 
pip install beautifulsoup4

You can create a dedicated folder for Indeed on your machine and then create a Python file where we will write the code.

Let’s decide what we are going to scrape from Indeed.com

Whenever you start a scraping project, it is always better to decide in advance what exactly we need to extract from the target page.

Things we are going to scrape

We are going to scrape all the highlighted parts in the above image.

  • Name of the job
  • Name of the company
  • Their ratings
  • The salary they are offering
  • Job details

Let’s Start Indeed Job Scraping

Before even writing the first line of code, let’s find the exact element location in the DOM.

Inspecting Job Box in Source Code

Every job box is an li tag, as you can see in the image above. There are 18 of them on each page, and all of them fall under the ul tag with class jobsearch-ResultsList. So, our first job is to find this ul tag.

Let’s first import all the libraries in the file.

import requests
from bs4 import BeautifulSoup

Now, let’s declare the target URL and make an HTTP connection to that website.

l=[]
o={}
target_url = "https://www.indeed.com/jobs?q=python&l=New+York%2C+NY&vjk=8bf2e735050604df"
head= {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Connection": "keep-alive",
    "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
}
resp = requests.get(target_url, headers=head)

We have declared an empty list and an empty dictionary to store data at the end.

Most of the time you will get a 403 status code from Indeed, which means the request was blocked. To avoid getting blocked, you will need a web scraping API.
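Before parsing, it helps to look at the status code so that a blocked request does not show up later as a confusing parsing error. Here is a minimal sketch of that check. The helper name describe_status is hypothetical and not part of Requests, and grouping 429 with 403 is an assumption about how blocking usually looks.

```python
def describe_status(status_code):
    """Map an HTTP status code to a short hint about what to do next."""
    if status_code == 200:
        return "ok"
    if status_code in (403, 429):
        # 403 (Forbidden) and 429 (Too Many Requests) usually mean the
        # site has flagged the request as automated traffic.
        return "blocked"
    return "error"


# With the response from above you would call describe_status(resp.status_code).
print(describe_status(403))
```

If the result is anything other than "ok", there is no point in passing resp.text to BS4, because the page will not contain the job list.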

Now, let’s find the ul tag using BS4.

soup = BeautifulSoup(resp.text, 'html.parser')

allData = soup.find("ul",{"class":"jobsearch-ResultsList css-0"})

Each li tag wraps a div tag with class cardOutline that holds the job card. Let's collect all of these div tags so that we can iterate over them with a for loop and extract the data one by one.

alllitags = allData.find_all("div",{"class":"cardOutline"})

Now, we will run a for loop on this list alllitags.
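To see the lookup and the loop working without hitting Indeed, we can run them on a small piece of sample markup. This is only a sketch: sample_html is a hypothetical, heavily simplified stand-in for the real page, and the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the live page has many more wrappers
# and its class names can change over time.
sample_html = """
<ul class="jobsearch-ResultsList css-0">
  <li><div class="cardOutline">Job card one</div></li>
  <li><div class="cardOutline">Job card two</div></li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
results = sample_soup.find("ul", {"class": "jobsearch-ResultsList css-0"})

# find() returns None when nothing matches (for example on a blocked page),
# so guard before calling find_all() on the result.
cards = results.find_all("div", {"class": "cardOutline"}) if results is not None else []

card_texts = []
for card in cards:
    card_texts.append(card.get_text(strip=True))

print(len(cards), card_texts)
```

Looping over the cards directly does the same work as indexing with alllitags[i]; the tutorial code keeps the index-based form.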

Inspecting Name of the Job

As you can see in the image above, the name of the job is under the a tag. So, we will find this a tag and then extract the text out of it using the .text property of BS4.
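Here is a small sketch of that lookup on a sample job card. The card_html string is hypothetical and simplified. The class string is the one used for the job title in the complete code further down, and it may be different on the live site by the time you read this.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified job card for illustration only.
card_html = """
<div class="cardOutline">
  <h2><a class="jcs-JobTitle css-jspxzf eu4oa1w0"> Python Developer </a></h2>
</div>
"""

card = BeautifulSoup(card_html, "html.parser")
link = card.find("a", {"class": "jcs-JobTitle css-jspxzf eu4oa1w0"})

# .text is a property, so there are no parentheses; strip() trims the
# whitespace that surrounds the title in the markup.
job_title = link.text.strip() if link is not None else None
print(job_title)
```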

The name of the company can be found under the div tag with class heading6 company_location tapItem-gutter companyInfo. Let’s extract this too.

try:
    o["name-of-the-company"]=alllitags[i].find("div",{"class":"companyInfo"}).find("span",{"class":"companyName"}).text
except AttributeError:
    # find() returned None somewhere in the chain, so the field is missing
    o["name-of-the-company"]=None

Here we have first found the div tag and then we have used the .find() method to find the span tag inside it. You can check the image above for more clarity.
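Since the same find-then-find pattern repeats for every field, one option is to wrap it in a small helper that returns None as soon as a lookup fails. This is a sketch and not how the tutorial code is written: nested_text is a hypothetical helper, and card_html is a simplified sample card with a made-up company name.

```python
from bs4 import BeautifulSoup


def nested_text(tag, *steps):
    """Follow a chain of (tag name, class) lookups and return the text,
    or None as soon as one lookup finds nothing."""
    current = tag
    for name, css_class in steps:
        if current is None:
            return None
        current = current.find(name, {"class": css_class})
    return current.text.strip() if current is not None else None


# Hypothetical, simplified job card with no rating.
card_html = """
<div class="cardOutline">
  <div class="heading6 company_location tapItem-gutter companyInfo">
    <span class="companyName">Acme Corp</span>
  </div>
</div>
"""
card = BeautifulSoup(card_html, "html.parser")

company = nested_text(card, ("div", "companyInfo"), ("span", "companyName"))
rating = nested_text(card, ("div", "companyInfo"), ("span", "ratingsDisplay"))
print(company, rating)
```

Note that searching for the single class companyInfo is enough to match the div tag, even though its class attribute lists several classes.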

Let’s extract the rating now.

Inspecting Rating of job post in source code

The rating can be found under the same div tag as the name of the company. Only the class of the span tag changes: the new class is ratingsDisplay.

try:
    o["rating"]=alllitags[i].find("div",{"class":"companyInfo"}).find("span",{"class":"ratingsDisplay"}).text
except AttributeError:
    # Not every company has a rating
    o["rating"]=None

Inspecting Salary in source code

The salary offer can be found under the div tag with class metadata salary-snippet-container.

try:
    o["salary"]=alllitags[i].find("div",{"class":"salary-snippet-container"}).text
except AttributeError:
    # Many postings do not list a salary
    o["salary"]=None

The last thing we have to extract is the job details.

This is a list that can be found under the div tag with class metadata taxoAttributes-container.

try:
    o["job-details"]=alllitags[i].find("div",{"class":"metadata taxoAttributes-container"}).find("ul").text
except AttributeError:
    # The details list is optional as well
    o["job-details"]=None


l.append(o)
o={}

In the end, we have pushed our dictionary o inside the list l and reset o to an empty dictionary so that, when the loop runs again, it can store the data of the next job.

Let’s print it and see the results.
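The reset matters because a list stores references to dictionaries, not copies of them. The sketch below uses made-up job titles and hypothetical variable names to show what happens with and without the reset.

```python
# Reusing one dictionary without resetting it: every list entry is the
# same object, so later writes overwrite earlier ones.
shared = {}
without_reset = []
for title in ["Python Developer", "Data Engineer"]:
    shared["name-of-the-job"] = title
    without_reset.append(shared)

# Rebinding the name to a fresh dictionary after each append keeps the
# entries independent, which is what o={} does in the scraper.
record = {}
with_reset = []
for title in ["Python Developer", "Data Engineer"]:
    record["name-of-the-job"] = title
    with_reset.append(record)
    record = {}

print(without_reset)
print(with_reset)
```

Without the reset, both entries end up showing the last job that was written.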

print(l)

Complete Code

You can make further changes to extract other details as well. You can even change the URL of the page to scrape jobs from the next pages.

But for now, the complete code will look like this.

import requests
from bs4 import BeautifulSoup

l=[]
o={}


target_url = "https://www.indeed.com/jobs?q=python&l=New+York%2C+NY&vjk=8bf2e735050604df"
head= {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Connection": "keep-alive",
    "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
}

resp = requests.get(target_url, headers=head)
print(resp.status_code)
soup = BeautifulSoup(resp.text, 'html.parser')

allData = soup.find("ul",{"class":"jobsearch-ResultsList css-0"})

# find() returns None when the results list is missing (for example, on a 403 page)
alllitags = allData.find_all("div",{"class":"cardOutline"}) if allData is not None else []
print(len(alllitags))
for i in range(0,len(alllitags)):
    # A missing tag makes find() return None, and the next call on it raises
    # AttributeError, so each field falls back to None in that case.
    try:
        o["name-of-the-job"]=alllitags[i].find("a",{"class":"jcs-JobTitle css-jspxzf eu4oa1w0"}).text
    except AttributeError:
        o["name-of-the-job"]=None

    try:
        o["name-of-the-company"]=alllitags[i].find("div",{"class":"companyInfo"}).find("span",{"class":"companyName"}).text
    except AttributeError:
        o["name-of-the-company"]=None

    try:
        o["rating"]=alllitags[i].find("div",{"class":"companyInfo"}).find("span",{"class":"ratingsDisplay"}).text
    except AttributeError:
        o["rating"]=None

    try:
        o["salary"]=alllitags[i].find("div",{"class":"salary-snippet-container"}).text
    except AttributeError:
        o["salary"]=None

    try:
        o["job-details"]=alllitags[i].find("div",{"class":"metadata taxoAttributes-container"}).find("ul").text
    except AttributeError:
        o["job-details"]=None

    # Store this job and start a fresh dictionary for the next one
    l.append(o)
    o={}


print(l)

Using Scrapingdog for scraping Indeed

Scrapingdog provides a dedicated Indeed Scraping API with which you can scrape Indeed at scale. You won’t even have to parse the data because you will already get data in JSON form.

Scrapingdog provides a generous free pack with 1000 credits. You just have to sign up for that.

Scrapingdog homepage

Once you sign up, you will find an API key on your dashboard. You have to paste that API key in the provided code below.

import requests
import json

url = "https://api.scrapingdog.com/indeed"
api_key = "Paste-your-own-API-key"
job_search_url = "https://www.indeed.com/jobs?q=python&l=New York, NY&vjk=8bf2e735050604df"

# Set up the parameters
params = {"api_key": api_key, "url": job_search_url}
print(params)
# Make the HTTP GET request
response = requests.get(url, params=params)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the JSON content
    json_response = response.json()
    print(json_response)
else:
    print(f"Error: {response.status_code}")
    print(response.text)

You have to send a GET request to https://api.scrapingdog.com/indeed with your API key and the target Indeed URL.

With this script, you will be able to scrape Indeed quickly and without getting blocked.

Conclusion

In this tutorial, we were able to scrape Indeed job postings with Requests and BS4. Of course, you can modify the code a little to extract other details as well.

I have also scraped Glassdoor job listings and LinkedIn Jobs using Python. Do check them out as well!

You can change the page URL to scrape jobs from the next page. To find out how, click a page number at the bottom of the results and note what changes in the URL. For scraping millions of such postings, you can always use Scrapingdog.
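Once you know which part of the URL changes, you can generate the page URLs in a loop. The sketch below assumes the offset parameter is called start and grows in steps of 10. That is an assumption, not something taken from this tutorial, so confirm it against the URLs you see in your browser. The function name build_page_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.indeed.com/jobs"


def build_page_url(query, location, page, page_size=10):
    """Build a search URL for a zero-based page number.

    Assumption: the offset parameter is called "start" and moves in
    steps of page_size. Check both in your browser before relying on it.
    """
    params = {"q": query, "l": location}
    if page > 0:
        params["start"] = page * page_size
    return f"{BASE_URL}?{urlencode(params)}"


for page in range(3):
    print(build_page_url("python", "New York, NY", page))
```

Each generated URL can then be passed to requests.get() in place of target_url.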

I hope you like this little tutorial and if you do then please do not forget to share it with your friends and on your social media.

Frequently Asked Questions

Indeed.com does provide an API for access to its job data. However, it isn’t economical. Third-party APIs can get the same job done at a lower price.

Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.