How to Scrape data from LinkedIn using Python

13-06-2020

In this post, we are going to scrape data from LinkedIn using Python and Scrapingdog.

We are going to extract the Company Name, Website, Industry, Company Size, Number of employees, Headquarters Address, and Specialties.

Why use Scrapingdog though? This tool helps us scrape dynamic websites using millions of rotating residential proxies so that we don’t get blocked. It also provides a captcha-clearing facility.

Scraping LinkedIn with Python

Why use Python & LinkedIn Scraper API to Scrape LinkedIn?

Web scraping with Python gives you flexibility when extracting data, from LinkedIn specifically. Python also has great community support, and you get a large pool of flexible libraries that help beginners start their scraping journey.

LinkedIn will block you after you scrape 50 profiles or so. It quickly realizes that a script is opening LinkedIn profiles, and it blocks your IP.

With the Python requests library, you can pass cookies to scrape more than 50 profiles. This will not last long either, but it will improve data delivery. Hence the LinkedIn Scraper API, an API developed by Scrapingdog that enables you to scrape LinkedIn seamlessly.

What is LinkedIn Scraper API and What are its Capabilities?

The LinkedIn Scraper API is a dedicated API developed by Scrapingdog for data extraction from LinkedIn. Data such as name, title, current company, location, and more can be extracted with this dedicated LinkedIn API. The LinkedIn Scraper API is easy to use and can be integrated into your own applications.

The data collected can be used to create targeted marketing campaigns or to segment your LinkedIn audience.

Real-world examples of data scraping using the LinkedIn Scraper API

Some examples of data that can be scraped using the LinkedIn Scraper API include:

  • Profile information such as name, title, location, and contact information
  • Experience information such as job title, company, dates of employment, and description
  • Education information such as school, degree, the field of study, and dates of attendance
  • Skills information
  • Endorsements and recommendations

Let’s dig deeper and learn how to do this with a real-world scenario:

Procedure

Generally, web scraping is divided into two parts:

  • Fetching data by making an HTTP request
  • Extracting important data by parsing the HTML DOM

Libraries & Tools

  1. Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  2. Requests allows you to send HTTP requests very easily.
  3. Pandas provides fast, flexible, and expressive data structures.
  4. Web Scraper extracts the HTML code of the target URL.

Setup

Our setup is pretty simple. Just create a folder and install Beautiful Soup, requests, and pandas. To create the folder and install the libraries, type the commands given below. I am assuming that you have already installed Python 3.x.

mkdir scraper
pip install beautifulsoup4
pip install requests
pip install pandas

Now, create a file inside that folder with any name you like. I am using scraping.py. First, you have to sign up for Scrapingdog. It will provide you with 1000 FREE credits. Then just import Beautiful Soup, requests, and pandas in your file like this.

from bs4 import BeautifulSoup
import requests
import pandas as pd

Scraping LinkedIn Profile of Google

We are going to scrape the “about” page of Google from LinkedIn.

Preparing the Food

Now that we have all the ingredients to prepare the scraper, we should make a GET request to the target URL to get the raw HTML data. If you are not familiar with the scraping tool, I would urge you to go through its documentation. We will use requests to make an HTTP GET request. Since we are scraping a company page, I have set “type” to company and “linkId” to google/about/. The linkId can be found in the target LinkedIn URL.

r = requests.get('https://api.scrapingdog.com/linkedin/?api_key=YOUR-API-KEY&type=company&linkId=google/about/').text
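As a hedged sketch, the same request URL can be assembled from its parts rather than typed by hand, which avoids quoting mistakes. The helper names `build_linkedin_url` and `fetch_html` are illustrative and not part of Scrapingdog or requests; the endpoint and the `api_key`, `type`, and `linkId` parameters are the ones used in the request above. `fetch_html` accepts any requests-style session, so the logic can be tried without a network connection.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scrapingdog.com/linkedin/"  # endpoint used in this post


def build_linkedin_url(api_key, link_type, link_id):
    """Build the request URL from the three query parameters used above."""
    params = {"api_key": api_key, "type": link_type, "linkId": link_id}
    # safe="/" keeps the slashes in link_id readable instead of encoding them as %2F
    return API_ENDPOINT + "?" + urlencode(params, safe="/")


def fetch_html(session, url, timeout=30):
    """Fetch url with a requests-style session and return the response body.

    raise_for_status() turns HTTP 4xx/5xx responses into an exception
    instead of silently handing an error page to the parser.
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


url = build_linkedin_url("YOUR-API-KEY", "company", "google/about/")
print(url)
# With requests installed: r = fetch_html(requests.Session(), url)
```

The timeout of 30 seconds is an arbitrary choice for this sketch, not a value recommended by the API.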

This will provide you with the HTML of the target URL. Please use your own Scrapingdog API key when making the above request. Now, you have to use BeautifulSoup to parse the HTML.

soup = BeautifulSoup(r, 'html.parser')
l = {}
u = list()

As you can see in the image, the title of the company is stored in an h1 tag with the class “org-top-card-summary__title t-24 t-black truncate”. So, we’ll use the variable soup to extract that text.

try:
    l["Company"] = soup.find("h1", {"class": "org-top-card-summary__title t-24 t-black truncate"}).text.replace("\n", "")
except AttributeError:  # soup.find returned None because the tag was not found
    l["Company"] = None

I have replaced \n with an empty string. Now, we will focus on extracting Website, Industry, Company Size, Headquarters (Address), Type, and Specialties.

All of the above properties (except Company Size) are stored in dd tags with the class “org-page-details__definition-text t-14 t-black--light t-normal”. I will again use the variable soup to extract all the properties.

allProp = soup.find_all("dd", {"class": "org-page-details__definition-text t-14 t-black--light t-normal"})
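To see what find_all returns here, the following small sketch runs the same kind of lookup on an inline HTML snippet. The HTML and its values are made up for illustration and are not real scraped data. As a design choice, the sketch matches on a single class name instead of the full class string, on the assumption that the utility classes (t-14 and so on) are more likely to change than the main one.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure described above; the values are
# placeholders, not real scraped data.
sample_html = """
<dl>
  <dd class="org-page-details__definition-text t-14 t-black--light t-normal">
    example.com
  </dd>
  <dd class="org-page-details__definition-text t-14 t-black--light t-normal">
    Example Industry
  </dd>
</dl>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# bs4 treats class as a multi-valued attribute, so passing one class name
# finds every <dd> whose class list contains that name.
sample_props = sample_soup.find_all("dd", class_="org-page-details__definition-text")

# get_text(strip=True) removes the surrounding newlines and spaces.
values = [dd.get_text(strip=True) for dd in sample_props]
print(values)
```

The result is a list in document order, which is why the properties are read by position.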

Now, we’ll one by one extract the properties from the allProp list.

try:
    l["website"] = allProp[0].text.replace("\n", "")
except IndexError:  # fewer properties on the page than expected
    l["website"] = None

try:
    l["Industry"] = allProp[1].text.replace("\n", "")
except IndexError:
    l["Industry"] = None

try:
    l["Address"] = allProp[2].text.replace("\n", "")
except IndexError:
    l["Address"] = None

try:
    l["Type"] = allProp[3].text.replace("\n", "")
except IndexError:
    l["Type"] = None

try:
    l["Specialties"] = allProp[4].text.replace("\n", "")
except IndexError:
    l["Specialties"] = None
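The five blocks above repeat one pattern, so they could be folded into a small helper. This is only a sketch: `text_at`, `extract_fields`, and `FIELDS` are hypothetical names introduced here, and the stand-in objects replace real bs4 tags so the snippet runs on its own. Unlike the code above, it also strips leading and trailing spaces.

```python
from types import SimpleNamespace


def text_at(items, index):
    """Return the cleaned text of items[index], or None if it is missing."""
    try:
        return items[index].text.replace("\n", "").strip()
    except (IndexError, AttributeError):
        return None


# Field names in the order this post reads them from allProp.
FIELDS = ["website", "Industry", "Address", "Type", "Specialties"]


def extract_fields(items):
    """Map each field name to the text at the matching position."""
    return {name: text_at(items, i) for i, name in enumerate(FIELDS)}


# Stand-in objects with a .text attribute, used here in place of bs4 tags.
demo_items = [
    SimpleNamespace(text="\nexample.com\n"),
    SimpleNamespace(text="\nExample Industry\n"),
]
demo_row = extract_fields(demo_items)
print(demo_row)
```

In the scraper itself, this would be called as l.update(extract_fields(allProp)).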

Now, we’ll scrape Company Size.

As you can see, Company Size is stored in a dd tag with the class “org-about-company-module__company-size-definition-text t-14 t-black--light mb1 fl”.

try:
    l["Company Size"] = soup.find("dd", {"class": "org-about-company-module__company-size-definition-text t-14 t-black--light mb1 fl"}).text.replace("\n", "")
except AttributeError:  # soup.find returned None because the tag was not found
    l["Company Size"] = None

Now, I will push dictionary l to list u. And then we’ll create a data frame of list u using pandas.

u.append(l)
df = pd.json_normalize(u)  # top-level function in pandas 1.0+, replacing pd.io.json.json_normalize

Now, finally, we save our data to a CSV file.

df.to_csv('linkedin.csv', index=False, encoding='utf-8')
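If pandas is not available, the same single-row CSV can be written with the standard library csv module instead. This is an alternative to the pandas call above, not the method this post uses, and the row values below are placeholders. An in-memory buffer stands in for the file so the output can be inspected directly.

```python
import csv
import io

# Placeholder row shaped like the dictionary l built above.
row = {"Company": "Example Co", "website": "example.com", "Industry": None}

# In-memory stand-in for open('linkedin.csv', 'w', newline='', encoding='utf-8')
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)  # a None value is written as an empty field
csv_text = buffer.getvalue()
print(csv_text)
```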

We have successfully scraped a LinkedIn Company Page. Similarly, you can also scrape a Profile. Please read the docs before scraping a Profile Page.

Complete Code

from bs4 import BeautifulSoup
import requests
import pandas as pd

r = requests.get('https://api.scrapingdog.com/linkedin/?api_key=YOUR-API-KEY&type=company&linkId=google/about/').text

soup = BeautifulSoup(r, 'html.parser')

u = list()
l = {}

try:
    l["Company"] = soup.find("h1", {"class": "org-top-card-summary__title t-24 t-black truncate"}).text.replace("\n", "")
except AttributeError:  # tag not found
    l["Company"] = None

allProp = soup.find_all("dd", {"class": "org-page-details__definition-text t-14 t-black--light t-normal"})

try:
    l["website"] = allProp[0].text.replace("\n", "")
except IndexError:  # fewer properties than expected
    l["website"] = None

try:
    l["Industry"] = allProp[1].text.replace("\n", "")
except IndexError:
    l["Industry"] = None

try:
    l["Company Size"] = soup.find("dd", {"class": "org-about-company-module__company-size-definition-text t-14 t-black--light mb1 fl"}).text.replace("\n", "")
except AttributeError:  # tag not found
    l["Company Size"] = None

try:
    l["Address"] = allProp[2].text.replace("\n", "")
except IndexError:
    l["Address"] = None

try:
    l["Type"] = allProp[3].text.replace("\n", "")
except IndexError:
    l["Type"] = None

try:
    l["Specialties"] = allProp[4].text.replace("\n", "")
except IndexError:
    l["Specialties"] = None

u.append(l)

df = pd.json_normalize(u)  # top-level function in pandas 1.0+
df.to_csv('linkedin.csv', index=False, encoding='utf-8')

print(df)

Conclusion

In this article, we saw how to scrape data from LinkedIn using a proxy scraper and Python. As I said earlier, you can scrape a Profile too, but read the docs before trying it. Feel free to comment and ask me anything. You can follow me on Twitter and Medium. Thanks for reading, and please hit the like button!

Additional Resources

And there’s the list! At this point, you should feel comfortable writing your first web scraper to gather data from LinkedIn.

Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.