
Scrape data from LinkedIn using Python and save it in a CSV file

2020-06-13

Procedure

Generally, web scraping is divided into two parts:
  1. Fetching data by making an HTTP request
  2. Extracting important data by parsing the HTML DOM

Libraries & Tools

  1. Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  2. Requests allows you to send HTTP requests very easily.
  3. Pandas provides fast, flexible, and expressive data structures.
  4. A web scraping API (Scrapingdog, in this case) to fetch the HTML code of the target URL.
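
To see how these pieces fit together before we touch LinkedIn, here is a minimal, generic sketch (not the article's scraper): it fetches a page, parses the HTML DOM, pulls out the page title, and saves the result to a CSV with Pandas. The URL https://example.com and the file name demo.csv are only placeholders.

from bs4 import BeautifulSoup
import requests
import pandas as pd

# Step 1: fetch the raw HTML of a page (example.com is just a placeholder URL)
html = requests.get('https://example.com').text

# Step 2: parse the HTML DOM and extract a piece of data
soup = BeautifulSoup(html, 'html.parser')
title_tag = soup.find('title')
title = title_tag.text if title_tag else None

# Step 3: put the result into a DataFrame and save it as a CSV file
df = pd.json_normalize([{'title': title}])
df.to_csv('demo.csv', index=False)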

Setup

Our setup is pretty simple. Just create a folder and install Beautiful Soup, Requests, and Pandas. To create the folder and install the libraries, type the commands given below. I am assuming that you have already installed Python 3.x.

mkdir scraper
pip install beautifulsoup4
pip install requests
pip install pandas

Then import these libraries at the top of your script:

from bs4 import BeautifulSoup
import requests
import pandas as pd

What we are going to scrape

We are going to scrape the “about” page of Google from LinkedIn.

Preparing the Food

Now, since we have all the ingredients to prepare the scraper, we should make a GET request to the target URL to get the raw HTML data. If you are not familiar with the scraping tool, I would urge you to go through its documentation. We will use Requests to make the HTTP GET request. Since we are scraping a company page, I have set “type” to company and “linkId” to google/about/. The linkId can be found in LinkedIn’s target URL.
r = requests.get('https://api.scrapingdog.com/linkedin/?api_key=YOUR-API-KEY&type=company&linkId=google/about/').text
soup = BeautifulSoup(r, 'html.parser')

l = {}        # dictionary holding the scraped fields for this company
u = list()    # list of rows that will be written to the CSV

# The company name sits in the page's <h1> heading
try:
    l["Company"] = soup.find("h1", {"class": "org-top-card-summary__title t-24 t-black truncate"}).text.replace("\n", "")
except:
    l["Company"] = None

# The "About" properties (website, industry, address, ...) share this <dd> class
allProp = soup.find_all("dd", {"class": "org-page-details__definition-text t-14 t-black--light t-normal"})

try:
    l["website"] = allProp[0].text.replace("\n", "")
except:
    l["website"] = None
try:
    l["Industry"] = allProp[1].text.replace("\n", "")
except:
    l["Industry"] = None
try:
    l["Address"] = allProp[2].text.replace("\n", "")
except:
    l["Address"] = None
try:
    l["Type"] = allProp[3].text.replace("\n", "")
except:
    l["Type"] = None
try:
    l["Specialties"] = allProp[4].text.replace("\n", "")
except:
    l["Specialties"] = None

# Company size uses its own <dd> class
try:
    l["Company Size"] = soup.find("dd", {"class": "org-about-company-module__company-size-definition-text t-14 t-black--light mb1 fl"}).text.replace("\n", "")
except:
    l["Company Size"] = None

u.append(l)

# Flatten the list of dictionaries into a DataFrame and save it as a CSV file
# (pd.json_normalize replaces the deprecated pd.io.json.json_normalize)
df = pd.json_normalize(u)
df.to_csv('linkedin.csv', index=False, encoding='utf-8')
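
One small thing worth adding on top of the snippet above (it is not part of the original code): check that the API call actually succeeded before parsing, so that a failed request does not quietly produce a row full of None values. A minimal sketch:

# Optional: fail loudly if the API call did not succeed
response = requests.get('https://api.scrapingdog.com/linkedin/?api_key=YOUR-API-KEY&type=company&linkId=google/about/')
response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
soup = BeautifulSoup(response.text, 'html.parser')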

Complete Code

from bs4 import BeautifulSoup
import requests
import pandas as pd

r = requests.get('https://api.scrapingdog.com/linkedin/?api_key=YOUR-API-KEY&type=company&linkId=google/about/').text
soup = BeautifulSoup(r, 'html.parser')

u = list()
l = {}

try:
    l["Company"] = soup.find("h1", {"class": "org-top-card-summary__title t-24 t-black truncate"}).text.replace("\n", "")
except:
    l["Company"] = None

allProp = soup.find_all("dd", {"class": "org-page-details__definition-text t-14 t-black--light t-normal"})

try:
    l["website"] = allProp[0].text.replace("\n", "")
except:
    l["website"] = None
try:
    l["Industry"] = allProp[1].text.replace("\n", "")
except:
    l["Industry"] = None
try:
    l["Company Size"] = soup.find("dd", {"class": "org-about-company-module__company-size-definition-text t-14 t-black--light mb1 fl"}).text.replace("\n", "")
except:
    l["Company Size"] = None
try:
    l["Address"] = allProp[2].text.replace("\n", "")
except:
    l["Address"] = None
try:
    l["Type"] = allProp[3].text.replace("\n", "")
except:
    l["Type"] = None
try:
    l["Specialties"] = allProp[4].text.replace("\n", "")
except:
    l["Specialties"] = None

u.append(l)

df = pd.json_normalize(u)
df.to_csv('linkedin.csv', index=False, encoding='utf-8')
print(df)
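
To run the scraper, save the complete code in the scraper folder we created earlier (for example as scraper.py; the filename is just a suggestion) and execute it with Python. It prints the DataFrame and writes linkedin.csv next to the script, with one column per key we filled in above and None wherever a field could not be found. A quick, optional sanity check that the file was written:

# Optional sanity check: read the CSV back and inspect it
import pandas as pd

df = pd.read_csv('linkedin.csv')
print(df.columns.tolist())  # the keys we scraped: Company, website, Industry, ...
print(df.head())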

Conclusion

In this article, we saw how to scrape company data from LinkedIn using Python and a scraping API, and save it in a CSV file. As I said earlier, you can scrape a profile too, but read the docs before trying it. At this point, you should feel comfortable writing your first web scraper to gather data from any website.
