
In this post, we are going to learn how to scrape data behind authentication with Python. We will scrape LinkedIn using a persistent session. LinkedIn is a great source of public data for lead generation, sentiment analysis, job listings, and more. We will build a scraper for it, and with that scraper you will be able to fetch person profiles, jobs, company pages, etc.
Requirements
Generally, web scraping is divided into two parts:
- Fetching data by making an HTTP request
- Extracting important data by parsing the HTML DOM
Libraries & Tools
- Beautiful Soup is a Python library for pulling data out of HTML and XML files.
- Requests allow you to send HTTP requests very easily.
Setup
Our setup is pretty simple. Just create a folder and install Beautiful Soup & requests. To create the folder and install the libraries, run the commands below. I am assuming that you already have Python 3.x installed.
mkdir scraper
pip install beautifulsoup4
pip install requests
Now, create a file inside that folder by any name you like. I am using scraping.py.
First, sign up for a LinkedIn account if you don't already have one. Then import Beautiful Soup & requests in your file, like this:
from bs4 import BeautifulSoup
import requests
We just want to get the HTML of a profile. For this tutorial, I will choose this profile.
Session
We will use a Session object from the requests library to persist the user session. The session is then used to make all subsequent requests.
All cookies persist within the session across requests. This means that once we sign in, the session remembers us and stays signed in for every future request we make.
client = requests.Session()
Preparing the Food
Now, we have all the ingredients in place to build a scraper. So, let's start cooking. Open the developer tools, go to the Network tab, and log in so we can catch the login URL.

email = "******@*****"
password = "******"
HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = 'https://www.linkedin.com/checkpoint/lg/login-submit'
Replace the placeholders with your own email and password. You can find LOGIN_URL in the developer tools; at the time of writing it is https://www.linkedin.com/checkpoint/lg/login-submit.

The URL is clearly shown to be https://www.linkedin.com/checkpoint/lg/login-submit, so let's save that. This is where our first request will go.
You will also notice from the developer tools that the login requires a CSRF token. It submits other fields too, but for this tutorial we'll consider the CSRF token only.

Now, the question is how to get that token. The answer to that is very straightforward. We will make an HTTP request to HOMEPAGE_URL and then we will use BeautifulSoup to extract the token out of it.
html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html, "html.parser")
csrf = soup.find('input', {'name': 'loginCsrfParam'}).get('value')
Now we have the CSRF token. The only job left is to log in and scrape the profile.
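As a side note, the extraction above will raise an AttributeError if LinkedIn ever changes its markup and the input disappears. Here is a minimal, self-contained sketch of a more defensive version; the HTML snippet and token value below are made up for illustration, standing in for the real homepage response:

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the LinkedIn homepage HTML, so the extraction
# logic can be demonstrated without a live request.
sample_html = """
<form>
  <input type="hidden" name="loginCsrfParam" value="ajax:1234567890">
</form>
"""

soup = BeautifulSoup(sample_html, "html.parser")
token_input = soup.find("input", {"name": "loginCsrfParam"})

# Guard against layout changes: fail loudly with a clear message
# instead of crashing with AttributeError on None.
if token_input is None:
    raise RuntimeError("loginCsrfParam input not found; the page layout may have changed")

csrf = token_input.get("value")
print(csrf)  # ajax:1234567890
```

In the real scraper you would parse the HTML fetched from HOMEPAGE_URL instead of the hard-coded snippet.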
Login
We will log in by making a POST request to LOGIN_URL:
login_information = {
    'session_key': email,
    'session_password': password,
    'loginCsrfParam': csrf,
}
client.post(LOGIN_URL, data=login_information)
Now you are basically done with the login part. You have made the request to sign in, and every other request you make through the same session will be treated as signed in.
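One caveat: a login form can return a 200 status even when the credentials are wrong, so the status code alone doesn't prove you are signed in. A rough heuristic, sketched below, is to check that the response URL has moved away from LinkedIn's login/checkpoint pages; this is an assumption about LinkedIn's redirect behavior, not a guarantee, so verify it against what you see in the developer tools:

```python
def login_looks_successful(response_url: str, status_code: int) -> bool:
    # Heuristic: after a successful login we expect a 200 and a redirect
    # away from the checkpoint/login pages. Adjust as needed for the
    # behavior you actually observe.
    return status_code == 200 and "checkpoint" not in response_url


# With the real response it would be used like this:
# resp = client.post(LOGIN_URL, data=login_information)
# if not login_looks_successful(resp.url, resp.status_code):
#     raise RuntimeError("Login may have failed; check your credentials")
print(login_looks_successful("https://www.linkedin.com/feed/", 200))  # True
```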
Scrape Profile

s = client.get('https://www.linkedin.com/in/rbranson').text
print(s)
Parsing
Now, the actual parsing. This guide won't cover it in depth, but if you want, you can read my other guide on how to scrape with Beautiful Soup. It's easy to pick up there where you left off here.
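Just to give a taste, here is a minimal parsing sketch that pulls the page title out of the profile HTML. The markup below is a tiny made-up stand-in for the real page, which is of course far larger:

```python
from bs4 import BeautifulSoup

# Stand-in for the profile HTML fetched above with client.get(...).text
profile_html = (
    "<html><head><title>Richard Branson | LinkedIn</title></head>"
    "<body></body></html>"
)

soup = BeautifulSoup(profile_html, "html.parser")
# The <title> tag is one of the simplest things to extract from any page.
title = soup.title.get_text()
print(title)  # Richard Branson | LinkedIn
```

In practice you would pass the `s` variable from the previous step to BeautifulSoup and use `find` / `select` to pull out the fields you care about.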
Conclusion
In this article, we learned how to scrape data behind a login using a requests session & BeautifulSoup. The same approach works for many websites that use form-based authentication.
Feel free to comment and ask me anything. You can follow me on Twitter and Medium. Thanks for reading and please hit the like button! 👍
Additional Resources
At this point, you should feel comfortable writing your first web scraper to gather data from behind a login. Here are a few additional resources that you may find helpful during your web scraping journey: