
Web Scraping Booking.com Hotel Price Data using Python

09-01-2024

Web scraping is a useful tool when you want to gather information from the internet. For those in the hotel industry, knowing the prices of other hotels can be very helpful, because with more hotels and OTAs entering the market, competition is rising quickly.

So, how do you keep track of all these prices?

The answer is by scraping hotel prices. In this blog, we’ll show you how to scrape hotel prices from booking.com using Python.


You’ll learn how to get prices from any hotel on booking.com by just entering the check-in/out dates and the hotel’s ID. Also, if you’re a hotel owner and want a ready-made solution to monitor prices, check out the Makcorps Hotel API.

Let’s get started!

Why use Python to Scrape booking.com

Python is a versatile language that is used extensively for web scraping, and it has dedicated libraries for the job.

It also has a large community, so you can usually find help when you run into trouble. If you are new to web scraping with Python, I would recommend going through this comprehensive guide on web scraping with Python.

Requirements for scraping hotel data from booking.com

We need Python 3.x for this tutorial, and I am assuming that you have already installed it on your computer. Along with that, you need to install two libraries that will be used later in this tutorial.

  1. Requests will help us to make an HTTP connection with Booking.com.
  2. BeautifulSoup will help us to create an HTML tree for smooth data extraction.

Setup

First, create a folder and then install the libraries mentioned above.

mkdir booking
pip install requests 
pip install beautifulsoup4

Inside this folder, create a Python file where we will write the code. These are the data points that we are going to scrape from the target website.

  • Address
  • Name
  • Pricing
  • Rating
  • Room Type
  • Facilities

Let’s Scrape Booking.com

Since everything is set, let’s make a GET request to the target website and see if it works.

import requests
from bs4 import BeautifulSoup

l=list()
o={}

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}

target_url = "https://www.booking.com/hotel/us/the-lenox.html?checkin=2022-12-28&checkout=2022-12-29&group_adults=2&group_children=0&no_rooms=1&selected_currency=USD"

resp = requests.get(target_url, headers=headers)

print(resp.status_code)

The code is fairly straightforward, but let me explain it briefly. First, we imported the two libraries that we installed earlier in this tutorial, then we declared the headers and the target URL.

Finally, we made a GET request to the target URL. When you print the status code you should see 200; any other code means the request did not succeed, for example because the URL is wrong or the request was blocked.
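The target URL carries the hotel's page path plus the stay details in its query string. As a minimal sketch, the same URL can be assembled from its parts so that only the hotel and the dates change between runs. The helper name build_booking_url and its parameters are hypothetical; the path layout and query keys are simply copied from the URL used in this tutorial.

```python
from urllib.parse import urlencode


def build_booking_url(country, hotel_slug, checkin, checkout, adults=2):
    """Assemble a hotel page URL in the same shape as the tutorial's target_url.

    Hypothetical helper: the path layout and query keys mirror the URL used
    in this tutorial and are not taken from any official documentation.
    """
    params = {
        "checkin": checkin,      # YYYY-MM-DD
        "checkout": checkout,    # YYYY-MM-DD
        "group_adults": adults,
        "group_children": 0,
        "no_rooms": 1,
        "selected_currency": "USD",
    }
    base = f"https://www.booking.com/hotel/{country}/{hotel_slug}.html"
    return f"{base}?{urlencode(params)}"


target_url = build_booking_url("us", "the-lenox", "2022-12-28", "2022-12-29")
print(target_url)
```

The resulting string can be passed to requests.get() exactly like the hard-coded URL.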

How to scrape the data points

Since we have already decided which data points we are going to scrape, let’s find their HTML locations by inspecting the page in Chrome.

For this tutorial, we will be using the find() and find_all() methods of BeautifulSoup to find target elements. The DOM structure will decide which method is better for each element.

Extracting hotel name and address

Let’s inspect the page in Chrome and find the DOM location of the name as well as the address.

As you can see, the hotel name can be found under the h2 tag with class pp-header__title. For the sake of simplicity, let’s first create a soup variable with the BeautifulSoup constructor, and from that we will extract all the data points.

soup = BeautifulSoup(resp.text, 'html.parser')

Here BS4 will use an HTML parser to convert a complex HTML document into a tree of Python objects. Now, let’s use the soup variable to extract the name and address.

o["name"]=soup.find("h2",{"class":"pp-header__title"}).text

In a similar manner, we will extract the address.

The address of the property is stored under the span tag with the class name hp_address_subtitle.

o["address"]=soup.find("span",{"class":"hp_address_subtitle"}).text.strip("\n")
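One thing to keep in mind: soup.find() returns None when nothing matches, so calling .text on the result raises an AttributeError if the page layout or a class name changes. A minimal sketch of a guard is shown here. The helper name text_or_none and the sample HTML are made up for illustration; only the two class names come from this tutorial.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, tag, class_name):
    """Return the stripped text of the first matching element, or None."""
    element = soup.find(tag, {"class": class_name})
    if element is None:
        return None
    return element.get_text(strip=True)


# Made-up stand-in for the real page, which is far larger.
sample_html = """
<h2 class="pp-header__title">Example Hotel</h2>
<span class="hp_address_subtitle">
1 Example Street, Example City
</span>
"""
soup = BeautifulSoup(sample_html, "html.parser")

o = {
    "name": text_or_none(soup, "h2", "pp-header__title"),
    "address": text_or_none(soup, "span", "hp_address_subtitle"),
    "missing": text_or_none(soup, "div", "no-such-class"),  # not in the sample
}
print(o)
```

With this guard a missing element shows up as None in the output instead of stopping the script.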

Extracting rating and facilities

Once again we will inspect and find the DOM location of the rating and facilities element.

Rating is stored under the div tag with class d10a6220b4. We will use the same soup variable to extract this element. The following code will extract the rating data.

o["rating"]=soup.find("div",{"class":"d10a6220b4"}).text

Extracting facilities is a bit tricky. We will create a list that holds all the facility HTML elements. After that, we will run a for loop over those elements and store the text of each one in a separate list.

Let’s see how it can be done in two simple steps.

fac=soup.find_all("div",{"class":"important_facility"})

The fac variable will hold all the facility elements. Now, let’s extract them one by one.

fac_arr=[]
for element in fac:
    fac_arr.append(element.text.strip("\n"))

fac_arr array will store all the text values of the elements. We have successfully managed to extract the main facilities.
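The same two steps can also be written as a single list comprehension. This is only a sketch run against a made-up HTML fragment; the facility names are placeholders and only the important_facility class comes from this tutorial.

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for the facilities block of the real page.
sample_html = """
<div class="important_facility">
Free WiFi
</div>
<div class="important_facility">
Fitness centre
</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# get_text(strip=True) trims the surrounding newlines and spaces in one go.
fac_arr = [
    div.get_text(strip=True)
    for div in soup.find_all("div", {"class": "important_facility"})
]
print(fac_arr)
```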

Extract Price and Room Types

This is the trickiest part of the tutorial. The DOM structure of booking.com is a bit complex and needs thorough study before extracting price and room type information.

Here the tbody tag contains all the data. Just below tbody you will find the tr tags; each tr tag holds all the information for one row of the table.

Going one step further down, you will find multiple td tags where information like room type, price, etc. can be found.

First, let’s find all the tr tags.

ids = list()

# find_all returns an empty list when nothing matches, so no try/except is needed
tr = soup.find_all("tr")

One thing that you will notice is that every tr tag has a data-block-id attribute. Let’s collect all those ids in a list.

for row in tr:
    # .get returns None when the attribute is missing, so it never raises here
    block_id = row.get('data-block-id')
    if block_id is not None:
        ids.append(block_id)
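As an alternative sketch, BeautifulSoup can filter on the presence of an attribute directly: passing True as the attribute value matches any tag that has that attribute, whatever its value. The table markup and the block ids here are invented for the example.

```python
from bs4 import BeautifulSoup

# Made-up table: two rows carry a data-block-id, one does not.
sample_html = """
<table><tbody>
<tr data-block-id="block_1"><td>Room A</td></tr>
<tr><td>row without an id</td></tr>
<tr data-block-id="block_2"><td>Room B</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# Only tr tags that actually have the attribute are returned.
rows = soup.find_all("tr", attrs={"data-block-id": True})
ids = [row["data-block-id"] for row in rows]
print(ids)
```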

Now that you have all the ids, the rest of the job becomes much easier. We will iterate over every data-block-id to extract room pricing and room types from their individual tr blocks.

for i in range(0,len(ids)):

    try:
        allData = soup.find("tr",{"data-block-id":ids[i]})
    except:
        k["room"]=None
        k["price"]=None

The allData variable will store all the HTML data for a particular data-block-id.

Now, we can move to the td tags that can be found inside this tr tag. Let’s extract rooms first.

try:
    rooms = allData.find("span",{"class":"hprt-roomtype-icon-link"})
except AttributeError:
    # allData is None when no tr matched this data-block-id
    rooms=None

Here comes the fun part: when there is more than one pricing option for a particular room type, you have to reuse the same room name for the next set of prices in the loop. Let me explain with the picture.

Here we have three prices for one room type. The room name is only found in the first of those rows, so when the for loop reaches the following rows the value of the rooms variable will be None. You can see this by printing it. So, we will use the old value of rooms until we receive a new value.

if rooms is not None:
    # remember the latest room name so the following pricing rows can reuse it
    last_room = rooms.text.replace("\n","")
k["room"]=last_room

Here last_room will store the last value of rooms until we receive a new value.
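To see the carry-forward idea in isolation, here is a small sketch that works on plain tuples instead of HTML. The function name fill_room_names and the sample rows are invented for illustration; the logic is the same last_room trick described in this section.

```python
def fill_room_names(rows):
    """Carry the last seen room name forward over rows where it is missing."""
    filled = []
    last_room = None
    for room, price in rows:
        if room is not None:
            last_room = room  # a new room type starts here
        filled.append({"room": last_room, "price": price})
    return filled


# Made-up rows: (room name or None, price text). One room type with three
# prices, followed by a second room type with one price.
rows = [
    ("King Room", "$300"),
    (None, "$320"),
    (None, "$350"),
    ("Suite", "$500"),
]
result = fill_room_names(rows)
for item in result:
    print(item)
```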

Let’s extract the price now.

Price is stored under the div tag with class “bui-price-display__value prco-text-nowrap-helper prco-inline-block-maker-helper prco-f-font-heading”. Let’s use the allData variable to find it and extract the text.

price = allData.find("div",{"class":"bui-price-display__value prco-text-nowrap-helper prco-inline-block-maker-helper prco-f-font-heading"})

k["price"]=price.text.replace("\n","")

We have finally managed to scrape all the data elements that we were interested in.
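The scraped price is still a piece of text. If you want to compare prices numerically, a small conversion step helps. The following is a rough sketch that assumes US-style formatting (comma as thousands separator, dot as decimal point); parse_price is a hypothetical helper and the sample strings are invented.

```python
import re


def parse_price(text):
    """Pull the first number out of a price string such as '$1,234'.

    Assumes US-style separators; returns None when no digits are found.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


print(parse_price("\n$1,234\n"))
print(parse_price("US$289.50"))
print(parse_price("n/a"))
```

Other currencies and locales format numbers differently, so the pattern would need adjusting if selected_currency is changed.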

Complete Code

You can extract other pieces of information like amenities, reviews, etc. You just have to make a few more changes and you will be able to extract them too. Along with this, you can extract other hotel details by just changing the unique name of the hotel in the URL.

The code will look like this.

import requests
from bs4 import BeautifulSoup

l=list()
g=list()
o={}
k={}
fac_arr=[]
headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}

target_url = "https://www.booking.com/hotel/us/the-lenox.html?checkin=2022-12-28&checkout=2022-12-29&group_adults=2&group_children=0&no_rooms=1&selected_currency=USD"

resp = requests.get(target_url, headers=headers)

soup = BeautifulSoup(resp.text, 'html.parser')

# hotel name, address and rating
o["name"]=soup.find("h2",{"class":"pp-header__title"}).text
o["address"]=soup.find("span",{"class":"hp_address_subtitle"}).text.strip("\n")
o["rating"]=soup.find("div",{"class":"d10a6220b4"}).text

# main facilities
fac=soup.find_all("div",{"class":"important_facility"})
for element in fac:
    fac_arr.append(element.text.strip("\n"))

# collect the data-block-id of every tr tag that has one
ids=list()
tr = soup.find_all("tr")
for row in tr:
    block_id = row.get('data-block-id')
    if block_id is not None:
        ids.append(block_id)

print("ids are ",len(ids))

# room type and price for every data-block-id
last_room = None
for i in range(0,len(ids)):
    try:
        allData = soup.find("tr",{"data-block-id":ids[i]})
        rooms = allData.find("span",{"class":"hprt-roomtype-icon-link"})

        # rows without a room name reuse the last one seen
        if rooms is not None:
            last_room = rooms.text.replace("\n","")
        k["room"]=last_room

        price = allData.find("div",{"class":"bui-price-display__value prco-text-nowrap-helper prco-inline-block-maker-helper prco-f-font-heading"})
        k["price"]=price.text.replace("\n","")

        g.append(k)
    except AttributeError:
        # the row or its price element was not found, so skip this block
        pass
    k={}

l.append(g)
l.append(o)
l.append(fac_arr)
print(l)

The output of this script should look like this.

Advantages of Scraping Booking.com

Lots of travel agencies collect a tremendous amount of data from their competitors’ websites. They know that to gain an edge in the market they must have access to competitors’ pricing strategies.


To secure an advantage over a niche competitor, one has to scrape multiple websites and then aggregate the data. You can then adjust your prices after comparing them, offer discounts, or show on your platform how cheap your prices are alongside your competitors’ prices.

Since there are more than 200 OTAs in the market, it becomes a lot more difficult to scrape and compare. I would advise you to use a service like a hotel search API to get the prices of all the hotels in any city around the globe.


Conclusion

Hotel data scraping goes well beyond this; this was just an example of how Python can be used to scrape Booking.com for price comparison purposes. You can use Python to scrape other websites like Expedia, Hotels.com, etc.

I have scraped Expedia using Python here; do check it out too!

But scraping at scale would not be possible with this process. After some time booking.com will block your IP and your data pipeline will stop working. Ultimately, when you scrape hotel data you will need to track and monitor prices on an ongoing basis.


Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.