Web scraping is a useful tool for gathering information from the internet. For those in the hotel industry, knowing the prices of other hotels can be very helpful, because with more hotels and OTAs entering the market, competition is rising faster than ever.
So, how do you keep track of all these prices?
The answer is by scraping hotel prices. In this blog, we’ll show you how to scrape hotel prices from booking.com using Python.
You’ll learn how to get prices from any hotel on booking.com by just entering the check-in/out dates and the hotel’s ID. Also, if you’re a hotel owner and want a ready-made solution to monitor prices, check out the Makcorps Hotel API.
Let’s get started!
Why use Python to Scrape booking.com
Python is a versatile language that is used extensively for web scraping, and it has dedicated libraries for the job.
Thanks to its large community, you can usually find help whenever you get stuck. If you are new to web scraping with Python, I recommend going through this comprehensive guide to web scraping with Python.
Requirements for scraping hotel data from booking.com
We need Python 3.x for this tutorial, and I am assuming that you have already installed it on your computer. Along with that, you need to install two libraries that we will use for web scraping later in this tutorial.
- Requests will help us to make an HTTP connection with Booking.com.
- BeautifulSoup will help us to create an HTML tree for smooth data extraction.
Setup
First, create a folder and then install the libraries mentioned above.
```bash
mkdir booking
pip install requests
pip install beautifulsoup4
```
Inside this folder, create a Python file where we will write the code. These are the data points that we are going to scrape from the target website:
- Address
- Name
- Pricing
- Rating
- Room Type
- Facilities
Let’s Scrape Booking.com
```python
import requests
from bs4 import BeautifulSoup

l = list()  # final output
o = {}      # hotel details (name, address, rating)

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}

# Replace the check-in/check-out dates with future dates before running
target_url = "https://www.booking.com/hotel/us/the-lenox.html?checkin=2022-12-28&checkout=2022-12-29&group_adults=2&group_children=0&no_rooms=1&selected_currency=USD"

resp = requests.get(target_url, headers=headers)
print(resp.status_code)
```
The code is pretty straightforward, but let me explain it briefly. First, we imported the two libraries that we installed earlier in this tutorial, and then we declared the headers and the target URL.
Finally, we made a GET request to the target URL and printed the status code. You should see 200; any other code means the request did not succeed (for example, because booking.com rejected it).
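The check-in/out dates, guest counts, and currency all live in the query string of target_url, so rather than editing the URL by hand you can build it from parameters. Below is a minimal standard-library sketch; build_booking_url is a hypothetical helper, and it assumes booking.com keeps accepting the same query parameters that target_url uses.

```python
from datetime import date
from urllib.parse import urlencode


def build_booking_url(country, hotel_slug, checkin, checkout,
                      adults=2, children=0, rooms=1, currency="USD"):
    """Hypothetical helper: build a hotel URL with the same query parameters as target_url.

    country and hotel_slug come from the hotel's page URL,
    e.g. "us" and "the-lenox" in /hotel/us/the-lenox.html.
    """
    if checkout <= checkin:
        raise ValueError("checkout must be after checkin")
    params = {
        "checkin": checkin.isoformat(),    # YYYY-MM-DD
        "checkout": checkout.isoformat(),
        "group_adults": adults,
        "group_children": children,
        "no_rooms": rooms,
        "selected_currency": currency,
    }
    return f"https://www.booking.com/hotel/{country}/{hotel_slug}.html?{urlencode(params)}"


url = build_booking_url("us", "the-lenox", date(2022, 12, 28), date(2022, 12, 29))
print(url)
```

With the tutorial's dates this reproduces target_url exactly, and you can pass the result straight to requests.get(url, headers=headers).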
How to scrape the data points
Since we have already decided which data points we are going to scrape, let’s find their HTML location by inspecting the page in Chrome.
For this tutorial, we will use the find() and find_all() methods of BeautifulSoup to locate target elements: find() returns the first matching element (or None if nothing matches), while find_all() returns a list of all matches. The DOM structure decides which method is better for each element.
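To see the difference between the two methods, here is a small sketch run on a made-up HTML string (not booking.com’s real markup); it assumes BeautifulSoup is installed as in the setup step.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for a hotel page (made up for illustration)
html = """
<div class="important_facility">Free WiFi</div>
<div class="important_facility">Fitness center</div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("div", {"class": "important_facility"})       # first match only
every = soup.find_all("div", {"class": "important_facility"})   # all matches, as a list
missing = soup.find("span", {"class": "no-such-class"})         # no match

print(first.text)               # Free WiFi
print([d.text for d in every])  # ['Free WiFi', 'Fitness center']
print(missing)                  # None
```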
Extracting hotel name and address
Let’s inspect the page in Chrome and find the DOM location of the name as well as the address.
As you can see, the hotel name is under the h2 tag with the class pp-header__title, and the address is in a span tag with the class hp_address_subtitle. For the sake of simplicity, let’s first create a soup variable with the BeautifulSoup constructor; we will extract all the data points from it.
```python
soup = BeautifulSoup(resp.text, 'html.parser')

o["name"] = soup.find("h2", {"class": "pp-header__title"}).text
o["address"] = soup.find("span", {"class": "hp_address_subtitle"}).text.strip("\n")
```
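Because find() returns None when nothing matches, chaining .text directly onto it raises an AttributeError as soon as booking.com renames a class. Here is a minimal sketch of a defensive alternative; text_or_none is a hypothetical helper, and the HTML below is made up for illustration.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, tag, css_class):
    """Hypothetical helper: stripped text of the first match, or None if the element is missing."""
    element = soup.find(tag, {"class": css_class})
    # get_text(strip=True) trims surrounding whitespace, including newlines
    return element.get_text(strip=True) if element is not None else None


# Made-up markup using the class names from this tutorial
html = """
<h2 class="pp-header__title">Example Hotel</h2>
<span class="hp_address_subtitle">
123 Example Street
</span>
"""
soup = BeautifulSoup(html, "html.parser")

print(text_or_none(soup, "h2", "pp-header__title"))       # Example Hotel
print(text_or_none(soup, "span", "hp_address_subtitle"))  # 123 Example Street
print(text_or_none(soup, "div", "d10a6220b4"))            # None
```

A missing field then shows up as None in your output instead of stopping the whole script.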
Extracting rating and facilities
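Once again, we will inspect the page and find the DOM location of the rating and facilities elements. The rating is in a div with the class d10a6220b4 (a class name that looks auto-generated and may change over time), and each facility is a div with the class important_facility.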
```python
o["rating"] = soup.find("div", {"class": "d10a6220b4"}).text

fac = soup.find_all("div", {"class": "important_facility"})

fac_arr = []  # facility names
for facility in fac:
    fac_arr.append(facility.text.strip("\n"))
```
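One caveat: strip("\n") removes only newline characters, so any indentation spaces around the facility text stay in the result. BeautifulSoup’s get_text(strip=True) trims all surrounding whitespace. The sketch below compares the two on made-up markup (not booking.com’s actual HTML).

```python
from bs4 import BeautifulSoup

html = """
<div class="important_facility">
    Free WiFi
</div>
<div class="important_facility">
    Non-smoking rooms
</div>
"""
soup = BeautifulSoup(html, "html.parser")
facilities = soup.find_all("div", {"class": "important_facility"})

# strip("\n") only removes newlines, so the four leading spaces survive
print([f.text.strip("\n") for f in facilities])
# get_text(strip=True) removes all leading/trailing whitespace
print([f.get_text(strip=True) for f in facilities])
```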
Extract Price and Room Types
This is the trickiest part of the tutorial. The DOM structure of booking.com is a bit complex and needs careful study before extracting price and room type information.
Here, the tbody tag contains all the data. Just below tbody you will find the tr tags; each one is a row of the table and holds the information for that row, starting with the room type in the first column.
First, let’s find all the tr tags.
```python
ids = list()
# find_all() returns an empty list, not an error, when nothing matches
tr = soup.find_all("tr")
```
You will notice that the tr tags of the room table have a data-block-id attribute. Not every tr tag on the page has it, so we skip the ones that don’t and collect the rest of the ids in a list.
```python
for row in tr:
    block_id = row.get("data-block-id")  # None if the attribute is absent
    if block_id is not None:
        ids.append(block_id)
```
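BeautifulSoup can also do this filtering for you: passing True as an attribute value matches any tag that has the attribute, whatever its value. Here is a sketch on a made-up table (the block ids and prices are invented):

```python
from bs4 import BeautifulSoup

# Made-up table: only the room rows carry data-block-id
html = """
<table>
  <thead><tr><th>Room type</th><th>Price</th></tr></thead>
  <tbody>
    <tr data-block-id="101_1"><td>Queen Room</td><td>US$250</td></tr>
    <tr data-block-id="101_2"><td>US$275</td></tr>
    <tr data-block-id="102_1"><td>King Room</td><td>US$300</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# True matches any tr that has the attribute; the header row is skipped
rows = soup.find_all("tr", attrs={"data-block-id": True})
ids = [row["data-block-id"] for row in rows]
print(ids)  # ['101_1', '101_2', '102_1']
```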
Once you have all the ids, the rest of the job becomes easier. We will iterate over every data-block-id to extract the room price and room type from its individual tr block.
```python
last_room = None  # most recent room type seen
for block_id in ids:
    k = {}  # room and price for this row
    allData = soup.find("tr", {"data-block-id": block_id})
```
Now we can move on to the td tags inside this tr tag. Let’s extract the room type first.
```python
# find() returns None if this row has no room-type link
rooms = allData.find("span", {"class": "hprt-roomtype-icon-link"})
```
```python
if rooms is not None:
    # A new room type starts in this row
    last_room = rooms.text.replace("\n", "")
# Rows without their own room type reuse the previous one
k["room"] = last_room
```
Here, last_room stores the most recent room type, so a row that does not repeat the room type is assigned the one from the row above it.
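The carry-over logic is easier to check in isolation. Below is a plain-Python sketch; it assumes the rows arrive in table order with None wherever a row does not repeat the room type (presumably because the room-type cell spans several price rows on the page). fill_forward_rooms is a hypothetical helper, and the rows are made up.

```python
def fill_forward_rooms(rows):
    """Hypothetical helper: give every (room, price) row a room name, reusing the last one seen.

    rows: (room_or_None, price) tuples in table order.
    """
    result = []
    last_room = None
    for room, price in rows:
        if room is not None:
            last_room = room  # a new room type starts here
        result.append({"room": last_room, "price": price})
    return result


rows = [("Queen Room", "US$250"), (None, "US$275"), ("King Room", "US$300")]
print(fill_forward_rooms(rows))
```

The second row has no room name of its own, so it inherits "Queen Room", which is exactly what last_room does inside the scraping loop.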
Let’s extract the price now.
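```python
price = allData.find("div", {"class": "bui-price-display__value prco-text-nowrap-helper prco-inline-block-maker-helper prco-f-font-heading"})
# Store None instead of crashing if the price element is missing
k["price"] = price.text.replace("\n", "") if price is not None else None
```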
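The price comes back as display text, not a number. To compare prices you will probably want numeric values. Here is a minimal standard-library sketch, assuming text such as "US$1,234" with a comma thousands separator and a dot decimal separator (other currencies or locales may be formatted differently). parse_price is a hypothetical helper.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Hypothetical helper: turn display text like 'US$1,234' into a Decimal, or None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop the thousands separators before converting
    return Decimal(match.group().replace(",", ""))


print(parse_price("US$1,234"))       # 1234
print(parse_price("\nUS$249.50\n"))  # 249.50
print(parse_price("Sold out"))       # None
```

Decimal is used instead of float so that prices keep their exact cents.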
Complete Code
```python
import requests
from bs4 import BeautifulSoup

l = list()    # final output: [rooms, hotel details, facilities]
g = list()    # one {"room": ..., "price": ...} dict per table row
o = {}        # hotel details
fac_arr = []  # facility names

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}
# Replace the check-in/check-out dates with future dates before running
target_url = "https://www.booking.com/hotel/us/the-lenox.html?checkin=2022-12-28&checkout=2022-12-29&group_adults=2&group_children=0&no_rooms=1&selected_currency=USD"

resp = requests.get(target_url, headers=headers)
soup = BeautifulSoup(resp.text, 'html.parser')

# Hotel name, address and rating
o["name"] = soup.find("h2", {"class": "pp-header__title"}).text
o["address"] = soup.find("span", {"class": "hp_address_subtitle"}).text.strip("\n")
o["rating"] = soup.find("div", {"class": "d10a6220b4"}).text

# Facilities
for facility in soup.find_all("div", {"class": "important_facility"}):
    fac_arr.append(facility.text.strip("\n"))

# Collect the data-block-id of every room row
ids = list()
for row in soup.find_all("tr"):
    block_id = row.get("data-block-id")
    if block_id is not None:
        ids.append(block_id)
print("ids are", len(ids))

# Room type and price for each row
last_room = None
for block_id in ids:
    k = {}
    allData = soup.find("tr", {"data-block-id": block_id})
    rooms = allData.find("span", {"class": "hprt-roomtype-icon-link"})
    if rooms is not None:
        last_room = rooms.text.replace("\n", "")
    k["room"] = last_room  # reuse the previous room type if this row has none
    price = allData.find("div", {"class": "bui-price-display__value prco-text-nowrap-helper prco-inline-block-maker-helper prco-f-font-heading"})
    k["price"] = price.text.replace("\n", "") if price is not None else None
    g.append(k)

l.append(g)
l.append(o)
l.append(fac_arr)
print(l)
```
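The complete code only prints its results. If you want to keep the room prices for later comparison, one option is to write the list of room/price dictionaries (g in the complete code) to CSV with the standard csv module. Here is a minimal sketch; rooms_to_csv is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def rooms_to_csv(rooms, file):
    """Hypothetical helper: write a list of {"room": ..., "price": ...} dicts as CSV."""
    writer = csv.DictWriter(file, fieldnames=["room", "price"])
    writer.writeheader()
    writer.writerows(rooms)


# Sample rows in the same shape as g (values are made up)
sample = [{"room": "Queen Room", "price": "US$250"},
          {"room": "Queen Room", "price": "US$275"}]

buffer = io.StringIO()
rooms_to_csv(sample, buffer)
print(buffer.getvalue())
```

To save the real results, open a file with newline="" as the csv documentation recommends, for example `with open("rooms.csv", "w", newline="") as f: rooms_to_csv(g, f)`.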
Advantages of Scraping Booking.com
Lots of travel agencies collect a tremendous amount of data from their competitors’ websites. They know that to gain an edge in the market, they must have access to their competitors’ pricing strategies.
To secure an advantage over competitors in your niche, you have to scrape multiple websites, aggregate the data, and then adjust your prices after comparing them. You can then offer discounts or show on your platform how cheap your prices are next to your competitors’ prices.
With more than 200 OTAs on the market, scraping and comparing all of them becomes much more difficult. I would advise using a service like a hotel search API to get the prices of all the hotels in any city around the globe.
Scrapingdog’s API offers efficient, scalable, and reliable hotel price scraping, adept at handling dynamic sites and bypassing CAPTCHAs. Its easy integration, customization, affordable pricing, and comprehensive support make it a superior choice. Check out our pricing plan here.
Not sure how many requests you will need from Scrapingdog’s API? Talk to our experts here and get a customized plan for your business needs!
Conclusion
Hotel data scraping goes beyond this; this was just an example of how Python can be used to scrape Booking.com for price comparison. You can also use Python to scrape other websites like Expedia, Hotels.com, etc.
I have also scraped Expedia using Python here; do check it out!
However, this process will not work for scraping at scale. After some time, booking.com will block your IP and your data pipeline will stop working. And since monitoring hotel prices means scraping the same pages again and again, you are bound to run into these blocks.
Additional Resources
Here are a few additional resources that you may find helpful during your web scraping journey: