How To Scrape Idealista.com with Python

TL;DR

Python guide to scrape Idealista with requests + BeautifulSoup.
Parses listing cards to get title, price, area, description, and property link.
Full code and sample output; adds realistic headers to reduce blocks.
For scale, recommends a scraping API (Scrapingdog) to handle proxies / headers / browsers / retries.

Scraping Idealista can give you massive datasets that you need to drive business growth. Real Estate has become a crucial sector for any country around the globe and every decision is backed by some solid data analysis.

Now if we are talking about data, how should we collect so much data faster? Well, here web scraping can help you collect data.

In this tutorial, we are going to scrape the biggest real estate portal in Portugal Idealista. We are going to use Python for this tutorial & will create our own Idealista scraper.

Collecting all the Ingredients for Scraping

I am assuming that you have already installed Python on your machine. I will be using Python 3.x. With that being installed we will require two more libraries for data extraction.

Requests — This will be used to make the GET request and download the raw HTML from our target web page.
BeautifulSoup — It will be used to parse the data.

First, we need to create the folder where we will keep our script.

1mkdir coding

Inside this folder, you can create a file by any name you like. I am going to use idealista.py in this case. Finally, we are going to install the libraries mentioned above using pip.

1pip install requests
2pip install beautifulsoup4

This web scraping task is mainly divided into two parts, one is to download the raw HTML from the target web page using requests and the other part is to parse the data using BS4.

What are we going to scrape?

For this tutorial, we are going to scrape this page from Idealista and from the same page, we are going to extract these data points:

I have decided to scrape the following data points:

Title of the property
Price of the property
Property Description
Dedicated web link of the property.

As discussed earlier, we will first download the raw HTML data using requests and then parse the data using BS4.

Downloading the raw HTML

Using requests we will make a simple GET request to download the raw HTML.

1import requests
2from bs4 import BeautifulSoup
3 
4 
5l=list()
6o={}
7 
8target_url = "https://www.idealista.com/venta-viviendas/torrelavega/inmobiliaria-barreda/"
9headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","Accept-Language":"en-US,en;q=0.9","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","Accept-Encoding":"gzip, deflate, br","upgrade-insecure-requests":"1"}
10 
11resp = requests.get(target_url, headers=headers)
12 
13print(resp.status_code)

I first imported all the required libraries and then declared an empty list l and an object o.

Then I declared the target URL and headers. We are passing headers to the request because idealista.com loves throwing captchas. This will help us make this request look more authentic.

Finally, we are making a GET request using requests.

After running the code if you get a 200 status code then we can proceed with the parsing process. Let’s run the code and check the status.

I know most of you will not get a 200 , which is fine. I will share a solution at the end of this article that will help you scrape Idealista at scale without getting blocked.

Parsing using BeautifulSoup

Before we write a single line of code for parsing the data we have to identify the location of each data element inside the HTML DOM.

Identifying the location of each element

Each property listing is located inside the div tag with the class item-info-container.

Identifying where the title tag is stored

The property title is stored inside a tag with a class item-link.

Identifying where the property price is stored

The property price is stored inside the span tag with the class item-price.

Identifying where the property description is in HTML

The property description is stored inside the div tag with the class item-description.

Identifying the property link in HTML

Property link is stored inside a tag with class item-link.

We are now completely ready to parse this data.

1soup = BeautifulSoup(resp.text, 'html.parser')
2 
3allProperties = soup.find_all("div",{"class":"item-info-container"})
4 
5for i in range(0,len(allProperties)):
6    o["title"]=allProperties[i].find("a",{"class":"item-link"}).text.strip("\n")
7    o["price"]=allProperties[i].find("span",{"class":"item-price"}).text.strip("\n")
8    o["area-size"]=allProperties[i].find("div",{"class":"item-detail-char"}).text.strip("\n")
9    o["description"]=allProperties[i].find("div",{"class":"item-description"}).text.strip("\n")
10    o["property-link"]="https://www.idealista.com"+allProperties[i].find("a",{"class":"item-link"}).get('href')
11    l.append(o)
12    o={}
13 
14print(l)

Using find_all() method of BeautifulSoup we have created a list allProperties of all the listed properties.

Then using a for loop we reach each element inside the allProperties list. Using find() method we are finding property data and storing it inside the object o and finally pushing the data to a list l.

Once we run the code we get this.

We were able to scrape and extract the data from Idealista.

Complete Code

You can make a few more changes to extract a little more information like the number of properties, map, etc. But the current code will look like this.

1import requests
2from bs4 import BeautifulSoup
3 
4 
5 
6 
7l=list()
8o={}
9 
10target_url = "https://www.idealista.com/venta-viviendas/torrelavega/inmobiliaria-barreda/"
11headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","Accept-Language":"en-US,en;q=0.9","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","Accept-Encoding":"gzip, deflate, br","upgrade-insecure-requests":"1"}
12 
13 
14resp = requests.get(target_url, headers=headers)
15print(resp.status_code)
16 
17soup = BeautifulSoup(resp.text, 'html.parser')
18 
19allProperties = soup.find_all("div",{"class":"item-info-container"})
20 
21for i in range(0,len(allProperties)):
22    o["title"]=allProperties[i].find("a",{"class":"item-link"}).text.strip("\n")
23    o["price"]=allProperties[i].find("span",{"class":"item-price"}).text.strip("\n")
24    o["area-size"]=allProperties[i].find("div",{"class":"item-detail-char"}).text.strip("\n")
25    o["description"]=allProperties[i].find("div",{"class":"item-description"}).text.strip("\n")
26    o["property-link"]="https://www.idealista.com"+allProperties[i].find("a",{"class":"item-link"}).get('href')
27    l.append(o)
28    o={}
29 
30 
31print(l)

Limitations of this approach

This approach will only work for scraping a few pages because Idealista is a bot sensitive website and it will block your IP in no time. To avoid this situation and create a scraper that can scrape millions of pages from Idealista you are advised to use a web scraping API such as Scrapingdog. Scrapingdog will handle all the hassle of proxies, headers, browsers and retries. You can focus on data collection rather than on useless tasks of proxy and browser management.

Using Scrapingdog for scraping Idealista

So, we have seen how you can scrape Idealista using Python. But to be very honest, idealista.com is a well-protected site and you cannot extract data at scale using only Python. In fact, after 10 or 20 odd requests Idealista will detect scraping and ultimately will block your scrapers.

After that, you will continuously get 403 errors on every web request you make. Here Scrapingdog’s web scraping API can help you scrape Idealista very efficiently with its large pool of proxies to overcome any challenges such as IP blocking.

Scrapingdog provides free 1000 API calls for all new users and in that pack, you can use all the premium features. First, you need to sign up to get your private API key.

You can find your API key at the top of the dashboard. You just have to make a few changes to the above code and Scrapingdog will be able to handle the rest of the things. You don’t need Selenium or any other web driver to scrape it. You just have to use the requests library to make a GET request to the Scrapingdog API.

1import requests
2from bs4 import BeautifulSoup
3 
4 
5 
6 
7l=list()
8o={}
9 
10target_url = "https://www.idealista.com/venta-viviendas/torrelavega/inmobiliaria-barreda/"
11headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","Accept-Language":"en-US,en;q=0.9","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","Accept-Encoding":"gzip, deflate, br","upgrade-insecure-requests":"1"}
12 
13resp = requests.get("https://api.scrapingdog.com/scrape?api_key=Your-API-Key&url={}&dynamic=false".format(target_url))
14 
15print(resp.status_code)
16soup = BeautifulSoup(resp.text, 'html.parser')
17 
18allProperties = soup.find_all("div",{"class":"item-info-container"})
19 
20for i in range(0,len(allProperties)):
21    o["title"]=allProperties[i].find("a",{"class":"item-link"}).text.strip("\n")
22    o["price"]=allProperties[i].find("span",{"class":"item-price"}).text.strip("\n")
23    o["area-size"]=allProperties[i].find("div",{"class":"item-detail-char"}).text.strip("\n")
24    o["description"]=allProperties[i].find("div",{"class":"item-description"}).text.strip("\n")
25    o["property-link"]="https://www.idealista.com"+allProperties[i].find("a",{"class":"item-link"}).get('href')
26    l.append(o)
27    o={}
28 
29 
30print(l)

As you can see the code is very similar to our earlier approach. Integrating Scrapingdog with your existing solution is very easy plus it fires up your data collection cylinders🔥.

Conclusion

In this blog, we understood how you can use Python to scraper Idealista, which is data rich real-estate website that needs no introduction. Further, we saw how Idealista can block your scrapers and to overcome you can use Scrapingdog’s web scraping API.

I hope you like this little tutorial and if you do then please do not forget to share it with your friends and on social media.

Additional Resources

Here are a few additional resources you may find resourceful. We have scraped other real estate websites that are below: –