
Scrape Zillow Real Estate Data using Python

17-08-2023

Scraping Zillow is one of the easiest ways to analyze the property market in your desired area. According to Similarweb, Zillow receives an estimated 348.4 million visits per month.

Over time this number is only going to grow, and more and more people will list their properties there. Hence, scraping Zillow can get you some valuable insights.

Well, then how do you scrape Zillow? There are various methods you can use for Zillow scraping; for this blog, we will be using Python.

Let’s Get Started!!

[Image: How to scrape Zillow using Python]

Why Scrape Zillow Data using Python?

Python has many libraries for web scraping that are easy to use and well-documented. That doesn't mean other programming languages are poorly documented, but Python gives you more flexibility.

From scraping Google search results to price scraping, you can do countless things with Python.

With all this, you get great community support and tons of forums to solve any issue you might face in your Python journey.

When you are extracting data from the web, starting with Python will help you collect data in no time, and it will also boost your confidence, especially if you are a beginner.

There are plenty of active Python forums and communities where you can get help whenever you are stuck.

Let’s Start Scraping Zillow Data using Python!

Normal HTTP GET request

Our target page will be Zillow's Brooklyn, NY for-sale listings, from which we are going to extract the price, size, and address of each property.

>> mkdir scraper
>> pip install requests
>> pip install beautifulsoup4

Here we have created a folder and then installed all the required libraries.

import requests
from bs4 import BeautifulSoup

target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"}

resp =  requests.get(target_url, headers=headers)

print(resp.status_code)

All of these properties are part of a list that has the class name StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data. You can find that by inspecting the element.

There are almost 40 listed properties from Zillow here on this page. We will use BS4 to extract our target data. Let’s check where our target elements are stored.

[Image: Checking target elements on Zillow.com]

As you can see, the price tag is stored in the class StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX. Similarly, you will find that size is stored in StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ and the address is stored in StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link.
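One caveat: these hashed class names (the kJFQQX-style suffixes) are auto-generated by Zillow's build pipeline and change from release to release, so selectors keyed to them break often. A more resilient approach is to match on the stable class token alone. The markup below is a simplified, hypothetical stand-in for a Zillow card, just to illustrate the technique:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup mimicking the structure of a Zillow card.
html = """
<div class="StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data">
  <div class="StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX">$550,000</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches a single class token, so this survives hash changes.
cards = soup.find_all("div", class_="property-card-data")
print(len(cards))         # 1
print(cards[0].div.text)  # $550,000
```

This way your scraper keeps working as long as the stable token survives a redesign, even when the hashed part changes.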

Now we have all the ingredients to make our scraper ready.

import requests
from bs4 import BeautifulSoup
l=list()
obj={}
target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","Accept-Language":"en-US,en;q=0.9","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","Accept-Encoding":"gzip, deflate, br","upgrade-insecure-requests":"1"}

resp = requests.get(target_url, headers=headers).text

soup = BeautifulSoup(resp,'html.parser')

properties = soup.find_all("div",{"class":"StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})

for x in range(0,len(properties)):
  try:
    obj["pricing"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
  except:
    obj["pricing"]=None
  try:
    obj["size"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
  except:
    obj["size"]=None
  try:
    obj["address"]=properties[x].find("a",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
  except:
    obj["address"]=None
  l.append(obj)
  obj={}
print(l)

We have also declared headers like User-Agent, Accept-Encoding, Accept, Accept-Language, and upgrade-insecure-requests to act like a normal browser while hitting Zillow. You would need the same with any well-protected real estate website.

As I said before, Zillow will eventually identify the suspicious activity and block you in no time.

Read more: Tips for Web Scraping to Avoid Getting Blocked

To web scrape Zillow at scale, I would suggest using Scrapingdog's Web Scraper API, which will help you scrape property information from Zillow without wasting time on captchas and other data blocks.

We ran a for loop over every property stored in our Zillow properties list, then used the find function of BS4 to locate our target elements.

After finding each element, we store its text in an object and finally push the object to a list.
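The repeated try/except blocks above can be folded into a small helper function. This is an optional refactor of the same pattern (safe_text is a name I am introducing here, not part of any library):

```python
from bs4 import BeautifulSoup

def safe_text(node, tag, class_name):
    # Return the stripped text of the first matching descendant, or None if absent.
    found = node.find(tag, {"class": class_name})
    return found.text.strip() if found else None

# Quick check against a hypothetical card snippet.
card = BeautifulSoup('<div><a class="x">123 Main St</a></div>', "html.parser").div
print(safe_text(card, "a", "x"))     # 123 Main St
print(safe_text(card, "span", "y"))  # None
```

With this helper, each field in the loop becomes a one-liner such as obj["pricing"] = safe_text(properties[x], "div", "... kJFQQX").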

Once you print it you will get this result.

You will notice that you only got 9 results out of 40. Why so?

The answer is that Zillow renders most of its listings with JavaScript, so a plain HTTP request only returns the cards present in the initial HTML. We will get to JS rendering in a while, but before that, we will scrape Zillow by changing page numbers.

Zillow just adds a path like this — https://www.zillow.com/brooklyn-new-york-ny/2_p/. So, we just need to run another for loop to iterate over all the properties on different pages.

As you can see, there are 5610 listings, and with 40 properties per page that works out to roughly 140 pages (141 counting the final partial page). But for learning purposes, we are just going to run our loop over ten pages.
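The page count is just a ceiling division, which you can sanity-check in a couple of lines:

```python
import math

total_listings = 5610
listings_per_page = 40

# 140 full pages plus one partial page at the end.
pages = math.ceil(total_listings / listings_per_page)
print(pages)  # 141
```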

import requests
from bs4 import BeautifulSoup
l=list()
obj={}

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","Accept-Language":"en-US,en;q=0.9","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","Accept-Encoding":"gzip, deflate, br","upgrade-insecure-requests":"1"}

for page in range(1,11):
    
    resp =  requests.get("https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/{}_p/".format(page), headers=headers).text

    soup = BeautifulSoup(resp,'html.parser')
    
    properties = soup.find_all("div",{"class":"StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})
    for x in range(0,len(properties)):
            try:
                obj["pricing"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
            except:
                obj["pricing"]=None
            try:
                obj["size"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
            except:
                obj["size"]=None
            try:
                obj["address"]=properties[x].find("a",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
            except:
                obj["address"]=None
            l.append(obj)
            obj={}
print(l)

We created a for loop to change our URL every time our scraper is done with the last page. This helps us to iterate over all the pages smoothly.
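When looping over many pages like this, it's also good practice to pause between requests so you don't hammer the server and trip rate limits. Here is a minimal sketch, with the fetch step abstracted into a callable so you can plug in a requests.get wrapper; the function names are illustrative, not from any library:

```python
import time

BASE_URL = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/{}_p/"

def page_urls(last_page):
    # Build the paginated URLs used in the loop above.
    return [BASE_URL.format(page) for page in range(1, last_page + 1)]

def crawl(urls, fetch, delay=2.0):
    # Call `fetch` on each URL, sleeping `delay` seconds between requests.
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(delay)
    return results

urls = page_urls(3)
print(urls[1])  # https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/2_p/
# In real use: crawl(urls, lambda u: requests.get(u, headers=headers).text)
```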

JS rendering

In this section, we are going to scrape Zillow data with JS rendering. We are going to load the website in a browser and then extract the data we need.

We are doing this because Zillow makes multiple API calls while loading the page, and the full listing only appears after the JavaScript has run. We will use the Selenium web driver to implement this task. Let us install it first.

>> pip install selenium

Now, import all the libraries inside your file and code with me step by step.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

Now, to use Selenium you need ChromeDriver. You can download it from here. Install the same version as your Chrome browser; I am using Chrome 105, so I installed ChromeDriver 105.

Now, we will declare the path where our chrome driver is located.

PATH = r'C:\Program Files (x86)\chromedriver.exe'  # raw string so the backslashes are kept literally
l=list()
obj={}
target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

The target_url variable stores the link of the page that we are going to scrape.

driver=webdriver.Chrome(PATH)
driver.get(target_url)

Here, webdriver is told to use the ChromeDriver binary located at PATH. Then the .get() function opens the target URL in a Chrome browser.

[Image: Property page on Zillow]

Since the page only loads completely once we scroll down, we first grab the page's html element so we can send it keystrokes.

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)

Here we have used .send_keys() to simulate pressing the END key, which scrolls the page to the bottom.

time.sleep(5)
resp = driver.page_source
driver.close()

Then we wait about 5 seconds for the website to load completely. After that, we extract the page's source code using the page_source attribute of the Selenium driver.

Then finally, close the driver. If you don't close it, it will keep consuming CPU resources.

soup=BeautifulSoup(resp,'html.parser')
properties = soup.find_all("div",{"class":"StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})
for x in range(0,len(properties)):
        try:
            obj["pricing"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
        except:
            obj["pricing"]=None
        try:
            obj["size"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
        except:
            obj["size"]=None
        try:
            obj["address"]=properties[x].find("a",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
        except:
            obj["address"]=None
        l.append(obj)
        obj={}

print(l)

Then we used the same BS4 code from our last section to extract the data we need.

You will get all the property data listed on our target page. You can now see the difference between a normal HTTP request and JS rendering: JS rendering loads the complete website before we scrape it.

Advantages of JS rendering

  • Loads the complete website before scraping.
  • Fewer chances of getting caught by any bot-detection technology available on the website.

Disadvantages of JS rendering

  • It is a time-consuming process; some websites might even take a minute to load.
  • It consumes a lot of CPU resources. You need a large infrastructure to scrape websites at scale with JS rendering.

Complete Code

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
PATH = r'C:\Program Files (x86)\chromedriver.exe'
l=list()
obj={}
target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"
driver=webdriver.Chrome(PATH)
driver.get(target_url)
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)
time.sleep(5)
resp = driver.page_source
driver.close()
soup=BeautifulSoup(resp,'html.parser')
properties = soup.find_all("div",{"class":"StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})
for x in range(0,len(properties)):
        try:
            obj["pricing"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
        except:
            obj["pricing"]=None
        try:
            obj["size"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
        except:
            obj["size"]=None
        try:
            obj["address"]=properties[x].find("a",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
        except:
            obj["address"]=None
        l.append(obj)
        obj={}
print(l)

How to use Scrapingdog for scraping Zillow?

As discussed above, Zillow loves to throw captchas at you while you are extracting data from it.

To avoid this situation, we need a Zillow data scraper that can handle proxies and headless browsers all at once. Here we are using Scrapingdog as the external web scraper to extract data from Zillow without getting blocked.

We will sign up for the free pack. The free pack comes with 1000 API credits, which are enough for testing.

[Image: Scrapingdog homepage]

Once you sign up, you will be redirected to your dashboard, where you can find your API key at the top. The API key identifies you as a user; you need it when making GET requests to Scrapingdog.

Advantages of Using Scrapingdog:

  • You don’t need to install selenium or any external web driver to load the website.
  • No proxy management.
  • No other external server.

Everything is managed by Scrapingdog; all you need to do is make a simple GET request to the API. For that we will use Python's requests library. Let's see what it looks like.
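One detail worth noting: the Zillow URL is passed to the API as a query parameter, so it is safest to URL-encode it rather than concatenating it raw. A small sketch using Python's standard library (the parameter names match the API call shown below; the key is a placeholder):

```python
from urllib.parse import urlencode

api_key = "Your-API-Key"  # placeholder, use your real key
zillow_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

# urlencode percent-encodes the nested URL so its slashes and commas
# can't be misread as part of the API endpoint's own query string.
params = {"api_key": api_key, "url": zillow_url, "dynamic": "false"}
target_url = "https://api.scrapingdog.com/scrape?" + urlencode(params)
print(target_url)
```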

from bs4 import BeautifulSoup
import requests
# from selenium import webdriver
# import time
# PATH = 'C:\Program Files (x86)\chromedriver.exe'
l=list()
obj={}
target_url = "https://api.scrapingdog.com/scrape?api_key=Your-API-Key&url=https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/&dynamic=false"

resp=requests.get(target_url)
soup=BeautifulSoup(resp.text,'html.parser')
properties = soup.find_all("div",{"class":"StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})
for x in range(0,len(properties)):
        try:
            obj["pricing"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
        except:
            obj["pricing"]=None
        try:
            obj["size"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
        except:
            obj["size"]=None
        try:
            obj["address"]=properties[x].find("a",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
        except:
            obj["address"]=None
        l.append(obj)
        obj={}
print(l)

We have removed Selenium because we no longer need it. Do not forget to replace the "Your-API-Key" placeholder with your own API key.

You can find your key on your dashboard as shown below.

[Image: API key on the Scrapingdog dashboard]

This code will give you a reliable stream of data. Apart from this, the rest of the code remains the same.

Just like this Scrapingdog can be used for scraping any website without getting BLOCKED.

Note: We have also recently released a dedicated Zillow Scraper API that returns parsed Zillow data.

Forget about getting blocked while scraping Zillow

Try out Scrapingdog’s Zillow Scraper API & Scrape Unlimited Zillow Listings

Conclusion

In this post, we built a Zillow scraper using Python and learned how to extract real estate data. We also saw how Scrapingdog can help scale this process.

We learned the main difference between normal HTTP requests and JS rendering while web scraping Zillow with Python.

I have also created a list below of famous real-estate websites to help you identify which website needs JS rendering and which does not.

[Image: Do you really need JS rendering for real-estate websites?]

I hope you liked this post. If you liked it please share it on social media platforms.

If you think I have left some topics then please do let me know.

Frequently Asked Questions

How many free requests do you get with Scrapingdog?
You get 1000 GET requests/month on the free plan. The $30 Lite plan offers 4000 Zillow credits. So, try the free plan first and upgrade if it suits your needs. Check out the dedicated Zillow Scraper API here!

Is it legal to scrape Zillow?
Scraping publicly available data is generally considered legal, as long as you use the data for ethical purposes and respect the website's terms.


Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.

