
Scrape Zillow using Python (Step by Step Guide)

05-10-2022

Are you looking to scrape Zillow?

Whether you are looking to invest in real estate or want to analyze how prices are moving in your area, web scraping Zillow can work wonders for you.

Zillow is a data-rich website, and scraping it can provide you with more information than its competitors like Idealista, Realtor, etc.

[Image: How to Scrape Zillow using Python]

In this post, we are going to web scrape Zillow data using Python. We will be using the Requests and BS4 (BeautifulSoup) libraries.

  • The Requests library will help us make HTTP requests.
  • BS4, i.e. BeautifulSoup4, will help us parse the target data out of the raw HTML.

Why scrape Zillow data using Python?

Python has many libraries that are easy to use and well documented. That doesn’t mean other programming languages have bad documentation, but Python’s ecosystem gives you more flexibility for quick data-collection tasks.

Know more: The Best Programming Languages to Web Scrape

From scraping Google search results to price scraping for business needs, you can do countless things with Python.

With all this, you get great community support and tons of forums to solve any issue you might face in your python journey.

When you are extracting data from the web, starting with Python will help you collect data in no time, and it will boost your confidence, especially if you are a beginner.

Check out this beginner-friendly tutorial on web scraping with Python!


Let’s Web Scrape Zillow Data with Python!

Normal HTTP GET request

Our target page will be the Brooklyn, NY for-sale listings page, and from it we are going to extract the price, size, and address of each property.

>> mkdir scraper
>> cd scraper
>> pip install requests
>> pip install beautifulsoup4

Here we created a project folder, moved into it, and installed the required libraries.

import requests
from bs4 import BeautifulSoup

target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"}

resp = requests.get(target_url, headers=headers)

print(resp.status_code)
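A status code of 200 means Zillow served the page; a 403 usually means the request was flagged as bot traffic. A minimal guard before parsing (just a sketch) saves confusing errors further down:

# Zillow commonly answers 403 when it suspects automated traffic,
# so bail out early instead of parsing an error page.
if resp.status_code != 200:
    raise SystemExit("Blocked or failed: HTTP {}".format(resp.status_code))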

All of these properties are part of a list which has the class name StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data. You can find it by inspecting the element.

There are about 40 properties listed on this page. We will use BS4 to extract our target data. Let’s check where the target elements are stored.

[Image: Checking the target elements on Zillow.com]

As you can see, the price is stored in the class StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX. Similarly, you will find that the size is stored in StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ and the address in StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link.
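One caveat: the hashed parts of these class names (kJFQQX, bKFUMJ, dZxoFm) are generated by Zillow’s styling library and change whenever the site is rebuilt. A more resilient option, once the page is parsed into soup as in the snippet below, is to match on a stable substring of the class attribute. This is a sketch, and the substring is an assumption based on the classes above:

# Match on the stable "property-card-data" substring instead of the
# full hashed class name, which changes between frontend builds.
cards = soup.select('div[class*="property-card-data"]')
print(len(cards))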

Now we have all the ingredients to make our scraper ready.

import requests
from bs4 import BeautifulSoup

l = list()
obj = {}
target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","Accept-Language":"en-US,en;q=0.9","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","Accept-Encoding":"gzip, deflate, br","upgrade-insecure-requests":"1"}

resp = requests.get(target_url, headers=headers).text

soup = BeautifulSoup(resp,'html.parser')

# Each property card on the page lives in one of these wrapper divs.
properties = soup.find_all("div",{"class":"StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})

for x in range(0,len(properties)):
  try:
    obj["pricing"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
  except:
    obj["pricing"]=None
  try:
    obj["size"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
  except:
    obj["size"]=None
  try:
    obj["address"]=properties[x].find("a",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
  except:
    obj["address"]=None
  l.append(obj)
  obj={}
print(l)
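If you would rather have the results in a spreadsheet than printed on the console, the list of dictionaries maps directly onto Python’s csv module. A minimal sketch (the zillow.csv filename is just an example):

import csv

# Write the scraped list of dicts to a CSV file, one property per row.
with open("zillow.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["pricing", "size", "address"])
    writer.writeheader()
    writer.writerows(l)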

We have also declared some headers like User-Agent, Accept-Encoding, Accept, Accept-Language, and upgrade-insecure-requests to act like a normal browser while hitting the Zillow real estate website.

As mentioned before, Zillow will eventually identify this as suspicious activity and block you in no time.

Know more: Tips for Web Scraping to Avoid Getting Blocked

To web scrape Zillow at scale, I would suggest using a web scraping API tool that helps you crawl Zillow without wasting time on CAPTCHAs and other data blocks.
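The idea is that you forward the target URL through the API and it handles proxies, CAPTCHAs, and rendering for you. A rough sketch with Scrapingdog (the endpoint and parameter names below are illustrative; check the official docs for the exact ones):

import requests

params = {
    "api_key": "YOUR_API_KEY",  # hypothetical placeholder key
    "url": "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/",
    "dynamic": "true",  # ask the API to render JavaScript
}
# Illustrative endpoint; verify against Scrapingdog's documentation.
resp = requests.get("https://api.scrapingdog.com/scrape", params=params)
print(resp.status_code)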

We ran a for loop over every property stored in our properties list, using the find function of BS4 to locate each target element.

Each element’s text is stored in an object, which is then appended to a list.

Once you print it, you will get this result.

[Image: scraped Zillow data printed to the console]

You will notice that you only got 9 results out of 40. Why so?

The answer is that Zillow loads most of its listings through JavaScript, so a plain HTTP request only returns the first few property cards. We will get to JS rendering in a while, but before that, we will scrape Zillow by changing page numbers.

Zillow just appends a path like this: https://www.zillow.com/brooklyn-new-york-ny/2_p/. So, we just need to run another for loop to iterate over the properties on different pages.

As you can see, there are 5,610 listings, and each page shows 40 properties, so there are roughly 140 pages in total (5,610 ÷ 40 ≈ 140). But for learning purposes, we are just going to run our loop over the first ten pages.

import requests
from bs4 import BeautifulSoup
l=list()
obj={}

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","Accept-Language":"en-US,en;q=0.9","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","Accept-Encoding":"gzip, deflate, br","upgrade-insecure-requests":"1"}

for page in range(1,11):
    
    resp =  requests.get("https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/{}_p/".format(page), headers=headers).text

    soup = BeautifulSoup(resp,'html.parser')
    
    properties = soup.find_all("div",{"class":"StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})
    for x in range(0,len(properties)):
            try:
                obj["pricing"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
            except:
                obj["pricing"]=None
            try:
                obj["size"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
            except:
                obj["size"]=None
            try:
                obj["address"]=properties[x].find("a",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
            except:
                obj["address"]=None
            l.append(obj)
            obj={}
print(l)

We created a for loop that changes the URL each time the scraper is done with a page. This lets us iterate over all the pages smoothly.
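One refinement worth considering: firing ten requests back to back is an easy pattern for Zillow to flag. A short randomized pause between pages makes the traffic look more human. A sketch of the same loop with a delay added:

import random
import time

for page in range(1, 11):
    # ... same request and parsing code as above ...
    # Pause 2-5 seconds between pages to avoid hammering the server.
    time.sleep(random.uniform(2, 5))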

[Image: scraped data from the first ten pages]

JS rendering

In this section, we are going to scrape Zillow data with JS rendering. We are going to load the website in a browser and then extract the data we need.

We are doing this because Zillow makes multiple API calls while the page loads, so the complete list of properties only appears after JavaScript runs. We will use the Selenium web driver to implement this task. Let us install it first.

>> pip install selenium

Now, import all the libraries inside your file and code with me step by step.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

Now, to use Selenium you need ChromeDriver. You can download it from the official ChromeDriver site. Download the same version as your Chrome browser; I am using Chrome 105, so I installed ChromeDriver 105.

Now, we will declare the path where our chrome driver is located.

PATH = 'C:\Program Files (x86)\chromedriver.exe'
l=list()
obj={}
target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

The target_url variable stores the link of the page we are going to scrape.

driver=webdriver.Chrome(PATH)
driver.get(target_url)

Here, webdriver is asked to use the ChromeDriver located at PATH. Then the .get() function opens the target URL in the Chrome browser.
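Note that passing the path positionally is the Selenium 3 style (as is the find_element_by_tag_name helper used below). If you are on Selenium 4 or newer, the path goes through a Service object instead; a sketch:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4+ style: wrap the ChromeDriver path in a Service object
# instead of passing it positionally.
driver = webdriver.Chrome(service=Service(PATH))
driver.get(target_url)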

[Image: Zillow website loaded in the Selenium-controlled browser]

Since we have to scroll down to load the page completely, we first grab the html element so we can send keystrokes to it.

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)

Here we have used .send_keys() to simulate an END key press, which scrolls to the bottom of the page.
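A single END key press only jumps to the bottom of the content loaded so far. Since Zillow keeps loading cards as you scroll, a stepwise scroll tends to surface more listings. A sketch using execute_script:

# Scroll down one viewport at a time so lazily loaded property
# cards get a chance to render before we grab the page source.
for _ in range(5):
    driver.execute_script("window.scrollBy(0, window.innerHeight);")
    time.sleep(1)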

time.sleep(5)
resp = driver.page_source
driver.close()

Then we wait about 5 seconds for the website to load completely. After that, we extract the page’s source code using the .page_source attribute of the Selenium driver.

Then we finally close the driver. If you don’t close it, it will keep consuming CPU resources.
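A fixed time.sleep(5) either wastes time or is not enough, depending on the network. An explicit wait that blocks until a property card is present is usually more reliable. A sketch (the CSS selector is an assumption based on the class names above):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for at least one property card to appear,
# instead of sleeping for a fixed amount of time.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.CSS_SELECTOR, 'div[class*="property-card-data"]')
    )
)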

soup=BeautifulSoup(resp,'html.parser')
properties = soup.find_all("div",{"class":"StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})
for x in range(0,len(properties)):
        try:
            obj["pricing"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
        except:
            obj["pricing"]=None
        try:
            obj["size"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
        except:
            obj["size"]=None
        try:
            obj["address"]=properties[x].find("a",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
        except:
            obj["address"]=None
        l.append(obj)
        obj={}

print(l)

Then we used the same BS4 code from our last section to extract the data we need.

[Image: complete property data printed to the console]

You will get all the property data listed on our target page. This is the difference between a normal HTTP request and JS rendering: JS rendering loads the complete website before scraping.

Advantages of JS rendering

  • Loads the complete website before scraping.
  • Fewer chances of getting caught by any bot-detection technology available on the website.

Disadvantages of JS rendering

  • It is time-consuming. Some websites might take a minute or more to load fully.
  • It consumes a lot of CPU resources. You need a large infrastructure to scrape websites at scale with JS rendering; running the browser headless, as sketched below, reduces the overhead somewhat.
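If you do need JS rendering at scale, running Chrome headless cuts some of that overhead, since no window has to be drawn. A sketch:

from selenium import webdriver

# Headless Chrome skips rendering a visible window, which reduces
# CPU and memory use when scraping with JS rendering at scale.
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)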

Complete Code

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
PATH = 'C:\Program Files (x86)\chromedriver.exe'
l=list()
obj={}
target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"
driver=webdriver.Chrome(PATH)
driver.get(target_url)
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)
time.sleep(5)
resp = driver.page_source
driver.close()
soup=BeautifulSoup(resp,'html.parser')
properties = soup.find_all("div",{"class":"StyledPropertyCardDataWrapper-c11n-8-69-2__sc-1omp4c3-0 KzAaq property-card-data"})
for x in range(0,len(properties)):
        try:
            obj["pricing"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 kJFQQX"}).text
        except:
            obj["pricing"]=None
        try:
            obj["size"]=properties[x].find("div",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 bKFUMJ"}).text
        except:
            obj["size"]=None
        try:
            obj["address"]=properties[x].find("a",{"class":"StyledPropertyCardDataArea-c11n-8-69-2__sc-yipmu-0 dZxoFm property-card-link"}).text
        except:
            obj["address"]=None
        l.append(obj)
        obj={}
print(l)

Conclusion

We learned the main difference between normal HTTP requests and JS rendering while web scraping Zillow with Python.

I have also put together a list below of popular real-estate websites, to help you identify which ones need JS rendering and which do not.

[Image: do you really need JS rendering for real-estate websites?]

I hope you liked this post. If you did, please share it on social media. If you think I have left out any topics, please let me know.

Frequently Asked Questions

What does scraping Zillow mean?
Web scraping is the process of extracting bulk data from a web page, for example through a web scraping API. Scraping Zillow means harvesting data about the properties in a particular area or location.

Can Scrapingdog scrape real-estate websites?
Of course. Scrapingdog can scrape data from real-estate websites, including Zillow.


Manthan Koolwal

My name is Manthan Koolwal and I am the CEO of scrapingdog.com. I love creating scrapers and seamless data pipelines.

Try Scrapingdog for Free!

Get 1,000 free API calls for testing.

No credit card required!