Scraping Zillow is one of the easiest ways to analyze the property market in your desired area. According to Similarweb, Zillow receives an estimated 348.4 million visits per month.
That number will only grow as more and more people list their properties on the platform. Hence, scraping Zillow can get you some valuable insights.
So, how do you scrape Zillow? There are various methods you can use for Zillow scraping. However, for the sake of this blog, we will be using Python.
Let’s Get Started!!
Why Scrape Zillow Data using Python?
Python has many web scraping libraries that are easy to use and well-documented. That doesn’t mean other programming languages are poorly documented, but Python gives you more flexibility.
You can do countless things with Python, from scraping Google search results to price scraping.
With all this, you get great community support and many forums to solve any issue you might face in your Python journey.
When you are extracting data from the web, starting with Python will help you collect data in no time, and it will also boost your confidence, especially if you are a beginner.
Well-known community forums such as Stack Overflow are great places to get help along the way.
Prerequisites
I hope Python 3.x is already installed on your machine. If it is, first create a folder with any name you like; I am creating a folder named zillow. Inside this folder, create a Python file zillow.py and install two libraries: requests and beautifulsoup4.
```shell
mkdir zillow
pip install requests
pip install beautifulsoup4
```
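Before touching Zillow, you can confirm both libraries installed correctly by parsing a throwaway HTML snippet; this is just a sanity check:

```python
import requests
from bs4 import BeautifulSoup

# Parse a tiny snippet to confirm beautifulsoup4 works end to end.
soup = BeautifulSoup("<html><body><p>hello</p></body></html>", "html.parser")
print(soup.find("p").text)   # hello

# Confirm requests imported; the version depends on what pip installed.
print(requests.__version__)
```

If both lines print without errors, your environment is ready.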
Let’s Scrape Zillow Data using Python!
- The first step would be to download the raw HTML from the target page using the requests library.
- The second step will be to parse the data using the BeautifulSoup library.
Downloading raw HTML
Our target page will be Zillow’s for-sale listings for Brooklyn, New York, and from this page, we are going to extract the price, size, and address of each property.
```python
import requests
from bs4 import BeautifulSoup

target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"}

resp = requests.get(target_url, headers=headers)
print(resp.status_code)
```
Every property card on the page is an `li` tag with the class `ListItem-c11n-8-102-0__sc-13rwu5a-0`. You can find that by inspecting the element. There are around 42 properties listed on this page. Let’s check where our target data elements are stored.
As you can see, the price is stored inside a `span` tag with the attribute `data-test` set to `property-card-price`.
Similarly, you can see that the property size is located inside a `ul` tag with the class `StyledPropertyCardHomeDetailsList-c11n-8-102-0__sc-1j0som5-0`.
The address is located inside the `address` tag.
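Before wiring these selectors into the full scraper, it helps to see how they behave on a single card. The HTML below is a hypothetical, stripped-down stand-in for one Zillow property card (the real markup carries many more attributes), but the `find` calls mirror the ones we are about to use:

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical version of one Zillow property card,
# keeping only the tags and attributes our selectors rely on.
card_html = """
<li class="ListItem-c11n-8-102-0__sc-13rwu5a-0">
  <span data-test="property-card-price">$799,000</span>
  <ul class="StyledPropertyCardHomeDetailsList-c11n-8-102-0__sc-1j0som5-0">
    <li>3 bds</li><li>2 ba</li><li>1,450 sqft</li>
  </ul>
  <address>123 Example St, Brooklyn, NY 11201</address>
</li>
"""
card = BeautifulSoup(card_html, "html.parser")

price = card.find("span", {"data-test": "property-card-price"}).text
size = card.find("ul").find_all("li")[-1].text  # the last <li> holds the square footage
address = card.find("address").text
print(price, size, address)
```

The same three `find` calls, applied to each real card, form the core of the scraper below.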
We have all the ingredients to make a great Zillow scraper. Let’s code!
```python
import requests
from bs4 import BeautifulSoup

l = list()
obj = {}

target_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "upgrade-insecure-requests": "1",
}

resp = requests.get(target_url, headers=headers)
print(resp.status_code)
```
The above code is pretty simple but let me explain it to you step by step.
- First, we have imported the required libraries, i.e. `requests` and BS4.
- Then an empty list `l` and an object `obj` are declared.
- `target_url` is declared, which holds our target Zillow web page.
- `headers` is declared, which will be passed to the requests library. This will help us make our request look more authentic.
- Finally, we check whether the status code is `200` or not with a `print` statement. If it is `200`, we can proceed with our second and final step of parsing.
After running the code I got `200`.
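You will not always get a `200`; bot detection can reply with a `403`, so a small retry wrapper is worth having. The sketch below takes the HTTP call as an injected callable purely so the logic can be tested without hitting Zillow; in the real scraper you would pass a small function wrapping `requests.get`:

```python
def get_html(url, fetch, retries=3):
    """Try up to `retries` times; `fetch(url)` must return (status_code, body).

    `fetch` is injected so this sketch stays testable offline. In practice
    you would wrap requests.get(url, headers=headers) in such a callable.
    """
    last_status = None
    for _ in range(retries):
        status, body = fetch(url)
        if status == 200:
            return body
        last_status = status
    raise RuntimeError(f"gave up after {retries} attempts, last status {last_status}")

# Simulated responses: two 403s (bot detection) followed by a success.
responses = iter([(403, ""), (403, ""), (200, "<html>ok</html>")])
html = get_html("https://example.com", lambda url: next(responses))
print(html)  # <html>ok</html>
```

A short pause between retries (e.g. `time.sleep`) is also a good idea to avoid hammering the site.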
Parsing with BS4
```python
resp = resp.text
soup = BeautifulSoup(resp, 'html.parser')
properties = soup.find_all("li", {"class": "ListItem-c11n-8-102-0__sc-13rwu5a-0"})
```
Here I have created a BeautifulSoup object. Then we find all the property boxes using the `find_all()` method.
```python
for x in range(0, len(properties)):
    try:
        obj["pricing"] = properties[x].find("span", {'data-test': 'property-card-price'}).text
    except:
        obj["pricing"] = None
    try:
        obj["size"] = properties[x].find("ul", {"class": "StyledPropertyCardHomeDetailsList-c11n-8-102-0__sc-1j0som5-0"}).find_all('li')[-1].text
    except:
        obj["size"] = None
    try:
        obj["address"] = properties[x].find("address").text
    except:
        obj["address"] = None
    l.append(obj)
    obj = {}

print(l)
```
Here we run a `for` loop through which we extract information like the price, size, and address of each property.
Once the parsing is over, we push each result to the list `l`. When you run the code, you will notice that you only get results for 9 properties even though 42 are listed. Why so?
Well, Zillow renders most of its listings with JavaScript, so a plain HTTP request only returns the first few property cards; scraping the rest requires JS rendering. I know this sounds like a lot of work, but you can handle all of it with a simple web scraping API like Scrapingdog.
Once you sign up for Scrapingdog’s free pack, you will get 1,000 free credits with which you can easily test the service.
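Since the target Zillow URL is passed to the API as a query parameter, it should be URL-encoded. A small sketch using the standard library, with the endpoint and parameter names taken from the snippet later in this post (the `api_key` value is a placeholder for your own key):

```python
from urllib.parse import urlencode

api_key = "Your-API-key"  # placeholder: use the key from your Scrapingdog dashboard
zillow_url = "https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

# urlencode percent-encodes the nested Zillow URL so it survives as a query parameter.
params = urlencode({"api_key": api_key, "listing": "true", "url": zillow_url})
target_url = f"http://api.scrapingdog.com/zillow?{params}"
print(target_url)
```

Encoding the nested URL avoids subtle bugs when the Zillow URL itself contains `&` or `?` characters.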
Scraping Zillow with Scrapingdog
Once you sign up, you will be redirected to your dashboard, where you can find your API key at the top. The API key identifies the user, and you will need it when making GET requests to Scrapingdog.
The code remains almost the same as in the earlier section; only the target URL changes, as requests now go through Scrapingdog’s API.
```python
import requests
from bs4 import BeautifulSoup

l = list()
obj = {}

# Requests now go through Scrapingdog's Zillow API instead of hitting Zillow
# directly. Replace Your-API-key with the key from your dashboard.
target_url = "http://api.scrapingdog.com/zillow?api_key=Your-API-key&listing=true&url=https://www.zillow.com/homes/for_sale/Brooklyn,-New-York,-NY_rb/"

# No browser headers are needed; Scrapingdog handles those on its end.
resp = requests.get(target_url)
print(resp.status_code)

resp = resp.text
soup = BeautifulSoup(resp, 'html.parser')
properties = soup.find_all("li", {"class": "ListItem-c11n-8-102-0__sc-13rwu5a-0"})

for x in range(0, len(properties)):
    try:
        obj["pricing"] = properties[x].find("span", {'data-test': 'property-card-price'}).text
    except:
        obj["pricing"] = None
    try:
        obj["size"] = properties[x].find("ul", {"class": "StyledPropertyCardHomeDetailsList-c11n-8-102-0__sc-1j0som5-0"}).find_all('li')[-1].text
    except:
        obj["size"] = None
    try:
        obj["address"] = properties[x].find("address").text
    except:
        obj["address"] = None
    l.append(obj)
    obj = {}

print(l)
```
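Printing the list is fine for a quick check, but you will usually want the results in a file. A minimal sketch dumping a list of dicts (shaped like the `l` built above, with hypothetical example rows) to CSV using the standard library:

```python
import csv

# Hypothetical rows shaped like the scraper's output; None marks a missing field.
rows = [
    {"pricing": "$799,000", "size": "1,450 sqft", "address": "123 Example St, Brooklyn, NY 11201"},
    {"pricing": None, "size": None, "address": "45 Sample Ave, Brooklyn, NY 11215"},
]

with open("properties.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["pricing", "size", "address"])
    writer.writeheader()
    writer.writerows(rows)  # None values become empty cells
```

In the real scraper you would pass `l` instead of `rows`; `csv.DictWriter` quotes values containing commas (like the prices) automatically.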
Conclusion
In this post, we built a Zillow scraper using Python and learned how to extract real estate data. We also saw how Scrapingdog can help scale this process.
Along the way, we learned the main difference between plain HTTP requests and JS rendering while scraping Zillow with Python.
I have also created a list below of famous real-estate websites to help you identify which website needs JS rendering and which does not.
Frequently Asked Questions
Does Scrapingdog offer a dedicated Zillow API?
Yes. Scrapingdog has a dedicated Zillow Scraper API for extracting data from Zillow at scale, with a faster response rate compared to other available APIs. Check it out and give it a spin to see if it meets your business needs.
How many free requests do I get?
You get 1,000 GET requests per month on the free plan. The $40 Lite plan offers 4,000 Zillow credits. So, try the free plan first and upgrade if it suits your needs. Check out the dedicated Zillow Scraper API here!
Is it legal to scrape Zillow?
Scraping publicly available data is generally considered legal, but make sure you use the data for ethical purposes and comply with applicable laws and Zillow’s terms of service.