
Scraping Yelp Data using Python (A Comprehensive Guide)

20-03-2023

In this tutorial, we will scrape Yelp and build our own Yelp scraper using Python. We’re going to harness the power of this programming language to extract valuable insights from Yelp’s rich and extensive database.

Whether you’re a budding data scientist, a curious programmer, or a business analyst seeking novel ways to obtain data, this guide will help you unravel the potential of web scraping Yelp.

From collecting customer reviews to analyzing business ratings, the opportunities are vast. So, let’s embark on this journey, turning unstructured data into meaningful insights, one scrape at a time.

To make things simple, we will use Scrapingdog’s scraping API.


Why Scrape Yelp Data?

Yelp is an American company that publishes crowd-sourced reviews of businesses. It is one of the largest business directories on the Internet.

Scraping Yelp data with your own Yelp scraper gives you access to a large amount of information and data trends. You can use this data to improve your product, or show it to your free clients to help convert them into paying clients.

Since Yelp is a business directory, it lists many businesses that may be in your target market. Scraping Yelp data allows you to extract valuable information like business names, contact information, location, and industry, which helps you create qualified leads a lot faster.


Requirements For Scraping Yelp Data

Generally, web scraping is divided into two parts:

  1. Fetching data by making an HTTP request
  2. Extracting important data by parsing the HTML DOM

Libraries & Tools

  1. Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  2. Requests allows you to send HTTP requests very easily.
  3. Web scraping API extracts the HTML code of the target URL.


Setup

Our setup is pretty simple. Just create a folder and install BeautifulSoup & requests. To create the folder and install the libraries, type the commands given below. I assume that you have already installed Python 3.x.

mkdir scraper
pip install beautifulsoup4
pip install requests

Now, create a file inside that folder by any name you like. I am using scraping.py.

Firstly, you have to sign up for the Scrapingdog API. It will provide you with 1000 FREE credits. Then import Beautiful Soup & requests in your file, like this:

from bs4 import BeautifulSoup
import requests

Let’s Start Scraping Yelp Reviews for a Random Restaurant

We are going to scrape public data for this restaurant by building a Yelp review scraper.
We will extract the following information:

  1. Name of the person
  2. Location of the person
  3. Stars
  4. Review

Let’s Start Scraping Yelp Review Data

Now, since we have all the ingredients to prepare the scraper, we should make a GET request to the target URL to get the raw HTML data. If you are not familiar with the scraping tool, I urge you to review its documentation.

We will scrape Yelp data using the requests library below.

r = requests.get('https://api.scrapingdog.com/scrape?api_key=5ea541dcacf6581b0b4b4042&url=https://www.yelp.com/biz/sushi-yasaka-new-york').text

This will return the HTML code of the target URL.
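The target URL travels as a query parameter, so it is safer to percent-encode it than to paste it in raw. The sketch below builds the request URL with the standard library. The helper name `build_api_url` and the placeholder key are assumptions for illustration; the endpoint and the `api_key` and `url` parameters are the ones used in the request above.

```python
from urllib.parse import urlencode

# Endpoint taken from the request shown above.
API_ENDPOINT = "https://api.scrapingdog.com/scrape"


def build_api_url(api_key, target_url):
    """Return the API request URL with the target URL percent-encoded.

    build_api_url is a hypothetical helper, used here for illustration only.
    """
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; substitute your own key.
    print(build_api_url("YOUR_API_KEY", "https://www.yelp.com/biz/sushi-yasaka-new-york"))
```

The resulting string can be passed to requests.get() in place of the hand-written URL. Alternatively, requests applies the same encoding when the two values are given as a `params` dictionary.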

Now, you have to use BeautifulSoup to parse HTML.

soup = BeautifulSoup(r, 'html.parser')

Now, each review sits inside a list item (an "li" tag). We have to find all of those list items.

allrev = soup.find_all("li", {"class": "lemon--li__373c0__1r9wz margin-b3__373c0__q1DuY padding-b3__373c0__342DA border--bottom__373c0__3qNtD border-color--default__373c0__3-ifU"})

We will run a for loop to reach every reviewer. To extract names, places, stars, and reviews, we must first find the tags where this data is stored. For example, the name is stored in an "a" tag with the class "lemon--a__373c0__IEZFH link__373c0__1G70M link-color--inherit__373c0__3dzpk link-size--inherit__373c0__1VFlE". In the same way, you can find the rest of the tags using Chrome developer tools. These class names appear to be auto-generated and may change over time, so check them against the live page before running the code.

u = []  # all extracted reviews
l = {}  # fields of the review currently being processed

for i in range(0, len(allrev)):
    # find() returns None when a tag is missing, so .text or .get() raises AttributeError
    try:
        l["name"] = allrev[i].find("a", {"class": "lemon--a__373c0__IEZFH link__373c0__1G70M link-color--inherit__373c0__3dzpk link-size--inherit__373c0__1VFlE"}).text
    except AttributeError:
        l["name"] = None

    try:
        l["place"] = allrev[i].find("span", {"class": "lemon--span__373c0__3997G text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--bold__373c0__1elNz text-size--small__373c0__3NVWO"}).text
    except AttributeError:
        l["place"] = None

    try:
        l["stars"] = allrev[i].find("div", {"class": "lemon--div__373c0__1mboc i-stars__373c0__1T6rz i-stars--regular-5__373c0__N5JxY border-color--default__373c0__3-ifU overflow--hidden__373c0__2y4YK"}).get('aria-label')
    except AttributeError:
        l["stars"] = None

    try:
        l["review"] = allrev[i].find("span", {"class": "lemon--span__373c0__3997G raw__373c0__3rKqk"}).text
    except AttributeError:
        l["review"] = None

    # store this review and start a fresh dictionary for the next one
    u.append(l)
    l = {}

print({"data": u})
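The four try/except blocks above repeat one pattern: look up a tag by class, read a value from it, and fall back to None when the tag is missing. As a sketch, that pattern could be factored into two small helpers. The names `safe_text` and `safe_attr` are hypothetical, and both assume that `node` is a BeautifulSoup tag such as `allrev[i]`.

```python
def safe_text(node, tag, class_name):
    """Return the text of the first matching child tag, or None if absent.

    Hypothetical helper: node is assumed to be a BeautifulSoup Tag,
    for example one element of allrev.
    """
    found = node.find(tag, {"class": class_name})
    return found.text if found is not None else None


def safe_attr(node, tag, class_name, attr):
    """Return an attribute of the first matching child tag, or None if absent."""
    found = node.find(tag, {"class": class_name})
    return found.get(attr) if found is not None else None
```

With these helpers, each field becomes a single call, for example `l["review"] = safe_text(allrev[i], "span", "lemon--span__373c0__3997G raw__373c0__3rKqk")`, and the star rating can be read by passing 'aria-label' as the last argument to `safe_attr`.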

The output of the above code will be:

{
  "data": [
    {
      "review": "If you’re looking for great sushi on Manhattan’s upper west side, head over to Sushi Yakasa ! Best sushi lunch specials, especially for sashimi. I ordered the Miyabi - it included a fresh oyster ! The oyster was delicious, served raw on the half shell. The sashimi was delicious too. The portion size was very good for the area, which tends to be a pricey neighborhood. The restaurant is located on a busy street (west 72nd) & it was packed when I dropped by around lunchtimeStill, they handled my order with ease & had it ready quickly. Streamlined service & highly professional. It’s a popular sushi place for a reason. Every piece of sashimi was perfect. The salmon avocado roll was delicious too. Very high quality for the price. Highly recommend! Update - I’ve ordered from Sushi Yasaka a few times since the pandemic & it’s just as good as it was before. Fresh, and they always get my order correct. I like their takeout system - you can order over the phone (no app required) & they text you when it’s ready. Home delivery is also available & very reliable. One of my favorite restaurants- I’m so glad they’re still in business !",
      "name": "Marie S.",
      "stars": "5 star rating",
      "place": "New York, NY"
    },
    {
      "review": "My friends recommended for me to try this place for take out as I was around the area. I ordered the Miyabi, all the sushi and sashimi was very fresh and tasty. They also gave an oyster which was a bonus! The price is great for the quality and amount of fish. I was happily full.",
      "name": "Lydia C.",
      "stars": "5 star rating",
      "place": "Brooklyn, Brooklyn, NY"
    },
    {
      "review": "Best sushi on UWS and their delivery is quicker than any I’ve seen! I ordered their 3 roll lunch special around 1:40pm and by 2, I was thoroughly enjoying my sushi! Granted, I live only a few blocks away but I was BLOWN away by the quick services. I had, spicy yellowtail, jalapeño yellowtail and tuna avocado roll. Great quality of fish for such a reasonable price. $16 for 3 rolls. This has certainly come by go-to place for amazing, fresh sushi on UWS.",
      "name": "Ella D.",
      "stars": "5 star rating",
      "place": "Manhattan, New York, NY"
    }
  ]
}
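The stars field comes back as a label such as "5 star rating" rather than a number. If you want to sort or average ratings, a sketch like the following could convert it. `parse_stars` is a hypothetical helper, and it assumes the label always begins with the numeric rating, as it does in the output above.

```python
import re


def parse_stars(label):
    """Convert a label like '5 star rating' to a float, or None if it does not match."""
    if not label:
        return None
    # Assumption: the numeric rating is the first token, followed by the word "star".
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+star", label)
    return float(match.group(1)) if match else None


print(parse_stars("5 star rating"))    # 5.0
print(parse_stars("4.5 star rating"))  # 4.5
print(parse_stars(None))               # None
```

Returning None for a missing or unexpected label mirrors how the scraper already treats missing fields.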

There you go!

We have the Yelp data ready to manipulate and maybe store somewhere like in MongoDB. But that is out of the scope of this tutorial.
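As a lighter-weight alternative to a database, the result could be written to a JSON file with the standard library. This is only a sketch: the function names `save_reviews` and `load_reviews` and the default file name `reviews.json` are assumptions for illustration.

```python
import json


def save_reviews(reviews, path="reviews.json"):
    """Write the list of review dictionaries to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters such as "ñ" readable in the file.
        json.dump({"data": reviews}, f, ensure_ascii=False, indent=2)


def load_reviews(path="reviews.json"):
    """Read the list of review dictionaries back from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["data"]
```

Calling `save_reviews(u)` after the loop would store everything the scraper collected, and `load_reviews()` returns the same list in a later session.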

Remember that if you are using another programming language such as Ruby, Node.js, or PHP instead of Python, you can easily find HTML parsing libraries to parse the results from the Scrapingdog API.

We also have comprehensive guides for other programming languages.

Using Scrapingdog’s API to Scrape Yelp Data

Scrapingdog’s web scraping API can help you extract data from Yelp at scale while reducing the risk of getting blocked. You just have to pass the target URL, and Scrapingdog takes care of fetching the page for you.


Scrapingdog is fast and takes care of the hassle of managing proxies and passing custom headers. It offers 1000 free API GET requests.


We hope you enjoyed this tutorial, and we hope to see you soon in Scrapingdog. Happy Scraping!

Frequently Asked Questions

Scrapingdog offers an economical web scraping API that you can use to scrape publicly available data from the internet, with quick response times.

Conclusion

In this article, we saw how you can scrape Yelp data using a web scraping API and BeautifulSoup. The same approach can be applied to other types of websites as well.

Feel free to comment and ask me anything. You can follow us on Twitter and Medium. Thanks for reading, and please hit the like button!


Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.