
Web Scraping Walmart Data using Python for Product Information (Name, Price & Rating)

23-10-2022

Walmart is the leading retailer in the USA and has an enormous amount of public product data. By web scraping Walmart you can keep a close eye on any price or stock-level change.

If the price drops, you might decide to buy. You could even build a web scraping tool that shows price trends over a year or more.

In this article, we will learn how to scrape Walmart data for different data points such as name, price, and rating, and keep track of any price change that might follow. We'll use Python for this tutorial.

Scraping Walmart Data

Let’s Begin Web Scraping Walmart with Python

To begin with, we will create a folder and install all the libraries we might need during the course of this tutorial.

For now, we will install two libraries:

  1. Requests will help us to make an HTTP connection with Walmart.
  2. BeautifulSoup will help us to create an HTML tree for smooth data extraction.
>> mkdir walmart
>> pip install requests
>> pip install beautifulsoup4

Inside this folder, create a Python file where we will write our code. We will scrape this Walmart product page. Our data points of interest are:

  1. Name
  2. Price
  3. Rating
  4. Product Details
Web Scraping this Walmart Page for Product Price, Name & Rating

First of all, we will find the locations of these elements in the HTML code by inspecting them.

Finding location by inspecting HTML code

We can see the name is stored under an h1 tag with the attribute itemprop. Now, let’s see where the price is stored.

Inspecting Price in HTML code

The price is stored under a span tag with the attribute itemprop whose value is price.

Inspecting Rating in HTML code

The rating is stored under a span tag with the class rating-number.

Inspecting div elements for product detail

The product detail is stored inside a div tag with the class dangerous-html.

Let’s start by making a plain GET request to the target webpage and see what happens.

import requests

target_url = "https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173"

resp = requests.get(target_url)
print(resp.text)

We got a captcha. Walmart loves throwing captchas when it thinks the request is coming from a script/crawler and not from a browser. To get this barrier out of our way we can simply send some headers that make Walmart treat our request as a legitimate one from a real browser.

Now, if you are new to web scraping then I would advise you to read more about headers and their importance in web scraping with Python. For now, we will use seven different headers to bypass Walmart’s security wall:

  1. Accept
  2. Accept-Encoding
  3. Accept-Language
  4. User-Agent
  5. Referer
  6. Host
  7. Connection
import requests
from bs4 import BeautifulSoup

ac = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
target_url = "https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173"
headers = {
    "Referer": "https://www.google.com",
    "Connection": "Keep-Alive",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": ac,
    "User-Agent": "Mozilla/5.0 (iPad; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13G36 Safari/601.1",
}

resp = requests.get(target_url, headers=headers)
print(resp.text)

One thing you might have noticed is that Walmart sends a 200 status code even when it returns a captcha. So, to detect this you can check the response body with an if/else statement.

if("Robot or human" in resp.text):
    print(True)
else:
    print(False)

If it prints True then Walmart has thrown a captcha; otherwise our request was successful. The next step is to extract our data of interest.
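Since the captcha page also arrives with status 200, you can bundle this check with a couple of retries. Below is a minimal sketch of that idea; the `get_with_retry` helper and its backoff values are my own choices, not part of any library:

```python
import time

import requests

def get_with_retry(url, headers, max_retries=3):
    # Walmart returns HTTP 200 even for the captcha page, so we
    # inspect the body text instead of the status code.
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers)
        if "Robot or human" not in resp.text:
            return resp  # looks like a real product page
        time.sleep(2 ** attempt)  # simple backoff: 1s, 2s, 4s
    return None  # every attempt hit the captcha
```

If the function returns None, every attempt was served a captcha and you may want to change headers or IP before trying again.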

Here we will use BS4 and will scrape every value step by step. We have already determined the exact location of each of these data elements.

soup = BeautifulSoup(resp.text,'html.parser')

l=[]
obj={}

try:
    obj["price"] = soup.find("span",{"itemprop":"price"}).text.replace("Now ","")
except:
    obj["price"]=None

Here we have extracted the price and replaced the garbage prefix “Now ” with an empty string. The try/except statement catches the error thrown when the element is missing.

Now similarly, let’s get the name and the rating of the product. We will come to the product description later.

try:
    obj["name"] = soup.find("h1",{"itemprop":"name"}).text
except:
    obj["name"]=None

try:
    obj["rating"] = soup.find("span",{"class":"rating-number"}).text.replace("(","").replace(")","")
except:
    obj["rating"]=None

l.append(obj)

print(l)

We got all the data except the product detail. If you scroll through the HTML returned by your Python script, you will not find the dangerous-html class anywhere.

The reason lies in the framework behind the Walmart website. It uses Next.js, which renders parts of the page on the client from JSON data after the initial HTML load. So, by the time our single HTTP request completed, the product-description section had not been rendered into plain HTML.

But the solution to this problem is very easy, and the data can be scraped in just two steps. Every Next.js-backed website has a script tag with the id __NEXT_DATA__.


This script tag contains all the JSON data we need. Since the visible description is rendered from it by JavaScript, it never shows up as ordinary HTML elements in our response. So, first find the tag with BS4 and then parse its contents with the json library.

import json

nextTag = soup.find("script",{"id":"__NEXT_DATA__"})
jsonData = json.loads(nextTag.text)

print(jsonData)

This is a huge JSON blob that might look a little intimidating. You can use a tool like a JSON viewer to figure out the exact path to your desired object.

try:
    obj["detail"] = jsonData['props']['pageProps']['initialData']['data']['product']['shortDescription']
except:
    obj["detail"]=None

We got all the data we were hunting for. By the way, this huge JSON blob also contains the data we scraped earlier; you just have to figure out the exact object it is stored in. I leave that part to you.
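If you would rather not chain long bracket lookups inside try/except blocks, a small helper can walk a JSON path safely. This `deep_get` function is my own sketch (not part of any library), and the sample dict below only mimics the shape of the real payload:

```python
from functools import reduce

def deep_get(data, path, default=None):
    # Walk a nested dict/list with a list of keys/indexes.
    # Returns `default` instead of raising KeyError/IndexError/TypeError,
    # which is handy when exploring a large __NEXT_DATA__ blob.
    try:
        return reduce(lambda node, key: node[key], path, data)
    except (KeyError, IndexError, TypeError):
        return default

# Toy structure shaped like the real payload:
sample = {"props": {"pageProps": {"initialData": {"data": {
    "product": {"shortDescription": "58-inch 4K TV"}}}}}}

print(deep_get(sample, ["props", "pageProps", "initialData",
                        "data", "product", "shortDescription"]))
# A missing key falls back to the default instead of crashing:
print(deep_get(sample, ["props", "missing"], default="n/a"))
```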

If you want to learn more about headers, requests, and other Python libraries, I would advise you to read this web scraping with Python tutorial.

Complete Code

import json

import requests
from bs4 import BeautifulSoup

ac = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"

target_url = "https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173"

headers = {
    "Referer": "https://www.google.com",
    "Connection": "Keep-Alive",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": ac,
    "User-Agent": "Mozilla/5.0 (iPad; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13G36 Safari/601.1",
}

# The headers are required; without them Walmart serves a captcha page.
resp = requests.get(target_url, headers=headers)
# if "Robot or human" in resp.text:
#     print("Captcha page returned")

soup = BeautifulSoup(resp.text, 'html.parser')
l = []
obj = {}

try:
    obj["price"] = soup.find("span", {"itemprop": "price"}).text.replace("Now ", "")
except:
    obj["price"] = None

try:
    obj["name"] = soup.find("h1", {"itemprop": "name"}).text
except:
    obj["name"] = None

try:
    obj["rating"] = soup.find("span", {"class": "rating-number"}).text.replace("(", "").replace(")", "")
except:
    obj["rating"] = None

try:
    nextTag = soup.find("script", {"id": "__NEXT_DATA__"})
    jsonData = json.loads(nextTag.text)
    obj["detail"] = jsonData['props']['pageProps']['initialData']['data']['product']['shortDescription']
except:
    obj["detail"] = None

l.append(obj)
print(l)
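Instead of just printing the list, you will probably want to persist it, for example to build the price-trend history mentioned at the start. A minimal sketch with the standard json library (the walmart_products.json filename and the stand-in record are my own):

```python
import json

# `l` is the list of product dicts built by the scraper above;
# here a small stand-in record with the same keys:
l = [{"name": "SAMSUNG 58\" Class 4K TV", "price": "$298.00",
      "rating": "4.3", "detail": "4K UHD Smart TV"}]

with open("walmart_products.json", "w", encoding="utf-8") as f:
    json.dump(l, f, indent=2, ensure_ascii=False)
```

Running this daily and appending a timestamp to each record would give you the raw data for a price-change chart.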

How to Web Scrape Walmart without Code using Scrapingdog

Walmart has been in business for a very long time, and they know people use scraping techniques to crawl their website. Their architecture can determine whether a request is coming from a bot or a real browser.

On top of that, if you want to scrape millions of pages, Walmart will block your IP. To avoid this you need to rotate IPs and headers. Scrapingdog can provide you with all of these features.
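Header rotation on its own is easy to sketch in plain Python. The User-Agent pool below is a hypothetical example (maintain your own list); rotating IPs additionally requires a proxy pool, which is exactly what a service handles for you:

```python
import random

# A small, illustrative pool of User-Agent strings (extend with your own).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

def rotating_headers():
    # Return the usual header set with a randomly chosen User-Agent.
    return {
        "Referer": "https://www.google.com",
        "Accept-Language": "en-US,en;q=0.9",
        "User-Agent": random.choice(USER_AGENTS),
    }
```

Calling `rotating_headers()` before each request makes consecutive requests look less uniform, though it does nothing about IP-level blocking.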

Scrapingdog is a web scraping tool that can help you create a seamless data pipeline in no time. You can directly start by signing up and making a test call directly from your dashboard.

Let’s go step by step to understand how you can use Scrapingdog to scrape Walmart without getting blocked. Oh! I almost forgot to tell you: for new users, the first 1000 calls are absolutely free.

First, you have to sign up!

Scrapingdog Home Page – Sign up from here

Once you are on your dashboard you will have two options.

  1. Either just paste the target URL in the tool and press the “Scrape” button.
  2. Use the API URL in Postman, a browser, or a script to make a GET request.

The first one is the fastest one. So, let’s do that.

Using Scrapingdog’s dashboard to extract data from Walmart

Once you press the Scrape button you will get the complete HTML data from Walmart. You can even set a location if you want to change it, but in this case that was not required.

The second option is through a script. You can use the code below to scrape Walmart without being blocked. You won’t even have to pass any headers.

import json

import requests
from bs4 import BeautifulSoup

target_url = "https://api.scrapingdog.com/scrape?dynamic=false&url=http://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173&api_key=YOUR-API-KEY"

resp = requests.get(target_url)

soup = BeautifulSoup(resp.text, 'html.parser')

l = []
obj = {}

try:
    obj["price"] = soup.find("span", {"itemprop": "price"}).text.replace("Now ", "")
except:
    obj["price"] = None

try:
    obj["name"] = soup.find("h1", {"itemprop": "name"}).text
except:
    obj["name"] = None

try:
    obj["rating"] = soup.find("span", {"class": "rating-number"}).text.replace("(", "").replace(")", "")
except:
    obj["rating"] = None

try:
    nextTag = soup.find("script", {"id": "__NEXT_DATA__"})
    jsonData = json.loads(nextTag.text)
    obj["detail"] = jsonData['props']['pageProps']['initialData']['data']['product']['shortDescription']
except:
    obj["detail"] = None

l.append(obj)
print(l)

Do not forget to replace YOUR-API-KEY with your own API key; you can find it on your dashboard. We used the &dynamic=false parameter to make a plain HTTP request rather than rendering the JS, which costs just 1 API credit. You can read more about it here.

But if you want to load the JS part of the website as well then remove the &dynamic=false param because Scrapingdog by default renders JS through a real Chrome browser.

We just removed the headers and replaced the target URL with Scrapingdog’s API URL; the rest of the code remains the same.
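One practical note: when the target URL goes into the url= query parameter, it is safer to percent-encode it than to paste it in raw. A sketch with the standard library (the parameter names match the ones used above; the key is a placeholder):

```python
from urllib.parse import urlencode

API_KEY = "YOUR-API-KEY"  # placeholder: use your own key
params = {
    "api_key": API_KEY,
    "dynamic": "false",
    "url": "https://www.walmart.com/ip/SAMSUNG-58-Class-4K-Crystal-UHD-2160P-LED-Smart-TV-with-HDR-UN58TU7000/820835173",
}
# urlencode percent-encodes the nested Walmart URL so its own
# slashes and ampersands cannot break the outer query string.
api_url = "https://api.scrapingdog.com/scrape?" + urlencode(params)
print(api_url)
```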

Scrapingdog is not restricted to Walmart.com; you can use it to scrape any website.

Conclusion

As we know, Python is great for web scraping. We used just two basic libraries to scrape Walmart and print the results as JSON. But this approach has its limits, and as discussed above, Walmart will block you if you do not rotate proxies or refresh headers in time.

If you want to scrape thousands or even millions of pages, then using Scrapingdog will be the best approach.

I hope you like this little tutorial and if you do then please do not forget to share it with your friends and on social media.


Manthan Koolwal

My name is Manthan Koolwal and I am the CEO of scrapingdog.com. I love creating scrapers and seamless data pipelines.