TL;DR
- Playwright + Node.js setup, launch, and first run.
- Scrape flow: load page → grab HTML → parse with Cheerio (IMDb demo).
- Techniques: waitForSelector, infinite scroll loop, type/click flows.
- Use proxies for scale; Playwright is powerful, but for hands-off scaling the article recommends Scrapingdog.
Playwright has quickly become the go-to browser automation library for web scraping in Node.js. Unlike simple HTTP clients, it runs a real browser — meaning it handles JavaScript-rendered pages, dynamic content, infinite scroll, and complex user interactions out of the box.
In this guide, you will learn how to scrape websites using Playwright and Node.js from scratch. We’ll cover everything from basic setup to advanced techniques like waiting for elements, handling infinite scroll, form interactions, stealth mode, and using proxies at scale.
By the end, you’ll have a complete scraping toolkit ready to use on any website.
What is Playwright?
Playwright is an open-source browser automation library developed by Microsoft. It allows you to control real browsers like Chromium, Firefox, and WebKit programmatically using Node.js, Python, Java, or .NET.
Unlike traditional HTTP scraping libraries, Playwright renders pages exactly like a real user’s browser does. This makes it capable of handling JavaScript-heavy websites, dynamic content, single-page applications (SPAs), and complex user interactions like clicks, form fills, and infinite scroll.
It was built as a successor to Puppeteer, with broader browser support and a more reliable auto-waiting mechanism that eliminates most timing-related scraping failures.
Why Use Playwright for Web Scraping?
Here’s why developers prefer Playwright over other scraping tools:
- Handles dynamic content — Renders JavaScript before extracting data, so you never miss content that loads after the initial page request.
- Auto-waiting — Automatically waits for elements to be visible and ready before interacting, eliminating most timing errors and flaky scripts.
- Multi-browser support — Run the same scraping code across Chromium, Firefox, and WebKit without any modification.
- Multiple languages — Works with Node.js, Python, Java, and .NET, so it fits whatever stack you’re already using.
- Request interception — Block images, CSS, and unnecessary resources to speed up scraping significantly.
- Built-in stealth capabilities — Harder to detect than older tools like Selenium, with better fingerprint handling out of the box.
- Screenshot & PDF support — Capture page screenshots or generate PDFs natively, useful for visual debugging.
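To make the request-interception point concrete, here is a minimal sketch of blocking heavy resources with Playwright's page.route() API. The BLOCKED set, shouldBlock, and fastScrape are illustrative names of my own, not part of Playwright:

```javascript
// Resource types that are usually safe to block when you only need the HTML.
const BLOCKED = new Set(['image', 'stylesheet', 'font', 'media']);
const shouldBlock = (resourceType) => BLOCKED.has(resourceType);

async function fastScrape(url) {
  // playwright is required lazily so the helpers above stay dependency-free.
  const { chromium } = require('playwright');
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  // page.route() intercepts every request; abort the heavy ones, let the rest through.
  await page.route('**/*', (route) =>
    shouldBlock(route.request().resourceType()) ? route.abort() : route.continue()
  );
  await page.goto(url);
  const html = await page.content();
  await browser.close();
  return html;
}
```

Blocking images and stylesheets alone often cuts page-load time dramatically, since the DOM you scrape from doesn't need them.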
Setup
Make sure you have Node.js installed on your machine. If not, download it from the official Node.js website.
Once Node.js is ready, create a new project folder and initialize it:
mkdir playwright-scraper
cd playwright-scraper
npm init -y
Then install Playwright and Cheerio. Cheerio will be used for parsing the raw HTML extracted by Playwright:
npm install playwright cheerio
Next, install the browsers Playwright needs to run:
npx playwright install
That’s all the setup you need. Let’s run a quick test to confirm everything is working.
How to run Playwright with Node.js
Create a new file test.js and add the following code:
const { chromium } = require('playwright');
async function playwrightTest() {
const browser = await chromium.launch({ headless: false });
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://www.scrapingdog.com');
console.log(await page.title());
await browser.close();
}
playwrightTest();
Here’s what each step does:
- chromium.launch() launches a Chromium browser instance; setting headless: false opens a visible browser window, useful for debugging.
- browser.newContext() creates an isolated browser context, similar to a fresh browser profile.
- context.newPage() opens a new tab inside that context.
- page.goto() navigates to the target URL.
- page.title() fetches the page title and logs it to the console.
- browser.close() shuts down the browser and frees up resources.
Run it with:
node test.js
If the page title prints to your console, your setup is working.
How to scrape with Playwright and Node.js
In this section we are going to scrape the IMDb Most Popular Movies page and extract movie titles and ratings.
First, let’s grab the raw HTML of the page using page.content():
const { chromium } = require('playwright');
async function playwrightTest() {
const browser = await chromium.launch({
headless: false, // Set to true in production
args: ['--disable-blink-features=AutomationControlled']
});
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://www.imdb.com/chart/moviemeter/');
console.log(await page.content());
await browser.close();
}
playwrightTest();
The page.content() method returns the raw HTML of the target page. Once you run the code, you will see it printed to your console.
On its own, this raw HTML looks like garbage — we have to parse the data out of it, and that is a job for a parsing library like Cheerio.
We are going to parse the name of the movie and the rating. Let’s find out the DOM location of each element.
Each movie's data is stored inside an li tag with the class ipc-metadata-list-summary-item.
If you go inside this li tag, you will see that the title of the movie is located inside an h3 tag with the class ipc-title__text.
The rating is located inside a span tag with the class ipc-rating-star--rating.
const { chromium } = require('playwright');
const cheerio = require('cheerio');
async function playwrightTest() {
let obj = {};
let arr = [];
const browser = await chromium.launch({ headless: false });
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://www.imdb.com/chart/moviemeter/');
const html = await page.content();
const $ = cheerio.load(html);
$('li.ipc-metadata-list-summary-item').each((i, el) => {
obj['Title'] = $(el).find('h3.ipc-title__text').text().trim();
obj['Rating'] = $(el).find('span.ipc-rating-star--rating').text().trim();
arr.push(obj);
obj = {};
});
console.log(arr);
await browser.close();
}
playwrightTest();
We use Cheerio’s each() to iterate over every li tag, extract the title and rating, push them into an array, and log the final result before closing the browser. You should see a clean array of movie objects printed to your console.
How to wait for an element in Playwright
When scraping dynamic websites, content often loads asynchronously after the initial page load. If you try to extract data before the target element appears in the DOM, your scraper will either return empty results or throw an error. This is where page.waitForSelector() comes in.
const { chromium } = require('playwright');
const cheerio = require('cheerio')
async function playwrightTest() {
const browser = await chromium.launch({
headless: false
});
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://www.imdb.com/chart/moviemeter/');
await page.waitForSelector('h1.ipc-title__text')
await browser.close();
}
playwrightTest()
Here we wait for the title to appear before closing the browser.
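By default, waitForSelector() throws a TimeoutError if the element never appears. For scraping, it is often better to treat a missing element as "no data" rather than a crash. Here is a hypothetical wrapper of my own (waitForOrNull is not part of Playwright's API) that does exactly that:

```javascript
// Hypothetical helper: wait for a selector, but return null instead of
// throwing if it never shows up within the timeout.
async function waitForOrNull(page, selector, timeout = 10000) {
  try {
    // state: 'visible' waits until the element is actually rendered,
    // not merely attached to the DOM.
    return await page.waitForSelector(selector, { state: 'visible', timeout });
  } catch {
    return null; // timed out — caller decides how to handle the missing element
  }
}
```

This keeps one stubborn page from killing a long scraping run.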
How to do Infinite Scrolling with Playwright
Many e-commerce websites use infinite scrolling, and you may have to scroll down repeatedly to load the whole page.
const { chromium } = require('playwright');
const cheerio = require('cheerio')
async function playwrightTest() {
const browser = await chromium.launch({
headless: false
});
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://www.imdb.com/chart/moviemeter/');
let previousHeight;
while (true) {
previousHeight = await page.evaluate('document.body.scrollHeight');
await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
await page.waitForTimeout(2000); // Wait for new content to load
const newHeight = await page.evaluate('document.body.scrollHeight');
if (newHeight === previousHeight) break;
}
await browser.close();
}
playwrightTest()
We use while (true) to keep scrolling until no new content loads. page.evaluate('window.scrollTo(0, document.body.scrollHeight)') scrolls the page to the bottom by setting the vertical scroll position to the maximum scrollable height. Once newHeight equals previousHeight, we break out of the loop.
How to type and click
In this example, we are going to visit www.google.com, enter a query, and press Enter. After that, we will scrape the results using the page.content() method.
const { chromium } = require('playwright');
const cheerio = require('cheerio')
async function playwrightTest() {
const browser = await chromium.launch({
headless: false
});
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://www.google.com');
await page.fill('textarea[name="q"]', 'Scrapingdog');
await page.press('textarea[name="q"]', 'Enter');
await page.waitForTimeout(3000);
console.log(await page.content())
await browser.close();
}
playwrightTest()
We visit google.com, type ‘Scrapingdog’ into the search box using the fill() method, and then press the Enter key using the press() method.
How to Use Proxies with Playwright
If you only want to scrape a few hundred pages, a single machine and IP are fine, but if you want to scrape millions of pages, you have to use proxies to avoid IP bans.
const browser = await chromium.launch({
headless: false,
proxy: {
server: 'http://IP:PORT',
username: 'USERNAME',
password: 'PASSWORD'
}
});
server specifies the proxy server’s address in the format protocol://IP:PORT. username and password are the credentials for the proxy; public proxies usually do not require them.
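Putting this together, here is a small sketch of a proxy-aware scrape. parseProxy and scrapeViaProxy are illustrative helpers of my own, assuming credentials are supplied as a "username:password@host:port" string:

```javascript
// Hypothetical helper: turn a "username:password@host:port" string into
// the proxy object that Playwright's launch() expects.
function parseProxy(uri) {
  const [creds, host] = uri.split('@');
  const [username, password] = creds.split(':');
  return { server: `http://${host}`, username, password };
}

async function scrapeViaProxy(url, proxyUri) {
  // playwright is required lazily so parseProxy stays dependency-free.
  const { chromium } = require('playwright');
  const browser = await chromium.launch({ headless: true, proxy: parseProxy(proxyUri) });
  const page = await browser.newPage();
  await page.goto(url);
  const html = await page.content();
  await browser.close();
  return html;
}
```

With a pool of such proxy strings, you can rotate IPs by launching each browser with a different one.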
How to take a screenshot
Playwright has a built-in screenshot() method that lets you capture the current state of any page. This is particularly useful for debugging — if your scraper is returning unexpected results, a screenshot tells you exactly what the browser was seeing at that moment.
const { chromium } = require('playwright');
async function playwrightTest() {
const browser = await chromium.launch({ headless: false });
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://www.imdb.com/chart/moviemeter/');
await page.waitForSelector('h1.ipc-title__text');
await page.screenshot({ path: 'imdb.png' });
console.log('Screenshot saved!');
await browser.close();
}
playwrightTest();
This saves a screenshot of the visible viewport as imdb.png in your project folder. If you want to capture the entire page including content below the fold, add fullPage: true:
await page.screenshot({ path: 'imdb-full.png', fullPage: true });
You can also capture a specific element rather than the entire page:
const element = await page.$('h3.ipc-title__text');
await element.screenshot({ path: 'title.png' });
This is handy when you only need to visually inspect a particular part of the page rather than the whole thing.
How to handle pagination
Most websites split their data across multiple pages. Instead of manually visiting each page, you can automate pagination by detecting and clicking the “Next” button repeatedly until it no longer exists.
Let’s use GitHub’s topic page as an example:
const { chromium } = require('playwright');
const cheerio = require('cheerio');
async function playwrightTest() {
const browser = await chromium.launch({ headless: false });
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://github.com/topics/nodejs');
let results = [];
let hasNextPage = true;
while (hasNextPage) {
await page.waitForSelector('article.border');
const html = await page.content();
const $ = cheerio.load(html);
$('article.border').each((i, el) => {
results.push({
name: $(el).find('h3 a').last().text().trim(),
description: $(el).find('div.px-3 p').text().trim(),
stars: $(el).find('a[href*="stargazers"]').text().trim()
});
});
const nextButton = await page.$('a[rel="next"]');
if (nextButton) {
await nextButton.click();
await page.waitForTimeout(2000);
} else {
hasNextPage = false;
}
}
console.log(results);
await browser.close();
}
playwrightTest();
Here’s how the pagination logic works. On each iteration, we wait for the content to load, extract the data using Cheerio, then check if a “Next” button exists using page.$(). If it does, we click it and wait for the next page to load. If it doesn’t, we set hasNextPage to false and break out of the loop.
This pattern works on the vast majority of paginated websites. Just swap out the “Next” button selector to match your target site.
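The loop above can be factored into a reusable sketch, so swapping sites really is just a matter of changing two arguments. paginate and extractFromHtml are illustrative names of my own, not a Playwright API:

```javascript
// Hypothetical reusable pagination pattern: extract from the current page,
// then click "Next" until the button disappears (with a safety cap).
async function paginate(page, extractFromHtml, nextSelector, maxPages = 50) {
  const results = [];
  for (let i = 0; i < maxPages; i++) {
    results.push(...extractFromHtml(await page.content()));
    const next = await page.$(nextSelector);
    if (!next) break;                // no "Next" button left — last page reached
    await next.click();
    await page.waitForTimeout(2000); // crude wait; waitForSelector is more robust
  }
  return results;
}
```

You would call it with your Cheerio extraction function and the site's "Next" selector, e.g. paginate(page, html => parseRepos(html), 'a[rel="next"]'), where parseRepos is whatever parser you wrote for that site.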
Scaling with Scrapingdog
Playwright is an excellent tool for scraping dynamic websites, but it comes with some real limitations once you try to scale:
- Resource heavy — Running a full browser instance consumes significant RAM and CPU. Spinning up dozens of concurrent Playwright instances will quickly overwhelm even a powerful server.
- Easy to detect — Despite its stealth capabilities, headless browsers leave fingerprints that sophisticated anti-bot systems like Cloudflare, Akamai, and DataDome can detect. You’ll start seeing CAPTCHAs and IP bans as soon as you scale up.
- Proxy management — You need to source, rotate, and manage proxies yourself. A single IP will get banned fast on most popular websites.
- Maintenance overhead — Websites change their DOM structure regularly. Every change can break your scraper and requires manual fixes.
- Slow — Browser-based scraping is inherently slower than API-based approaches, making large-scale data collection time-consuming and expensive.
This is where Scrapingdog’s JavaScript Rendering API comes in. Instead of managing browsers, proxies, and anti-bot bypasses yourself, Scrapingdog handles all of it behind a single API call. It renders JavaScript just like Playwright does, but at scale with automatic proxy rotation, CAPTCHA solving, and bot detection bypass built in.
Here’s the same IMDb scrape done with Scrapingdog instead (this example uses the axios HTTP client — npm install axios):
const axios = require('axios');
const cheerio = require('cheerio');
async function scrapeWithScrapingdog() {
const response = await axios.get('https://api.scrapingdog.com/scrape', {
params: {
api_key: 'YOUR_API_KEY',
url: 'https://www.imdb.com/chart/moviemeter/',
dynamic: true // enables JS rendering
}
});
const $ = cheerio.load(response.data);
const results = [];
$('li.ipc-metadata-list-summary-item').each((i, el) => {
results.push({
title: $(el).find('h3.ipc-title__text').text().trim(),
rating: $(el).find('span.ipc-rating-star--rating').text().trim()
});
});
console.log(results);
}
scrapeWithScrapingdog();
No browser setup, no proxy rotation, no bot detection headaches — just clean data with a simple GET request. You can scrape any JavaScript-rendered page at any scale without worrying about infrastructure.
Try Scrapingdog free with 1,000 credits — no credit card required.
Playwright vs Puppeteer vs Selenium
Choosing the right browser automation tool depends on your specific use case. Here’s how the three compare:
Browser Support
- Playwright — Chromium, Firefox, and WebKit (Safari) out of the box
- Puppeteer — Chromium and Firefox only
- Selenium — All major browsers including Chrome, Firefox, Safari, and Edge
Language Support
- Playwright — Node.js, Python, Java, and .NET
- Puppeteer — JavaScript and TypeScript only
- Selenium — JavaScript, Python, Java, Ruby, C#, and more
Auto-Waiting
- Playwright — Built-in auto-waiting for elements before every action
- Puppeteer — Manual waits required in most cases
- Selenium — Manual waits required, prone to timing errors
Speed
- Playwright — Fast, supports parallel execution across multiple browser contexts
- Puppeteer — Fast, but limited to single browser contexts per instance
- Selenium — Slowest of the three due to WebDriver protocol overhead
Anti-Bot Detection
- Playwright — Harder to detect than Selenium, stealth plugins available
- Puppeteer — Moderate detection risk, stealth plugins available
- Selenium — Easiest to detect, frequently blocked by anti-bot systems
Ease of Use
- Playwright — Modern API, excellent documentation, easy setup
- Puppeteer — Simple API, large community, well documented
- Selenium — Steeper learning curve, more boilerplate code required
Best For
- Playwright — Web scraping, end-to-end testing, dynamic content extraction
- Puppeteer — Quick scraping tasks, Chrome-specific automation
- Selenium — Cross-browser testing, legacy projects, enterprise environments
Overall, Playwright is the best choice for web scraping in 2026. It combines the speed of Puppeteer with broader browser support, better reliability through auto-waiting, and stronger anti-detection capabilities, all in a modern, well-maintained package.
Here are Some Key Takeaways:
- Playwright is a browser automation framework for Node.js that allows you to control real browsers for scraping and testing.
- It can handle dynamic, JavaScript-heavy websites that traditional HTTP requests cannot properly scrape.
- The guide walks through setting up Playwright in a Node.js environment and launching a browser instance.
- You can automate user-like actions such as clicking buttons, filling forms, waiting for elements, and extracting rendered content.
- Playwright is useful for web scraping, end-to-end testing, and automating complex browser workflows.
Conclusion
Playwright, with its robust and versatile API, is a powerful tool for automating browser interactions and web scraping in Node.js. Whether you’re scraping data, waiting for elements, scrolling, or interacting with complex web elements like buttons and input fields, Playwright simplifies these tasks with its intuitive methods. Moreover, its support for proxies and built-in features like screenshot capturing and multi-browser support make it a reliable choice for developers.
I understand that these tasks can be time-consuming, and sometimes, it’s better to focus solely on data collection while leaving the heavy lifting to web scraping APIs like Scrapingdog. With Scrapingdog, you don’t have to worry about managing proxies, browsers, or retries — it takes care of everything for you. With just a simple GET request, you can scrape any page effortlessly using this API.
If you found this article helpful, please consider sharing it with your friends and followers on social media!