Scraping the web with Node.js is resource-intensive, since it usually means running a headless browser. That makes it nearly impossible on a serverless platform such as Vercel. Likewise, adopting a large framework like Playwright is overkill for such a common and essentially simple task.

Defer provides native Puppeteer support, which makes it easy to implement scraping use cases. You can also monitor and control your scraping jobs from the Defer Console.

The code — Wrap Puppeteer as a Defer function

Below, we show you how to use Defer to implement a popular use case: generating thumbnail images from user-provided links with only a few lines of code.

src/defer/generateLinkThumbnail.ts
import { defer } from "@defer/client";
import puppeteer from "puppeteer";

const generateLinkThumbnail = async (url: string) => {
  // make sure to pass the `--no-sandbox` option
  const browser = await puppeteer.launch({ args: ["--no-sandbox"] });

  try {
    const page = await browser.newPage();

    // `networkidle0` resolves once there have been no network
    //   connections for at least 500 ms, so late-loading assets are captured
    await page.goto(url, { waitUntil: "networkidle0" });

    // perform some actions...

    // any file must be stored in the local `/workspace/` folder
    await page.screenshot({ path: "/workspace/screenshot.jpg" });

    // save the screenshot to an external storage (e.g., S3)...
  } finally {
    // always close the browser, even if a step above throws
    await browser.close();
  }
};

export default defer(generateLinkThumbnail, {
  retry: 5, // adding retry in case of flakiness issues
});

This is a standard Defer pattern:

  • Wrap the functionality in a function. Here, generateLinkThumbnail() contains all the scraping and thumbnail logic.
  • Wrap that function with the Defer client (defer) and export the result, as in the last lines of the code above. Calling the exported function from your application then triggers an execution in the background.
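
To illustrate how application code might trigger such a background execution, here is a minimal sketch. It assumes the links come straight from users and should be checked before they reach Puppeteer. The names `EnqueueThumbnail`, `parseHttpUrl`, and `requestThumbnail` are hypothetical and not part of Defer or Puppeteer. The deferred function is passed in as a parameter so the sketch runs on its own; in a real application you would presumably pass the default export of src/defer/generateLinkThumbnail.ts.

```typescript
// Shape of the function exported from src/defer/generateLinkThumbnail.ts.
// Assumption: calling it enqueues a background execution and resolves
// once the job is accepted; the resolved value is not used here.
type EnqueueThumbnail = (url: string) => Promise<unknown>;

// Hypothetical helper: accept only absolute http(s) links from users.
export function parseHttpUrl(input: string): URL | null {
  try {
    const url = new URL(input);
    return url.protocol === "http:" || url.protocol === "https:" ? url : null;
  } catch {
    return null; // `new URL` throws on malformed input
  }
}

// Hypothetical request handler: validates the link, then hands it to the
// deferred function instead of running Puppeteer inside the request.
export async function requestThumbnail(
  link: string,
  enqueue: EnqueueThumbnail,
): Promise<{ accepted: boolean }> {
  const url = parseHttpUrl(link);
  if (url === null) return { accepted: false };
  await enqueue(url.href);
  return { accepted: true };
}
```

Rejecting non-HTTP schemes up front keeps obviously invalid input from consuming a browser launch and any of the configured retries.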

Monitoring Defer and Puppeteer

With the Defer Console, you can monitor your scraping function, including Puppeteer executions that run for multiple hours.

Finally, the Defer Console notifies you of delays and failures. From a job's details, you can abort stuck executions or re-run failed ones.