Show HN: Crawlee – Web scraping and browser automation library for Node.js

4 years ago

Hey HN,

This is Jan, founder of Apify, a web scraping and automation platform. Drawing on our team's years of experience, today we're launching Crawlee [1], a web scraping and browser automation library for Node.js designed for fast development and reliable operation in production.

For details, see the short video [2] or read the announcement blog post [3].

Main features:

- Supports headless browsers with Playwright or Puppeteer

- Supports plain HTTP crawling, with HTML parsing by Cheerio or JSDOM

- Automatic parallelization and scaling of crawlers based on available system resources

- Avoids blocking using smart sessions, proxies, and browser fingerprints

- Simple management and persistence of queues of URLs to crawl

- Written completely in TypeScript for type safety and code autocompletion

- Comprehensive documentation, code examples, and tutorials

- Actively maintained and developed by Apify—we use it ourselves!

- Lively community on Discord
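The queue and parallelization bullets describe ideas that are easy to show in miniature. The TypeScript below is a toy sketch, not Crawlee's code: a URL queue that deduplicates by a normalized unique key (here simply dropping the `#fragment`, a rule chosen only for illustration), drained by a fixed number of concurrent workers. `UrlQueue`, `crawl`, and the fake `site` map are hypothetical names; the fake map stands in for real HTTP requests.

```typescript
// Toy sketch only: a deduplicating URL queue drained by a fixed pool of
// concurrent workers. UrlQueue and crawl are hypothetical, not Crawlee APIs.

type Handler = (url: string, enqueue: (urls: string[]) => void) => Promise<void>;

class UrlQueue {
  private readonly seen = new Set<string>();
  private readonly pending: string[] = [];

  // Normalize a URL into a unique key; dropping the #fragment means
  // ".../b" and ".../b#top" count as the same page (an illustrative rule).
  private static uniqueKey(url: string): string {
    const u = new URL(url);
    u.hash = "";
    return u.toString();
  }

  // Returns false if the URL (after normalization) was already seen.
  add(url: string): boolean {
    const key = UrlQueue.uniqueKey(url);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  next(): string | undefined {
    return this.pending.shift();
  }
}

// Runs `concurrency` workers that pull URLs until the queue is empty
// and no handler is still running (a running handler may enqueue more).
async function crawl(startUrls: string[], handler: Handler, concurrency: number): Promise<string[]> {
  const queue = new UrlQueue();
  startUrls.forEach((u) => queue.add(u));
  const visited: string[] = [];
  let active = 0;

  const enqueue = (urls: string[]) => urls.forEach((u) => queue.add(u));

  async function worker(): Promise<void> {
    for (;;) {
      const url = queue.next();
      if (url === undefined) {
        if (active === 0) return; // nothing queued and nothing in flight: done
        await new Promise<void>((resolve) => setTimeout(resolve, 5)); // poll again shortly
        continue;
      }
      active++;
      try {
        await handler(url, enqueue);
        visited.push(url);
      } finally {
        active--;
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return visited;
}

// Usage: a fake in-memory "site" stands in for real HTTP requests.
const site: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b#top"],
  "https://example.com/a": ["https://example.com/b", "https://example.com/"],
  "https://example.com/b": ["https://example.com/a"],
};

crawl(["https://example.com/"], async (url, enqueue) => {
  await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated latency
  enqueue(site[url] ?? []);
}, 2).then((visited) => console.log(visited));
```

A fixed worker count is the simplest possible design. An autoscaled crawler would instead raise or lower the number of workers based on system load, and a production crawler would retry failed requests rather than aborting the whole run when one handler throws.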

To get started, visit https://crawlee.dev or run the following command: npx crawlee create my-crawler

If you have any questions or comments, our team will be happy to answer them here.

[1] https://crawlee.dev/

[2] https://www.youtube.com/watch?v=g1Ll9OlFwEQ

[3] https://blog.apify.com/announcing-crawlee-the-web-scraping-a...