This is Jan, founder of Apify, a web scraping and automation platform. Drawing on our team's years of experience, today we're launching Crawlee [1], a web scraping and browser automation library for Node.js that's designed for fast development and reliable operation in production.
For details, see the short video [2] or read the announcement blog post [3].
Main features:
- Supports headless browsers with Playwright or Puppeteer
- Supports raw HTTP crawling with Cheerio or JSDOM
- Automated parallelization and scaling of crawlers for best performance
- Avoids blocking using smart sessions, proxies, and browser fingerprints
- Simple management and persistence of queues of URLs to crawl
- Written completely in TypeScript for type safety and code autocompletion
- Comprehensive documentation, code examples, and tutorials
- Actively maintained and developed by Apify—we use it ourselves!
- Lively community on Discord
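For readers wondering what the URL queue and automated parallelization items above involve, here is a minimal conceptual sketch in TypeScript. It is not Crawlee's API or implementation: UrlQueue, crawl, and the in-memory site map are hypothetical names made up for illustration, and the sketch assumes no retries, persistence, or autoscaling.

```typescript
// Conceptual sketch only: this is NOT Crawlee's API or implementation.
// UrlQueue, crawl, and site are hypothetical names used for illustration.

type Handler = (url: string, enqueue: (url: string) => void) => Promise<void>;

class UrlQueue {
  private pending: string[] = [];
  private seen = new Set<string>();

  // Queue a URL unless it was already added; returns true if it was new.
  add(url: string): boolean {
    const key = new URL(url).href; // normalizes, and throws on invalid URLs
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  next(): string | undefined {
    return this.pending.shift();
  }
}

// Runs at most maxConcurrency handlers at once until the queue drains.
// A rejected handler aborts the whole crawl; there are no retries here.
async function crawl(
  startUrls: string[],
  handler: Handler,
  maxConcurrency = 4,
): Promise<string[]> {
  const queue = new UrlQueue();
  for (const url of startUrls) queue.add(url);

  const visited: string[] = [];
  const running = new Set<Promise<void>>();

  while (true) {
    // Fill free slots with queued URLs.
    while (running.size < maxConcurrency) {
      const url = queue.next();
      if (url === undefined) break;
      const task: Promise<void> = handler(url, (found) => { queue.add(found); })
        .then(() => { visited.push(url); })
        .finally(() => { running.delete(task); });
      running.add(task);
    }
    if (running.size === 0) break; // nothing queued, nothing in flight
    await Promise.race(running); // wait for a slot to free up
  }
  return visited;
}

// In-memory link graph, so the sketch runs without network access.
const site = new Map<string, string[]>([
  ['https://example.com/', ['https://example.com/a', 'https://example.com/b']],
  ['https://example.com/a', ['https://example.com/b', 'https://example.com/']],
  ['https://example.com/b', []],
]);

crawl(['https://example.com'], async (url, enqueue) => {
  for (const link of site.get(url) ?? []) enqueue(link);
}).then((visited) => console.log(`${visited.length} pages visited`));
// prints "3 pages visited"
```

The set of already-seen URLs is what keeps the cyclic links in the example from being crawled twice, and the set of in-flight promises caps how many handlers run at once.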
To get started, visit https://crawlee.dev or run the following command:

  npx crawlee create my-crawler
If you have any questions or comments, our team will be happy to answer them here.
[2] https://www.youtube.com/watch?v=g1Ll9OlFwEQ
[3] https://blog.apify.com/announcing-crawlee-the-web-scraping-a...