Crawlee is an advanced web scraping and browser automation library designed to help you build robust and efficient crawlers quickly.
🚀 Exciting News: Crawlee for Python is Now Available for Early Adopters!
Get started with a simple command:
pipx run crawlee create my-crawler
Key Features:
Reliable Crawling:
While Crawlee doesn’t automatically fix broken selectors yet, it significantly speeds up the development and maintenance of your crawlers.
Seamless Adaptability:
Switch between browser crawlers and API-based crawls effortlessly. When a website introduces JavaScript rendering, you only need to switch to a browser crawler without rewriting your entire codebase.
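One way to picture this is a request handler that depends only on a small page interface, so the fetching strategy can be swapped underneath it. The sketch below is not Crawlee's API: `Page`, `Fetcher`, `HttpFetcher`, `BrowserFetcher`, `extract_title`, and `crawl` are hypothetical names, and both fetchers return canned HTML so the example runs without network access.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Page:
    """Minimal page representation shared by both fetching strategies."""
    url: str
    html: str


class Fetcher(Protocol):
    """Anything that can turn a URL into a Page."""
    def fetch(self, url: str) -> Page: ...


class HttpFetcher:
    """Hypothetical stand-in for a plain HTTP crawl."""
    def fetch(self, url: str) -> Page:
        # Canned response; a real crawler would issue an HTTP request here.
        return Page(url, "<html><title>Static title</title></html>")


class BrowserFetcher:
    """Hypothetical stand-in for a headless-browser crawl."""
    def fetch(self, url: str) -> Page:
        # Canned response; a real crawler would render the page in a browser.
        return Page(url, "<html><title>Rendered title</title></html>")


def extract_title(page: Page) -> str:
    """Extraction logic that does not care how the page was fetched."""
    match = re.search(r"<title>(.*?)</title>", page.html)
    return match.group(1) if match else ""


def crawl(fetcher: Fetcher, url: str) -> dict[str, str]:
    """Fetch a URL with the given strategy and run the shared handler."""
    page = fetcher.fetch(url)
    return {"url": page.url, "title": extract_title(page)}
```

Moving from `crawl(HttpFetcher(), url)` to `crawl(BrowserFetcher(), url)` changes one argument and leaves the extraction code alone, which is the property the paragraph above describes.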
Community Driven:
Built by seasoned web scrapers who use Crawlee daily to scrape millions of pages. Join our thriving community on Discord.
Modern Python with Type Hints:
Crawlee is written using modern Python with type hints, offering code completion in your IDE and helping you catch bugs early.
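As a small illustration of what type hints buy you in scraping code, the sketch below types a scraped record and a parsing helper. `Product`, `parse_price`, and `build_product` are hypothetical names, not part of Crawlee. A static type checker such as mypy would flag a call like `parse_price(19.99)` before the crawler ever runs.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    """A scraped record with explicitly typed fields."""
    name: str
    price: Decimal
    in_stock: bool


def parse_price(raw: str) -> Decimal:
    """Convert scraped price text such as '$1,299.00' to a Decimal."""
    cleaned = raw.strip().lstrip("$").replace(",", "")
    return Decimal(cleaned)


def build_product(name: str, raw_price: str, stock_text: str) -> Product:
    """Assemble a typed record from raw strings taken off a page."""
    return Product(
        name=name.strip(),
        price=parse_price(raw_price),
        in_stock=stock_text.strip().lower() == "in stock",
    )
```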
Headless Browsers:
Easily switch from HTTP to a headless browser with just three lines of code. Crawlee builds on top of Playwright and adds unique features, supporting browsers like Chrome and Firefox.
Automatic Scaling and Proxy Management:
Crawlee intelligently manages concurrency based on system resources and rotates proxies to ensure reliability. Proxies that frequently time out or return errors are automatically discarded.
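To make the proxy-discarding idea concrete, here is a minimal, self-contained sketch of round-robin rotation with a failure threshold. It is not Crawlee's implementation: `ProxyPool`, `max_failures`, and the method names are assumptions chosen for illustration, and the real library may use a different policy for deciding when a proxy is unhealthy.

```python
from collections import deque
from typing import Optional


class ProxyPool:
    """Round-robin proxy rotation that drops proxies after repeated failures."""

    def __init__(self, proxies: list[str], max_failures: int = 3) -> None:
        self._queue: deque[str] = deque(proxies)
        self._failures: dict[str, int] = {p: 0 for p in proxies}
        self._max_failures = max_failures  # hypothetical threshold

    def next_proxy(self) -> Optional[str]:
        """Return the next healthy proxy, or None if all were discarded."""
        if not self._queue:
            return None
        proxy = self._queue[0]
        self._queue.rotate(-1)  # move the proxy just handed out to the back
        return proxy

    def report_success(self, proxy: str) -> None:
        """Reset the failure counter after a successful request."""
        if proxy in self._failures:
            self._failures[proxy] = 0

    def report_failure(self, proxy: str) -> None:
        """Count a timeout or error; discard the proxy once it hits the limit."""
        if proxy not in self._failures:
            return  # already discarded or never registered
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            self._queue.remove(proxy)
            del self._failures[proxy]

    @property
    def active(self) -> list[str]:
        """Proxies still in rotation."""
        return list(self._queue)
```

A crawler loop would call `next_proxy()` before each request and then `report_success()` or `report_failure()` depending on the outcome, so unhealthy proxies fall out of rotation without manual intervention.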
Getting Started with Crawlee
Prerequisites:
- Python 3.9 or higher
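To add Crawlee to an existing project instead of generating one with the CLI, install it from PyPI together with the Playwright extra:
pip install "crawlee[playwright]"
Playwright also needs its browser binaries, which you can download with:
playwright install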
Explore the full potential of Crawlee and join a community of professionals who scrape efficiently and effectively. Try Crawlee Today!