Context
The Python SDK currently pulls in a Rust-based dependency (impit) by default. Users have reported install-time friction caused by these native deps — not blocking, but a source of papercuts (e.g., environments without a Rust toolchain, slower installs, more failure modes on exotic platforms).
A representative user report (source):
A user who works with the Python SDK and manages scraping via the Apify API raised an issue about unexpected Rust dependencies surfacing in recent Python SDK versions. It is not a blocking issue, but it is a source of setup friction.
How we got here
About a year ago, we made Impit the default HTTP client in Crawlee so that crawls are stealthy out of the box. Because the SDK depends on Crawlee, Impit became a transitive dependency. We then also switched the Apify API client from HTTPX to Impit to avoid shipping two HTTP clients (HTTP clients are heavy).
Possible direction
- Extract the shared base used by both Crawlee and the SDK into a standalone package (e.g., apify-shared): storages, storage clients, event managers, service locator, and maybe more.
- Keep Impit as the default in Crawlee (stealth out of the box stays).
- Switch the SDK and Apify API client back to a Python HTTP client (HTTPX), so a plain pip install apify does not require a Rust toolchain.
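
To make the split between the two clients concrete, here is a minimal sketch of how code could check which HTTP backend is installed without importing it. The helper name pick_http_backend and the candidate order are hypothetical; neither the SDK nor Crawlee is known to expose such a function, and the actual design may look quite different.

```python
from importlib.util import find_spec


def pick_http_backend(candidates=("httpx", "impit")):
    """Return the first importable module name from candidates, or None.

    Hypothetical helper for illustration only; not part of any Apify package.
    """
    for name in candidates:
        # find_spec returns None for a missing top-level module
        # and does not import the module itself.
        if find_spec(name) is not None:
            return name
    return None


# Which backend is reported depends on what is installed in the environment.
print(pick_http_backend())
```

With this kind of check, an SDK-only install would resolve to HTTPX, while a Crawlee-based Actor could still find Impit for crawl traffic.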
Trade-off
Apify Actors based on Crawlee would end up shipping two HTTP clients (Impit for crawl traffic, HTTPX for Apify API traffic), making those images/installs larger. SDK-only users (no Crawlee) would benefit the most.
When
Not urgent. Worth revisiting later.