
A thousand tweets a day, fifty worth reading
Every morning, around a thousand IFTTT-generated emails land in our inbox. Each one carries a single Twitter link: keyword subscriptions, competitor alerts, KOL tracking, industry signals, all jammed together in an unreadable pile. Roughly fifty are actually worth reading. The other 950 are noise.
The catch: you cannot tell which is which from the email subject. To filter properly, you have to open each tweet and pull three pieces of information (the actual content, the view count, and the like count) before deciding whether a post deserves a closer look.
So we did what every engineering team does when faced with a thousand repetitive clicks. We automated it.
Where the first version broke
Our first crawler was a straightforward Actionbook script running in local mode. Open a batch of tabs, visit each IFTTT-provided link, extract the content and metrics, move on. On paper, we could push it to 30 concurrent tabs on a single machine. In practice, it hit Twitter's rate limit almost immediately.
Anything above a handful of concurrent requests started coming back with empty pages, challenge screens, or outright blocks. Dropping concurrency to keep the crawler stable meant a full run took around thirty minutes. By the time the data landed, standup was halfway over.
We tried the usual bag of tricks. Jittered delays. Rotating user agents. Residential proxies. Each one added maintenance cost and the payoff kept shrinking.
The bottleneck was not local. It was not CPU, it was not bandwidth, it was not even the script.
It was the single exit IP.
Every request went out through the same door. Twitter was not rate-limiting our machine. It was rate-limiting anyone knocking from that address. No amount of local optimization can fix a problem that lives at the network edge.
The only real way forward was to move to cloud browsers, so that every request could go out from a different IP.
One flag, three backends
Around this time, Actionbook shipped exactly what we needed: a --provider flag on browser start that delegates the session to a cloud browser service. Today it supports three backends:
| Provider | --provider value | API key env var |
|---|---|---|
| Driver | driver | DRIVER_API_KEY |
| HyperBrowser | hyperbrowser | HYPERBROWSER_API_KEY |
| BrowserUse | browseruse | BROWSER_USE_API_KEY |
Switching backends is one flag. The script never changes:
```bash
# Same script, three different egress pools
actionbook browser start --provider driver --session tweets-a
actionbook browser start --provider hyperbrowser --session tweets-b
actionbook browser start --provider browseruse --session tweets-c
```
Each browser start returns a session your scraper attaches to. The crawler does not know which backend is behind it, and it does not need to.
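The only per-provider setup is the API key. Each backend reads its key from the environment variable listed in the table above, so a one-time export (with placeholder values here) covers all three:

```bash
# Provider API keys, read from the environment (placeholder values)
export DRIVER_API_KEY="..."
export HYPERBROWSER_API_KEY="..."
export BROWSER_USE_API_KEY="..."
```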
The new SOP: spreading the load across three providers
Here is where the economics get interesting. Each of these cloud browser providers offers a free tier. Individually, none of them is generous enough to handle our full daily volume. Together, running in parallel, they comfortably are.
So we designed the pipeline around that observation.
At 7 AM, a cron job ingests every IFTTT email from the overnight inbox and extracts the Twitter link embedded in each one. Today that yields around a thousand URLs. The list is split into three roughly equal slices, one per provider.
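The ingest step is deliberately simple. Here is a sketch of it, assuming the overnight emails have already been dumped as plain-text files under inbox/ (the path and the regex are illustrative):

```bash
# Pull every Twitter/X status link out of the overnight IFTTT emails, deduplicated
grep -rhoE 'https://(twitter|x)\.com/[A-Za-z0-9_]+/status/[0-9]+' inbox/ \
  | sort -u > urls.txt

# Split into three roughly equal slices, one per provider
split -n l/3 urls.txt slice_   # produces slice_aa, slice_ab, slice_ac
```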
For each slice, we open a single cloud browser session on the corresponding provider and walk through its URLs one at a time. Three providers run in parallel: one session each, one tab in flight per session, three real requests in flight at any moment. Each request egresses from a completely independent IP pool, so if Twitter throttles one provider, the other two keep going without even noticing.
From the crawler's point of view, none of this complexity exists. Actionbook abstracts away the differences between Driver, HyperBrowser, and BrowserUse. We wrote one scraper. The orchestrator decides which provider gets which slice, spins up the sessions, and collects the results.
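Concretely, the orchestration step boils down to a handful of lines. In the sketch below, scrape.py stands in for our single scraper and its flags are illustrative; the three actionbook commands are the only provider-specific part:

```bash
# One cloud browser session per provider
actionbook browser start --provider driver --session tweets-a
actionbook browser start --provider hyperbrowser --session tweets-b
actionbook browser start --provider browseruse --session tweets-c

# One slice per session, all three scrapers running in parallel
./scrape.py --session tweets-a --urls slice_aa &
./scrape.py --session tweets-b --urls slice_ab &
./scrape.py --session tweets-c --urls slice_ac &
wait
```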
Once everything lands, the pipeline runs a summarization pass over the collected content and applies our relevance filters. The thousand raw URLs collapse into about fifty tweets that genuinely deserve attention, and those show up in the morning briefing channel before standup. End-to-end runtime dropped from around thirty minutes to about five, and rate-limit errors went from a daily annoyance to something we no longer track.
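The exact relevance rules are ours, and the summarization half depends on our own tooling, but the metric side of that final pass is nothing more exotic than a threshold filter over the fields we scraped. A toy version with invented thresholds and an assumed column layout:

```bash
# results.tsv: url <TAB> views <TAB> likes <TAB> text (layout assumed)
# Keep anything above the (made-up) view or like thresholds
awk -F'\t' '$2 >= 10000 || $3 >= 200' results.tsv > shortlist.tsv
```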
Closing thought
This Twitter pipeline is just one example. Whether you are monitoring social signals, tracking SEO competitors, scraping product catalogs, or running compliance checks, high-volume access to a single domain hits the same wall: single-IP rate limits.
With --provider, that wall becomes a configuration choice. Pick a backend per session, run as many sessions in parallel as you need, and let Actionbook handle the rest.
Ready to build your playbook?
Join our Discord to share your use case and get direct guidance from the Actionbook team.