From Sequential to Parallel: How We Scaled URL Fetching in Our LLM-Powered Crawler
The Bottleneck Was Obvious (But We Built It Anyway)
When we first spun up the Vultr Crawler, our goal was simple: explore a site’s structure, classify pages using an LLM, and extract actionable schema. We didn’t care about speed—yet. So we wrote the fetcher the easy way: one URL at a time, top to bottom.
It worked fine for small sites. A 20-page docs site? Done in 15 seconds. But when we pointed it at larger properties—hundreds of pages, deep navigation trees—our exploration phase started taking minutes. Not because the LLM was slow. Not because parsing was heavy. But because we were waiting… and waiting… and waiting for each aiohttp session.get() call to finish before starting the next.
We knew sequential fetching was a bottleneck. But sometimes you have to ship first, then optimize. And on February 8th, 2026, we finally killed it.
Enter asyncio.Semaphore: Concurrency With Guardrails
We weren’t trying to DDoS anyone. Our crawler runs in shared environments, and the target sites aren’t ours. So blind parallelism was off the table. We needed controlled concurrency—something that could run multiple fetches at once but wouldn’t overwhelm the network or the server.
That’s where asyncio.Semaphore came in.
We wrapped our fetch logic in a semaphore limited to 5 concurrent requests:
semaphore = asyncio.Semaphore(5)

async def fetch_url(session, url):
    async with semaphore:
        try:
            async with session.get(url) as response:
                content = await response.text()
                return {"url": url, "content": content, "status": response.status}
        except Exception as e:
            return {"url": url, "error": str(e), "status": None}
This simple guard ensures that no more than 5 HTTP requests are active at any moment. The rest wait their turn—cooperatively—without blocking the event loop.
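You can verify that property without touching the network. Here's a minimal, self-contained sketch (the counter-based `task` coroutine is purely illustrative) that tracks how many tasks are inside the semaphore at once:

```python
import asyncio

async def main():
    semaphore = asyncio.Semaphore(5)
    active = 0  # tasks currently inside the semaphore
    peak = 0    # highest concurrency observed

    async def task(i):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for an HTTP request
            active -= 1

    # Launch 20 tasks at once; the semaphore admits only 5 at a time.
    await asyncio.gather(*[task(i) for i in range(20)])
    return peak

peak = asyncio.run(main())
print(peak)  # → 5
```

Twenty tasks are scheduled simultaneously, but `peak` never exceeds the semaphore's limit: the other fifteen simply await their turn at `acquire`, leaving the event loop free to make progress elsewhere.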
We paired this with asyncio.gather() to launch all exploration fetches as a single coroutine batch:
results = await asyncio.gather(
    *[fetch_url(session, url) for url in urls_to_fetch],
    return_exceptions=True,
)
Each failed request returns an error dict instead of crashing the whole batch: the try/except inside fetch_url handles expected failures, and return_exceptions=True keeps any exception that slips through from aborting gather(). That's critical when crawling real-world sites, where timeouts, 404s, and TLS issues are the norm, not the exception.
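The effect of return_exceptions=True is easy to demonstrate in isolation. In this sketch (the fake `fetch` coroutine is hypothetical, standing in for a real HTTP call), one coroutine raises, yet gather() still returns a full result list:

```python
import asyncio

async def fetch(url):
    # Stand-in for a real HTTP request; raises for one URL.
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return {"url": url, "status": 200}

async def main():
    urls = ["https://ok.example/a", "https://bad.example/b", "https://ok.example/c"]
    # With return_exceptions=True, exceptions come back as values
    # in the results list instead of propagating out of gather().
    return await asyncio.gather(*(fetch(u) for u in urls), return_exceptions=True)

results = asyncio.run(main())
# results[1] is a ValueError instance; results[0] and results[2] are normal dicts
```

Without that flag, the first exception would propagate and the other in-flight fetches would be cancelled along with the batch.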
We also preserved our rate limiting at the domain level, ensuring we never exceeded 1 request per 200ms on average. The semaphore handles concurrency; the rate limiter handles politeness. They work together like a good pit crew.
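Our actual rate limiter isn't shown here, but a per-domain minimum-interval limiter can be sketched in a few lines. The `DomainRateLimiter` class below is illustrative, not our production code; it uses a per-domain lock so concurrent fetches to the same host are spaced out, while different hosts proceed independently:

```python
import asyncio
import time
from collections import defaultdict

class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, min_interval=0.2):
        self.min_interval = min_interval
        self.locks = defaultdict(asyncio.Lock)       # one lock per domain
        self.last_request = defaultdict(float)       # monotonic time of last request

    async def wait(self, domain):
        # Serialize waiters per domain so each one observes the updated timestamp.
        async with self.locks[domain]:
            elapsed = time.monotonic() - self.last_request[domain]
            if elapsed < self.min_interval:
                await asyncio.sleep(self.min_interval - elapsed)
            self.last_request[domain] = time.monotonic()

async def main():
    # Shortened interval (50ms) so the demo runs quickly.
    limiter = DomainRateLimiter(min_interval=0.05)
    start = time.monotonic()
    # Four concurrent "requests" to one domain get spaced 50ms apart.
    await asyncio.gather(*(limiter.wait("example.com") for _ in range(4)))
    return time.monotonic() - start

elapsed = asyncio.run(main())
# First call passes immediately; the next three each wait ~50ms,
# so total elapsed time is at least ~150ms.
```

A semaphore alone can't provide this guarantee: it caps how many requests are in flight, not how frequently they start. You'd call something like `await limiter.wait(domain)` just before `session.get(url)`, inside the semaphore guard.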
70% Faster Crawls, 5x Higher Capacity
The results were immediate.
On a test site with 120 pages, the old sequential fetcher took 2.8 minutes. The new parallel version? 52 seconds. That’s a 70% reduction—not bad for a 15-line change.
But the real win wasn’t just speed. It was scalability.
With sequential fetching, we capped max_pages at 100. Anything higher risked timeouts and poor UX. After switching to parallel fetching, we raised the cap to 500 pages—and the exploration phase still finishes under 3 minutes on most sites.
That opens up entirely new use cases: crawling large documentation hubs, API references, and support portals that were previously out of scope.
And because the event loop stays responsive, our LLM classification pipeline now receives URLs in batches faster, reducing idle time and improving end-to-end throughput.
This wasn’t a rewrite. It wasn’t a new framework or a distributed queue. Just one well-placed semaphore and a shift in mindset: stop waiting, start coordinating.
If you’re building crawlers, scrapers, or any I/O-heavy pipeline in Python, don’t underestimate the power of asyncio.Semaphore. It’s not flashy, but it’s the quiet hero of scalable concurrency.