From Sequential to Parallel: How We Scaled URL Fetching in Our LLM-Powered Crawler
The Bottleneck Was Obvious (But We Built It Anyway)
When we first spun up the Vultr Crawler, our goal was simple: explore a site’s structure, classify pages using an LLM, and extract actionable schema. We didn’t care about speed—yet. So we wrote the fetcher the easy way: one URL at a time, top to bottom.
It worked fine for small sites. A 20-page docs site? Done in 15 seconds. But when we pointed it at larger properties—hundreds of pages, deep navigation trees—our exploration phase started taking minutes. Not because the LLM was slow. Not because parsing was heavy. But because we were waiting… and waiting… and waiting for each aiohttp session.get() call to finish before starting the next.
We knew sequential fetching was a bottleneck. But sometimes you have to ship first, then optimize. And on February 8th, 2026, we finally killed it.
Enter asyncio.Semaphore: Concurrency With Guardrails
We weren’t trying to DDoS anyone. Our crawler runs in shared environments, and the target sites aren’t ours. So blind parallelism was off the table. We needed controlled concurrency—something that could run multiple fetches at once but wouldn’t overwhelm the network or the server.
That’s where asyncio.Semaphore came in.
We wrapped our fetch logic in a semaphore limited to 5 concurrent requests:
semaphore = asyncio.Semaphore(5)

async def fetch_url(session, url):
    async with semaphore:
        try:
            async with session.get(url) as response:
                content = await response.text()
                return {"url": url, "content": content, "status": response.status}
        except Exception as e:
            return {"url": url, "error": str(e), "status": None}
This simple guard ensures that no more than 5 HTTP requests are active at any moment. The rest wait their turn—cooperatively—without blocking the event loop.
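You can verify that property without touching the network. Here's a minimal, self-contained sketch (the counter-based `task` coroutine is purely illustrative) that tracks how many tasks are inside the semaphore at once:

```python
import asyncio

async def main():
    semaphore = asyncio.Semaphore(5)
    active = 0  # tasks currently inside the semaphore
    peak = 0    # highest concurrency observed

    async def task(i):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for an HTTP request
            active -= 1

    # Launch 20 tasks at once; the semaphore admits only 5 at a time.
    await asyncio.gather(*[task(i) for i in range(20)])
    return peak

peak = asyncio.run(main())
print(peak)  # → 5
```

Twenty tasks are scheduled simultaneously, but `peak` never exceeds the semaphore's limit: the other fifteen simply await their turn at `acquire`, leaving the event loop free to make progress elsewhere.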
We paired this with asyncio.gather() to launch all exploration fetches as a single coroutine batch:
results = await asyncio.gather(
    *[fetch_url(session, url) for url in urls_to_fetch],
    return_exceptions=True,
)
Each failed request returns an error dict instead of crashing the whole batch: the try/except inside fetch_url handles expected failures, and return_exceptions=True keeps any exception that slips through from aborting gather(). That's critical when crawling real-world sites, where timeouts, 404s, and TLS issues are the norm, not the exception.
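The effect of return_exceptions=True is easy to demonstrate in isolation. In this sketch (the fake `fetch` coroutine is hypothetical, standing in for a real HTTP call), one coroutine raises, yet gather() still returns a full result list:

```python
import asyncio

async def fetch(url):
    # Stand-in for a real HTTP request; raises for one URL.
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return {"url": url, "status": 200}

async def main():
    urls = ["https://ok.example/a", "https://bad.example/b", "https://ok.example/c"]
    # With return_exceptions=True, exceptions come back as values
    # in the results list instead of propagating out of gather().
    return await asyncio.gather(*(fetch(u) for u in urls), return_exceptions=True)

results = asyncio.run(main())
# results[1] is a ValueError instance; results[0] and results[2] are normal dicts
```

Without that flag, the first exception would propagate and the other in-flight fetches would be cancelled along with the batch.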
We also preserved our rate limiting at the domain level, ensuring we never exceeded 1 request per 200ms on average. The semaphore handles concurrency; the rate limiter handles politeness. They work together like a good pit crew.
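Our actual rate limiter isn't shown here, but a per-domain minimum-interval limiter can be sketched in a few lines. The `DomainRateLimiter` class below is illustrative, not our production code; it uses a per-domain lock so concurrent fetches to the same host are spaced out, while different hosts proceed independently:

```python
import asyncio
import time
from collections import defaultdict

class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, min_interval=0.2):
        self.min_interval = min_interval
        self.locks = defaultdict(asyncio.Lock)       # one lock per domain
        self.last_request = defaultdict(float)       # monotonic time of last request

    async def wait(self, domain):
        # Serialize waiters per domain so each one observes the updated timestamp.
        async with self.locks[domain]:
            elapsed = time.monotonic() - self.last_request[domain]
            if elapsed < self.min_interval:
                await asyncio.sleep(self.min_interval - elapsed)
            self.last_request[domain] = time.monotonic()

async def main():
    # Shortened interval (50ms) so the demo runs quickly.
    limiter = DomainRateLimiter(min_interval=0.05)
    start = time.monotonic()
    # Four concurrent "requests" to one domain get spaced 50ms apart.
    await asyncio.gather(*(limiter.wait("example.com") for _ in range(4)))
    return time.monotonic() - start

elapsed = asyncio.run(main())
# First call passes immediately; the next three each wait ~50ms,
# so total elapsed time is at least ~150ms.
```

A semaphore alone can't provide this guarantee: it caps how many requests are in flight, not how frequently they start. You'd call something like `await limiter.wait(domain)` just before `session.get(url)`, inside the semaphore guard.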
70% Faster Crawls, 5x Higher Capacity
The results were immediate.
On a test site with 120 pages, the old sequential fetcher took 2.8 minutes. The new parallel version? 52 seconds. That’s a 70% reduction—not bad for a 15-line change.
But the real win wasn’t just speed. It was scalability.
With sequential fetching, we capped max_pages at 100. Anything higher risked timeouts and poor UX. After switching to parallel fetching, we raised the cap to 500 pages—and the exploration phase still finishes under 3 minutes on most sites.
That opens up entirely new use cases: crawling large documentation hubs, API references, and support portals that were previously out of scope.
And because the event loop stays responsive, our LLM classification pipeline now receives URLs in batches faster, reducing idle time and improving end-to-end throughput.
This wasn’t a rewrite. It wasn’t a new framework or a distributed queue. Just one well-placed semaphore and a shift in mindset: stop waiting, start coordinating.
If you’re building crawlers, scrapers, or any I/O-heavy pipeline in Python, don’t underestimate the power of asyncio.Semaphore. It’s not flashy, but it’s the quiet hero of scalable concurrency.