Blog tag

#web scraping

10 posts tagged with web scraping.

4 min read

How We Built a Real-Time Fleet Dashboard for Distributed Scraping Workers in GhostGraph

We built a lightweight, real-time dashboard to monitor GhostGraph's distributed scraping workers using FastAPI, Redis Streams, and server-sent events.

FastAPI, Redis Streams, real-time monitoring, distributed systems, web scraping, Python
3 min read

From Sequential to Parallel: How We Scaled URL Fetching in Our LLM-Powered Crawler

We replaced sequential HTTP fetching with asyncio-powered concurrency—5 requests at a time—and slashed our crawl times by 70%.

asyncio, web scraping, concurrency, Python, Vultr Crawler, performance
4 min read

Building Autonomous Browser Agents: How We Scaled Vultr Crawler with Session Management and DOM Distillation

How we built stateful, token-efficient browser agents in Vultr Crawler using session APIs, DOM distillation, and autonomous action loops.

web scraping, browser automation, LLM optimization, REST API, distributed systems
4 min read

How We Fixed Hung Connections in Our Distributed Crawler with Hard Timeout Enforcement

We stopped silent network hangs in our Python crawler by layering signal-based hard timeouts over curl_cffi and adding IP rotation to preserve throughput.

Python, Web Scraping, Distributed Systems, curl_cffi, Timeouts, Debugging
3 min read

From Direct Queries to Clean Repositories: Refactoring a Python Scraper’s Database Layer

How I replaced raw Postgres queries with a type-safe repository pattern in a production scraper—improving testability and long-term maintainability.

python, database design, web scraping, clean architecture, testing
4 min read

Migrating from ARQ to Motia: Building a Lightweight, Event-Driven Worker Framework for Scalable Scraping

We replaced ARQ with our custom event-driven framework Motia to gain control, clarity, and reliability in our scraping workflows.

Python, Background Jobs, Distributed Systems, Web Scraping, System Design
4 min read

From Chaos to Clarity: How We Unified Our Worker Architecture with ARQ in the Vultr Scraper

We replaced a tangled mess of Python workers with ARQ and Redis, cutting complexity and boosting reliability in our scraping pipeline.

python, arq, redis, distributed-workers, web-scraping, architecture
4 min read

Building a Smart Crawler with LLM-Powered Extraction and ARQ Task Orchestration

How I used LLMs and ARQ to build a self-adapting, scalable web scraper that survives real-world site changes.

web scraping, ARQ, LLM, distributed systems, Python, data pipelines
4 min read

How We Built a Scalable Site Discovery Engine for the Vultr Scraper in One Day

We architected a real-time site discovery engine for the Vultr Scraper in under 24 hours—here's how modular design and smart routing made it possible.

web scraping, backend engineering, crawling architecture, API design, database schema
4 min read

Building a Web UI for a Headless Scraping Engine: How We Brought the Vultr Scraper to Life with FastAPI and HTML Templates

We turned a script-driven scraper into a fully observable web interface using FastAPI and server-rendered templates—no frontend framework needed.

FastAPI, web scraping, Python, observability, developer tooling