Blog tag

#web scraping

10 posts tagged with web scraping.

February 10, 2026 · 4 min read

How I Built a Real-Time Fleet Dashboard for Distributed Scraping Workers in GhostGraph

I built a lightweight, real-time dashboard to monitor GhostGraph's distributed scraping workers using FastAPI, Redis Streams, and server-sent events.

FastAPI · Redis Streams · real-time monitoring · distributed systems · web scraping · Python

February 8, 2026 · 3 min read

From Sequential to Parallel: How I Scaled URL Fetching in My LLM-Powered Crawler

I replaced sequential HTTP fetching with asyncio-powered concurrency (5 requests at a time) and cut my crawl times by 70%.

asyncio · web scraping · concurrency · Python · Vultr Crawler · performance

February 6, 2026 · 4 min read

Building Autonomous Browser Agents: How I Scaled Vultr Crawler with Session Management and DOM Distillation

How I built stateful, token-efficient browser agents in Vultr Crawler using session APIs, DOM distillation, and autonomous action loops.

web scraping · browser automation · LLM optimization · REST API · distributed systems

January 31, 2026 · 4 min read

How I Fixed Hung Connections in My Distributed Crawler with Hard Timeout Enforcement

I stopped silent network hangs in my Python crawler by layering signal-based hard timeouts over curl_cffi and adding IP rotation to preserve throughput.

Python · Web Scraping · Distributed Systems · curl_cffi · Timeouts · Debugging

January 27, 2026 · 3 min read

From Direct Queries to Clean Repositories: Refactoring a Python Scraper’s Database Layer

How I replaced raw Postgres queries with a type-safe repository pattern in a production scraper, improving testability and long-term maintainability.

python · database design · web scraping · clean architecture · testing

January 25, 2026 · 4 min read

Migrating from ARQ to Motia: Building a Lightweight, Event-Driven Worker Framework for Scalable Scraping

I replaced ARQ with my custom event-driven framework Motia to gain control, clarity, and reliability in my scraping workflows.

Python · Background Jobs · Distributed Systems · Web Scraping · System Design

January 24, 2026 · 4 min read

From Chaos to Clarity: How I Unified My Worker Architecture with ARQ in the Vultr Scraper

I replaced a tangled mess of Python workers with ARQ and Redis, cutting complexity and boosting reliability in my scraping pipeline.

python · arq · redis · distributed-workers · web-scraping · architecture

January 23, 2026 · 4 min read

Building a Smart Crawler with LLM-Powered Extraction and ARQ Task Orchestration

How I used LLMs and ARQ to build a self-adapting, scalable web scraper that survives real-world site changes.

web scraping · ARQ · LLM · distributed systems · Python · data pipelines

January 22, 2026 · 4 min read

How I Built a Scalable Site Discovery Engine for the Vultr Scraper in One Day

I architected a real-time site discovery engine for the Vultr Scraper in under 24 hours. Here's how modular design and smart routing made it possible.

web scraping · backend engineering · crawling architecture · API design · database schema

January 21, 2026 · 4 min read

Building a Web UI for a Headless Scraping Engine: How I Brought the Vultr Scraper to Life with FastAPI and HTML Templates

I turned a script-driven scraper into a fully observable web interface using FastAPI and server-rendered templates, with no frontend framework needed.

FastAPI · web scraping · Python · observability · developer tooling