Blog tag

#distributed systems

11 posts tagged with distributed systems.

4 min read

From Hardcoded Logic to Agent-Driven Routing: Refactoring ClawHub’s Orchestration Layer

How I replaced ClawHub’s monolithic routing node with agent self-determination to improve scalability and reduce coupling.

agent architecture, system design, refactoring, scalability, ClawHub, distributed systems
4 min read

Building a Resilient Task Requeue Mechanism in GhostGraph: Recovering Orphaned Pipeline Jobs

How we built a requeue endpoint in GhostGraph to revive stalled Redis Stream jobs and maintain pipeline integrity.

Redis Streams, task queues, pipeline resilience, GhostGraph, distributed systems
4 min read

How We Built a Real-Time Fleet Dashboard for Distributed Scraping Workers in GhostGraph

We built a lightweight, real-time dashboard to monitor GhostGraph's distributed scraping workers using FastAPI, Redis Streams, and server-sent events.

FastAPI, Redis Streams, real-time monitoring, distributed systems, web scraping, Python
4 min read

Building Autonomous Browser Agents: How We Scaled Vultr Crawler with Session Management and DOM Distillation

How we built stateful, token-efficient browser agents in Vultr Crawler using session APIs, DOM distillation, and autonomous action loops.

web scraping, browser automation, LLM optimization, REST API, distributed systems
4 min read

Building a Smarter Web Crawler: How We Implemented Two-Phase Intelligent Exploration in Vultr Crawler

We rebuilt our web crawler to move beyond brute-force scraping: it now learns patterns and adapts in real time.

web crawling, Playwright, Redis, pattern recognition, distributed systems
4 min read

How We Fixed Hung Connections in Our Distributed Crawler with Hard Timeout Enforcement

We stopped silent network hangs in our Python crawler by layering signal-based hard timeouts over curl_cffi and adding IP rotation to preserve throughput.

Python, Web Scraping, Distributed Systems, curl_cffi, Timeouts, Debugging
4 min read

How We Scaled a Distributed Crawler with Atomic Redis State Management

How atomic Redis operations fixed state corruption during worker shutdowns in our distributed Vultr Crawler.

redis, distributed-systems, web-crawler, python, data-consistency
4 min read

Migrating Job State Management from Redis to Postgres: Why We Centralized Crawler Jobs in a Single Source of Truth

We moved job claiming in the Vultr Crawler from Redis to Postgres for better consistency, auditability, and operational simplicity.

distributed systems, Postgres, Redis, job queues, crawler architecture, data consistency
4 min read

Replacing ARQ with a Unified Redis Streams Worker: Why We Simplified Our Distributed Task System

We replaced ARQ with a lightweight Redis Streams polling worker, cutting 6k+ lines and improving reliability across our scraping fleet.

python, redis, distributed systems, task queues, architecture
4 min read

Migrating from ARQ to Motia: Building a Lightweight, Event-Driven Worker Framework for Scalable Scraping

We replaced ARQ with our custom event-driven framework Motia to gain control, clarity, and reliability in our scraping workflows.

Python, Background Jobs, Distributed Systems, Web Scraping, System Design
4 min read

Building a Smart Crawler with LLM-Powered Extraction and ARQ Task Orchestration

How I used LLMs and ARQ to build a self-adapting, scalable web scraper that survives real-world site changes.

web scraping, ARQ, LLM, distributed systems, Python, data pipelines