Blog tag

#distributed systems

11 posts tagged with distributed systems.

February 14, 2026 · 4 min read

From Hardcoded Logic to Agent-Driven Routing: Refactoring ClawHub’s Orchestration Layer

How I replaced ClawHub’s monolithic routing node with agent self-determination to improve scalability and reduce coupling.

agent architecture, system design, refactoring, scalability, ClawHub, distributed systems

February 11, 2026 · 4 min read

Building a Resilient Task Requeue Mechanism in GhostGraph: Recovering Orphaned Pipeline Jobs

How I built a requeue endpoint in GhostGraph to revive stalled Redis Stream jobs and maintain pipeline integrity.

Redis Streams, task queues, pipeline resilience, GhostGraph, distributed systems

February 10, 2026 · 4 min read

How I Built a Real-Time Fleet Dashboard for Distributed Scraping Workers in GhostGraph

I built a lightweight, real-time dashboard to monitor GhostGraph's distributed scraping workers using FastAPI, Redis Streams, and server-sent events.

FastAPI, Redis Streams, real-time monitoring, distributed systems, web scraping, Python

February 6, 2026 · 4 min read

Building Autonomous Browser Agents: How I Scaled Vultr Crawler with Session Management and DOM Distillation

How I built stateful, token-efficient browser agents in Vultr Crawler using session APIs, DOM distillation, and autonomous action loops.

web scraping, browser automation, LLM optimization, REST API, distributed systems

February 1, 2026 · 4 min read

Building a Smarter Web Crawler: How I Implemented Two-Phase Intelligent Exploration in Vultr Crawler

I rebuilt my web crawler to move beyond brute-force scraping—now it learns patterns and adapts in real time.

web crawling, Playwright, Redis, pattern recognition, distributed systems

January 31, 2026 · 4 min read

How I Fixed Hung Connections in My Distributed Crawler with Hard Timeout Enforcement

I stopped silent network hangs in my Python crawler by layering signal-based hard timeouts over curl_cffi and adding IP rotation to preserve throughput.

Python, Web Scraping, Distributed Systems, curl_cffi, Timeouts, Debugging

January 30, 2026 · 4 min read

How I Scaled a Distributed Crawler with Atomic Redis State Management

How atomic Redis operations fixed state corruption during worker shutdowns in my distributed Vultr Crawler.

redis, distributed-systems, web-crawler, python, data-consistency

January 29, 2026 · 4 min read

Migrating Job State Management from Redis to Postgres: Why I Centralized Crawler Jobs in a Single Source of Truth

I moved job claiming in the Vultr Crawler from Redis to Postgres for better consistency, auditability, and operational simplicity.

distributed systems, Postgres, Redis, job queues, crawler architecture, data consistency

January 28, 2026 · 4 min read

Replacing ARQ with a Unified Redis Streams Worker: Why I Simplified My Distributed Task System

I replaced ARQ with a lightweight Redis Streams polling worker—cutting 6k+ lines and improving reliability across my scraping fleet.

python, redis, distributed systems, task queues, architecture

January 25, 2026 · 4 min read

Migrating from ARQ to Motia: Building a Lightweight, Event-Driven Worker Framework for Scalable Scraping

I replaced ARQ with my custom event-driven framework Motia to gain control, clarity, and reliability in my scraping workflows.

Python, Background Jobs, Distributed Systems, Web Scraping, System Design

January 23, 2026 · 4 min read

Building a Smart Crawler with LLM-Powered Extraction and ARQ Task Orchestration

How I used LLMs and ARQ to build a self-adapting, scalable web scraper that survives real-world site changes.

web scraping, ARQ, LLM, distributed systems, Python, data pipelines