Self-hosted · Open architecture

Web scraping infrastructure you own

Targets, extraction rules, proxy pools, scheduled jobs, and a data pipeline, all in one self-hosted platform with a real-time dashboard.

pyrosync · dashboard

$ pyrosync targets:add --url "https://example.com/api"

target created: tgt_ckx9a...

$ pyrosync rules:add --target tgt_ckx9a --selector ".price"

rule created: rul_m4kv2...

$ pyrosync jobs:run --target tgt_ckx9a

job queued: job_p8wnr... (worker-01)

completed in 1.2s, 24 fields extracted


Everything you need to scrape at scale

Six core systems working together. Configure once, run forever.

Scraping Targets

Configure URLs with custom headers, selectors, and request options. Full control over every crawl.

Extraction Rules

CSS selectors, JSON-LD, regex transforms. Build extraction pipelines that output clean, structured data.
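To illustrate the transform stage, here is a minimal sketch of what happens after a selector such as `.price` has already returned raw text: ordered regex transforms clean the value, and JSON-LD blocks are parsed straight from the page HTML. The `RegexTransform` shape, `applyTransforms`, and `extractJsonLd` are hypothetical names for illustration, not pyrosync's actual rule schema, and the regex-based JSON-LD scan stands in for the DOM query a real pipeline would more likely use.

```typescript
// Hypothetical shape of one transform step (not pyrosync's real schema).
interface RegexTransform {
  pattern: RegExp;
  replacement: string;
}

// Apply each transform in order to a raw value returned by a CSS selector.
function applyTransforms(raw: string, transforms: RegexTransform[]): string {
  return transforms.reduce((value, t) => value.replace(t.pattern, t.replacement), raw.trim());
}

// Parse every <script type="application/ld+json"> block in an HTML string.
// A regex scan is enough for a sketch; a browser-backed crawl would query the DOM.
function extractJsonLd(html: string): unknown[] {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    try {
      results.push(JSON.parse(match[1]));
    } catch {
      // Skip malformed blocks instead of failing the whole page.
    }
  }
  return results;
}

// Example rule: strip currency symbols and thousands separators from a price.
const priceTransforms: RegexTransform[] = [{ pattern: /[^0-9.]/g, replacement: "" }];

console.log(Number(applyTransforms(" $1,299.00 ", priceTransforms))); // 1299
```

Keeping transforms as an ordered list means each step stays small and testable, and a rule's output can be previewed one step at a time.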

Proxy Rotation

Pool management with round-robin, random, and least-used strategies. Auto-disable failing proxies.
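The three rotation strategies and the auto-disable behavior can be sketched as a small in-memory pool. This is an illustrative sketch, not pyrosync's implementation: `ProxyPool`, `ProxyEntry`, and the rule that a proxy is disabled after `maxFailures` consecutive failures (default 3) are assumptions made for the example.

```typescript
type Strategy = "round-robin" | "random" | "least-used";

// Hypothetical in-memory record for one proxy (not pyrosync's schema).
interface ProxyEntry {
  url: string;
  uses: number;
  consecutiveFailures: number;
  disabled: boolean;
}

class ProxyPool {
  private readonly proxies: ProxyEntry[];
  private readonly strategy: Strategy;
  private readonly maxFailures: number; // assumed auto-disable threshold
  private cursor = 0;

  constructor(urls: string[], strategy: Strategy, maxFailures = 3) {
    this.proxies = urls.map((url) => ({ url, uses: 0, consecutiveFailures: 0, disabled: false }));
    this.strategy = strategy;
    this.maxFailures = maxFailures;
  }

  // Choose the next healthy proxy according to the configured strategy.
  next(): ProxyEntry {
    const active = this.proxies.filter((p) => !p.disabled);
    if (active.length === 0) throw new Error("no healthy proxies left in the pool");
    const chosen = this.pick(active);
    chosen.uses += 1;
    return chosen;
  }

  // Record a request outcome; enough consecutive failures disable the proxy.
  report(proxy: ProxyEntry, ok: boolean): void {
    if (ok) {
      proxy.consecutiveFailures = 0;
      return;
    }
    proxy.consecutiveFailures += 1;
    if (proxy.consecutiveFailures >= this.maxFailures) proxy.disabled = true;
  }

  private pick(active: ProxyEntry[]): ProxyEntry {
    if (this.strategy === "round-robin") {
      const p = active[this.cursor % active.length];
      this.cursor += 1;
      return p;
    }
    if (this.strategy === "random") {
      return active[Math.floor(Math.random() * active.length)];
    }
    // least-used: fewest selections so far; ties go to the earlier proxy.
    return active.reduce((min, p) => (p.uses < min.uses ? p : min));
  }
}
```

Because round-robin walks the list of currently healthy proxies, the rotation order shifts when a proxy is disabled; a production pool might instead skip disabled entries in a fixed order.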

Job Scheduling

Cron-based scheduling with retry logic, parent-child jobs, and real-time status tracking.
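Retry logic is commonly built on exponential backoff. The sketch below shows that technique in isolation; the `RetryPolicy` field names and `runWithRetry` helper are hypothetical, and in a pg-boss-based stack the queue itself may handle retries, so treat this as an illustration of the idea rather than how pyrosync configures its jobs.

```typescript
// Hypothetical retry policy (field names are illustrative, not pyrosync config keys).
interface RetryPolicy {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // cap on any single delay
}

// Exponential backoff: base * 2^(attempt - 1), capped at maxDelayMs.
// attempt = 1 is the delay before the first retry.
function backoffDelay(attempt: number, policy: RetryPolicy): number {
  return Math.min(policy.baseDelayMs * 2 ** (attempt - 1), policy.maxDelayMs);
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Run a task, retrying on failure until maxAttempts is reached.
async function runWithRetry<T>(task: () => Promise<T>, policy: RetryPolicy): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < policy.maxAttempts) await sleep(backoffDelay(attempt, policy));
    }
  }
  throw lastError; // every attempt failed; surface the last error
}
```

With `baseDelayMs: 100` and `maxDelayMs: 1000`, successive retries wait 100, 200, 400, and 800 ms, then stay at 1000 ms. Adding random jitter to each delay is a common refinement when many jobs fail at once.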

Data Pipeline

Results are stored with diffs, content hashes, and optional vector embeddings. Query them, export them, or push them downstream via webhooks.
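One common way to combine hashing and diffs is to hash a canonical form of each extracted record, skip storage when the hash matches the previous run, and compute a field-level diff only when it changes. This is a minimal sketch of that technique, assuming flat records of extracted fields; `contentHash` and `diffFields` are hypothetical helpers, not pyrosync's storage code. It uses Node's built-in `crypto.createHash`.

```typescript
import { createHash } from "node:crypto";

type Fields = Record<string, unknown>;

// SHA-256 over a key-sorted serialization, so field order does not change the hash.
// Assumes flat records; nested objects would need a recursive canonical form.
function contentHash(fields: Fields): string {
  const canonical = JSON.stringify(Object.keys(fields).sort().map((k) => [k, fields[k]]));
  return createHash("sha256").update(canonical).digest("hex");
}

interface FieldChange {
  field: string;
  before: unknown;
  after: unknown;
}

// Field-level diff between the previous and current result (added, removed, or changed).
function diffFields(prev: Fields, next: Fields): FieldChange[] {
  const keys = Array.from(new Set([...Object.keys(prev), ...Object.keys(next)])).sort();
  const changes: FieldChange[] = [];
  for (const field of keys) {
    if (JSON.stringify(prev[field]) !== JSON.stringify(next[field])) {
      changes.push({ field, before: prev[field], after: next[field] });
    }
  }
  return changes;
}
```

Comparing hashes first keeps the common "nothing changed" case cheap; the diff is computed only for records whose hash differs.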

Alerts & Webhooks

Threshold-based alerts via email, Slack, or Discord. Webhooks fire on job events with configurable retries.
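A threshold alert reduces to comparing a metric against a configured limit and formatting a message for the target channel. The sketch below assumes a hypothetical `AlertRule` shape and illustrative metric names such as `job.failure_rate`; neither comes from pyrosync. The JSON bodies follow the public webhook formats (Slack incoming webhooks read a `text` field, Discord webhooks read `content`). Email delivery is left out.

```typescript
type Comparator = ">" | ">=" | "<" | "<=";

// Hypothetical alert rule (not pyrosync's schema); metric names are illustrative.
interface AlertRule {
  name: string;
  metric: string;
  comparator: Comparator;
  threshold: number;
}

const compare: Record<Comparator, (value: number, threshold: number) => boolean> = {
  ">": (v, t) => v > t,
  ">=": (v, t) => v >= t,
  "<": (v, t) => v < t,
  "<=": (v, t) => v <= t,
};

// Return the rules whose metric is present and breaches its threshold.
function evaluate(rules: AlertRule[], metrics: Record<string, number>): AlertRule[] {
  return rules.filter((r) => r.metric in metrics && compare[r.comparator](metrics[r.metric], r.threshold));
}

type Channel = "slack" | "discord";

// Build the JSON body to POST to a Slack or Discord webhook URL.
function webhookBody(channel: Channel, rule: AlertRule, value: number): string {
  const message = `[alert] ${rule.name}: ${rule.metric} = ${value} (${rule.comparator} ${rule.threshold})`;
  return JSON.stringify(channel === "slack" ? { text: message } : { content: message });
}
```

Rules whose metric is missing from the snapshot are skipped rather than treated as breaches, which avoids false alarms when a worker has not reported yet.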

Self-hosted

Your infra, your data, your rules

Real-time

Live job status, worker health, data diffs

API-first

Every action available via REST + API keys

Built with

Next.js 16 · Playwright · PostgreSQL · Drizzle ORM · pg-boss · Turborepo

Ready to deploy?

Set up your instance in under 5 minutes. Create an admin account and start configuring targets.