Targets, extraction rules, proxy pools, scheduled jobs, and a data pipeline: all in one self-hosted platform with a real-time dashboard.
$ pyrosync targets:add --url "https://example.com/api"
target created: tgt_ckx9a...
$ pyrosync rules:add --target tgt_ckx9a --selector ".price"
rule created: rul_m4kv2...
$ pyrosync jobs:run --target tgt_ckx9a
job queued: job_p8wnr... (worker-01)
completed in 1.2s, 24 fields extracted
Six core systems working together. Configure once, run forever.
Configure URLs with custom headers, selectors, and request options. Full control over every crawl.
CSS selectors, JSON-LD, and regex transforms. Build extraction pipelines that output clean, structured data.
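To make the JSON-LD and regex steps concrete, here is a minimal sketch using only the Python standard library. It is not PyroSync's extractor: `JsonLdParser`, `extract_price`, and the assumption that the price sits under `offers.price` are illustrative. CSS selectors are left out because they need a third-party HTML parser.

```python
import json
import re
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the parsed bodies of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buf)))


def extract_price(html):
    """Return the first offers.price found in JSON-LD as a float, or None."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        if not isinstance(block, dict):
            continue  # JSON-LD can also be a list; skipped here for brevity
        offers = block.get("offers")
        if not isinstance(offers, dict):
            continue
        # Regex transform: drop thousands separators, then keep the first number.
        raw = str(offers.get("price", "")).replace(",", "")
        match = re.search(r"\d+(?:\.\d+)?", raw)
        if match:
            return float(match.group())
    return None
```

Splitting parsing (the HTML parser) from cleanup (the regex) mirrors the idea of a pipeline: each stage can be swapped or chained without touching the others.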
Pool management with round-robin, random, and least-used strategies. Auto-disable failing proxies.
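A minimal sketch of what the three selection strategies and the auto-disable rule mean in practice. The `ProxyPool` class, its method names, and the "disable after three consecutive failures" threshold are assumptions for illustration; the page does not say how PyroSync decides a proxy is failing.

```python
import random
from dataclasses import dataclass


@dataclass
class Proxy:
    url: str
    uses: int = 0
    failures: int = 0  # consecutive failures since the last success
    enabled: bool = True


class ProxyPool:
    """Hypothetical pool illustrating round-robin, random, and least-used picks."""

    def __init__(self, urls, strategy="round_robin", max_failures=3, rng=None):
        self.proxies = [Proxy(u) for u in urls]
        self.strategy = strategy
        self.max_failures = max_failures
        self._rr_index = 0
        self._rng = rng or random.Random()

    def _active(self):
        active = [p for p in self.proxies if p.enabled]
        if not active:
            raise RuntimeError("no enabled proxies left in the pool")
        return active

    def pick(self):
        active = self._active()
        if self.strategy == "round_robin":
            # Cycles over currently enabled proxies; disabling one shifts the rotation.
            proxy = active[self._rr_index % len(active)]
            self._rr_index += 1
        elif self.strategy == "random":
            proxy = self._rng.choice(active)
        elif self.strategy == "least_used":
            proxy = min(active, key=lambda p: p.uses)  # ties go to the earliest proxy
        else:
            raise ValueError(f"unknown strategy: {self.strategy}")
        proxy.uses += 1
        return proxy

    def report_failure(self, proxy):
        proxy.failures += 1
        if proxy.failures >= self.max_failures:
            proxy.enabled = False  # auto-disable after repeated failures

    def report_success(self, proxy):
        proxy.failures = 0  # a success resets the consecutive-failure count
```

Round-robin spreads load evenly in order, random avoids predictable patterns, and least-used compensates for proxies that joined the pool late or were re-enabled.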
Cron-based scheduling with retry logic, parent-child jobs, and real-time status tracking.
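The page doesn't specify PyroSync's retry policy. As one common approach, this sketch retries a failed job with exponential backoff; `run_with_retries`, its parameters, and the doubling delay are hypothetical. The `sleep` argument is injectable so the schedule can be checked without actually waiting.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run task(); on failure, retry with exponential backoff.

    Returns (result, attempts_used). Re-raises the last error if every
    attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception:
            # A real scheduler would likely retry only transient errors.
            if attempt == max_attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Doubling the delay gives a flaky target time to recover instead of hammering it at a fixed interval.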
Results stored with diffs, hashing, and optional vector embeddings. Query them, export them, or push them downstream via webhooks.
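Hashing and diffs typically work together: a hash of each result tells you cheaply whether anything changed since the last run, and a field-level diff tells you what changed. This sketch assumes results are flat, JSON-serializable dicts; `content_hash` and `field_diff` are illustrative names, not part of PyroSync.

```python
import hashlib
import json


def content_hash(record):
    """Stable SHA-256 of a record; keys are sorted so field order doesn't matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def field_diff(old, new):
    """Return {field: (old_value, new_value)} for every field that changed."""
    keys = old.keys() | new.keys()
    return {
        k: (old.get(k), new.get(k))
        for k in sorted(keys)
        if old.get(k) != new.get(k)
    }
```

Sorting keys before hashing means two runs that return the same fields in a different order produce the same hash, so only real changes need a diff.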
Threshold-based alerts via email, Slack, or Discord. Webhooks fire on job events with configurable retries.
Self-hosted
Your infra, your data, your rules
Real-time
Live job status, worker health, data diffs
API-first
Every action available via REST + API keys
Set up your instance in under 5 minutes. Create an admin account and start configuring targets.