PagerProof
Every page, statistically justified.
We calibrate your Prometheus alert thresholds with block-bootstrap under H0 and hand you the proven false-positive rate per rule, per year — not a vibe, a number with a condition attached.
$ pagerproof calibrate --rule HighRequestLatency
reading prometheus history… 90d @ 1m resolution
excluding 3 anomalous windows from baseline
block bootstrap: 10,000 resamples, block = 4× for:
rule HighRequestLatency (for: 5m)
current thr=0.80 verdict=noisy fp/yr now=87
calib thr=0.94 fp/yr=11.2 [CI 8.4–14.7]
sensitivity: 4/4 historical incidents still fire
✓ written to proof/HighRequestLatency.yaml
The problem
Your thresholds were guessed. Your on-call pays for it.
None of these numbers are ours — they're the industry's. Sources in brackets.
of pages from static thresholds are false positives. Well-calibrated baselines bring that down to 5–15%. [openobserve, AIOps guide]
of engineer time burned per dispatched false positive: context switch, triage, write-up, back to work. [reliamag / oxmaint]
lost by a team of three at 12–18 false alerts per day. That's most of a full-time engineer, paged into nothing. [reliamag / oxmaint]
"After enough false alarms, the team starts ignoring the warnings."
— AssetWatch. That's the moment your alerting system is functionally dead.
How it works
Four steps. No platform migration. No agent.
Read-only Prometheus access
We query your metric history over the standard HTTP API — or you run our tooling inside your own infra. Nothing is written, nothing is changed.
Simulate your rules against history
Every alert rule is replayed against your real series, with its real for: duration, to measure how it actually behaves — not how it was meant to.
Block-bootstrap FP/year under H0
We resample blocks of your own healthy-regime data, preserving its autocorrelation, and compute the expected false-positive rate per year for each rule — with a confidence interval.
Calibrated thresholds + The Proof
You get every threshold recalibrated to your FP budget, a sensitivity check against your historical incidents, and the full report — ready to apply in an afternoon.
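Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PagerProof engine: the function names (`count_alerts`, `moving_block_resample`, `fp_per_year`) are hypothetical, `for:` is approximated as a fixed number of consecutive samples above the threshold, and the interval is a plain percentile interval.

```python
import random


def count_alerts(series, threshold, for_steps):
    """Count firings of a rule `value > threshold` held for `for_steps` samples.

    Simplification: one firing per sustained run, counted at the moment the
    run reaches `for_steps` consecutive samples above the threshold.
    """
    fired = 0
    run = 0
    for value in series:
        if value > threshold:
            run += 1
            if run == for_steps:
                fired += 1
        else:
            run = 0
    return fired


def moving_block_resample(series, block_len, rng):
    """Build a same-length series from randomly chosen contiguous blocks."""
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]


def fp_per_year(healthy, threshold, for_steps, block_len,
                samples_per_year, n_resamples=1000, seed=0):
    """Estimate FP/year on healthy-regime data with a 95% percentile interval."""
    rng = random.Random(seed)
    scale = samples_per_year / len(healthy)  # history length -> one year
    rates = sorted(
        count_alerts(moving_block_resample(healthy, block_len, rng),
                     threshold, for_steps) * scale
        for _ in range(n_resamples)
    )
    mean = sum(rates) / len(rates)
    low = rates[int(0.025 * (len(rates) - 1))]
    high = rates[int(0.975 * (len(rates) - 1))]
    return mean, (low, high)
```

With 1-minute samples, `samples_per_year` would be 525,600, and following the sample session above (block = 4× for:), a 5-minute `for:` would give `block_len=20`. The `healthy` list is assumed to already exclude anomalous windows.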
Why not mean+2σ
Your mean+2σ doesn't know your generating process.
The 2σ rule assumes independent, identically distributed samples. Metric series are neither: they're autocorrelated (the next minute looks like this one) and seasonal (Tuesday 10:00 is not Sunday 03:00). And what pages you is not a single point above a line but a sustained crossing, an event whose duration is shaped by that autocorrelation. Under those conditions the textbook false-positive rate can be off by an order of magnitude, usually in the direction that wakes you up. Our moving-block bootstrap resamples blocks of your own series, so the dependence structure is preserved and the FP/year estimate is measured against your process, not a Gaussian fiction. The full methodology is public: check the math.
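To see the effect of autocorrelation on sustained crossings, here is a small simulation. It is an illustration only: the AR(1) process with `phi=0.9`, the 200,000-sample length, and the 5-sample duration are assumptions chosen for the demo, not measurements from any real system. Shuffling the same series keeps its values but destroys the dependence, which is what the independence assumption implicitly does.

```python
import random


def ar1_series(n, phi, rng):
    """AR(1) noise with unit marginal variance: x[t] = phi * x[t-1] + e[t]."""
    innovation_sd = (1 - phi ** 2) ** 0.5
    x = rng.gauss(0, 1)
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0, innovation_sd)
    return out


def sustained_crossings(series, threshold, k):
    """Count runs that reach k consecutive samples above the threshold."""
    events = 0
    run = 0
    for value in series:
        run = run + 1 if value > threshold else 0
        if run == k:
            events += 1
    return events


rng = random.Random(42)
correlated = ar1_series(200_000, phi=0.9, rng=rng)

# Same values, dependence removed: the world the i.i.d. assumption describes.
shuffled = correlated[:]
rng.shuffle(shuffled)

threshold = 2.0  # theoretical mean + 2 sigma for this process
print("autocorrelated:", sustained_crossings(correlated, threshold, k=5))
print("shuffled:      ", sustained_crossings(shuffled, threshold, k=5))
```

Both series have the same marginal distribution, so any difference in the two counts comes from the dependence structure alone.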
Pricing
Fixed price. Fixed scope. No taxi meter.
Audit Core
€1,500
one-time, fixed
- Up to 50 alert rules
- 1 Prometheus cluster
- The Proof report: current vs. calibrated threshold, FP/year with CI, per rule
- Rules to delete or merge, flagged
- JSON/YAML ready for your Prometheus alerting rule files
Audit Full
€2,900
one-time, fixed
- Up to 150 alert rules
- Multi-cluster
- Everything in Core
- Hand-off session with your team
- "What we did NOT touch and why" section — rules without statistical power are flagged, not guessed
PagerProof Server
from €250/mo
self-hosted · billed annually
- Team: €250/mo — 1 cluster, up to 150 rules
- Org: €400/mo — multi-cluster, unlimited rules, priority support
- Continuous recalibration as your traffic changes
- Runs in your infra (Docker), no data leaves your network
- No data lock-in: cancel and keep your last calibrated thresholds
Guarantee: if The Proof doesn't identify a projected reduction of ≥30% in false positives while maintaining detection, you get 100% of your money back.
And: the full audit price is credited toward your first PagerProof Server year if you subscribe within 90 days. The audit is the entry, not the exit.
FAQ
The objections, answered straight.
Why not just use Grafana ML / adaptive alerting?
Try it — then ask it what false-positive rate per year each rule will give you. It can't tell you. Those baselines are variants of mean+kσ and smoothing (Holt-Winters) applied to autocorrelated series: the independence assumption is broken from the start, so the real FP rate can be several times the theoretical one, and nobody guarantees it.
Our block bootstrap resamples blocks of your own series, preserving its autocorrelation, and calibrates under H0 of the healthy regime. That's why the FP/year we give you is a demonstrated number with an explicit condition, not a hope. And if after reading the methodology Grafana ML turns out to be enough for you — genuinely fine. Better that than static thresholds.
How do I know I won't miss real incidents when thresholds go up?
That's the right question, and the answer is in the report, not in our word. (1) Every threshold change in The Proof comes with a sensitivity analysis: we replay your real historical incidents against the new threshold and show you which would have fired, and with how much margin. (2) We calibrate under H0 of the healthy regime — the threshold is placed so that normal behavior doesn't cross it more than X times per year; real anomalies are, by definition, not the healthy regime.
Two honest caveats: recall can't be rigorously measured without labeled incidents, so where your history gives insufficient statistical power we mark the rule "do not touch — insufficient data" instead of guessing. And calibrated thresholds still require human review before you apply them. We're explicit about both — see the Limitations section.
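The sensitivity check described above can be sketched as follows. This is a minimal sketch, not the report's actual implementation: `max_sustained_level` and `sensitivity_report` are hypothetical names, the incident windows are made-up values, and `for:` is again approximated as a number of consecutive samples.

```python
def max_sustained_level(window, for_steps):
    """Highest level the series held across any `for_steps` consecutive samples."""
    if len(window) < for_steps:
        raise ValueError("window shorter than for_steps")
    return max(
        min(window[i:i + for_steps])
        for i in range(len(window) - for_steps + 1)
    )


def sensitivity_report(incidents, threshold, for_steps):
    """Map incident name -> (would_fire, margin) for a candidate threshold."""
    report = {}
    for name, window in incidents.items():
        level = max_sustained_level(window, for_steps)
        report[name] = (level > threshold, level - threshold)
    return report


# Hypothetical incident windows (made-up values, for illustration only).
incidents = {
    "incident-a": [0.70, 0.96, 0.97, 0.99, 0.98, 0.80],
    "incident-b": [0.50, 0.90, 0.95, 0.97, 0.60],
}
report = sensitivity_report(incidents, threshold=0.94, for_steps=3)
for name, (fires, margin) in report.items():
    print(f"{name}: fires={fires} margin={margin:+.2f}")
```

A positive margin means the incident would still have fired at the new threshold; a negative one marks a threshold change that needs a second look before it is applied.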
Why trust a one-person company?
You don't have to — the model is designed so trust isn't required. (1) Self-hosted: Server runs in your infra and your metrics never leave your network; if PagerProof disappears tomorrow, you keep the calibrated thresholds and everything keeps working. (2) Public methodology: the method is published and auditable — verify the math yourself or pay any statistician to do it. The closed part is the engine that automates it, not the science. (3) Fixed price, no lock-in, money-back guarantee: your maximum exposure is known and small.
One person with an auditable method is less risky than an opaque vendor with 200 employees and a "trust our AI."
We already have Datadog / BigPanda — doesn't our platform do this?
They do something else. Correlation (BigPanda: 90–95% noise reduction) groups alerts that already fired; Watchdog (60% fewer pager floods) filters with opaque anomaly detection. Both act after the badly calibrated threshold fires. We fix the origin: the rule firing correctly in the first place. The layers are compatible; in fact, calibrating the origin gives your correlation layer better raw material to work with.
We could build this in-house — we have good people.
You surely do — the question is opportunity cost. A block bootstrap under H0 done properly (block sizing, healthy-regime exclusion, multiple-rule correction) is 2–4 weeks of an engineer with a statistics background, plus maintenance. The audit costs less than one week of that same engineer, lands in 2–3 weeks, and the published methodology lets you verify it instead of rebuilding it. If you later want to internalize it, The Proof documents the path — no lock-in means that too.
€1,500 sounds expensive for a report.
The report is the format, not the product. The product is up to 27 hours/week of engineering recovered (reliamag/oxmaint) and an on-call rotation that stops burning people out. At a loaded European engineer cost (~€50/h), recovering just a third of that is worth about €450 a week, so the audit pays for itself in under 4 weeks. And if the report doesn't identify the projected improvement, you get your money back. That's the guarantee.
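The payback arithmetic, spelled out with the figures quoted above (27 hours/week, one third recovered, ~€50/h, €1,500 audit). The variable names are ours; the inputs are the page's own numbers.

```python
audit_price_eur = 1500        # Audit Core, one-time
hours_lost_per_week = 27      # upper figure cited from reliamag/oxmaint
recovered_fraction = 1 / 3    # "just a third of that"
hourly_cost_eur = 50          # loaded European engineer cost (approximate)

weekly_saving_eur = hours_lost_per_week * recovered_fraction * hourly_cost_eur
payback_weeks = audit_price_eur / weekly_saving_eur

print(f"weekly saving: EUR {weekly_saving_eur:.0f}")
print(f"payback: {payback_weeks:.1f} weeks")
```

Recovering 9 hours a week at €50/h is €450 a week, so the €1,500 audit is paid back in roughly 3.3 weeks under these assumptions.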
Know your FP/year. Per rule. With proof.
Tell us roughly how many alert rules you run and we'll reply with a fixed quote and a sample of The Proof. No call required unless you want one.
Prefer email? hello@pagerproof.com