Methodology

Every metric on this site is computed the same way for Plumb and for the market, so the comparison is apples to apples. Here is exactly how.

What counts as a call

For each market in the declared universe, Plumb records a fair-value estimate (or an explicit abstain) and commits it on-chain before the market's scheduled start. A call is graded only once the market resolves on UMA. Abstains, voids, non-scoring calls (those committed after the start), and still-pending calls are all kept on the record and labeled, never dropped.
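The labeling rules above can be sketched as a small classifier. This is an illustration only: the field names, the Status enum, and the order in which the checks run are all assumptions, not Plumb's actual schema. In particular, treating a commit made exactly at the scheduled start as non-scoring is a guess.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    GRADED = "graded"
    ABSTAIN = "abstain"
    VOID = "void"
    NON_SCORING = "non-scoring"
    PENDING = "pending"


@dataclass
class Call:
    # All field names are hypothetical, chosen for this illustration.
    fair_value: Optional[float]  # None stands for an explicit abstain
    committed_at: datetime       # time of the on-chain commitment
    market_start: datetime       # the market's scheduled start
    resolved: bool = False       # True once the market has resolved
    voided: bool = False         # True if the market was voided


def classify(call: Call) -> Status:
    """Label a call; every call gets a label, none is dropped."""
    # The precedence of these checks is an assumption.
    if call.voided:
        return Status.VOID
    if call.fair_value is None:
        return Status.ABSTAIN
    if call.committed_at >= call.market_start:
        return Status.NON_SCORING
    if not call.resolved:
        return Status.PENDING
    return Status.GRADED
```

Only calls labeled GRADED would feed the scores described in the following sections; the other labels stay visible on the record.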

Brier score

Brier = (p − outcome)², where p is the forecast probability of YES and outcome is 1 (YES) or 0 (NO). Lower is better. We report Plumb's Brier beside the market's Brier computed on the contemporaneous price, so you can see whether the forecast was more accurate than the market it was made against.
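As a minimal sketch, the per-call formula translates directly into code. The function name and the input checks are illustrative and not taken from Plumb's codebase.

```python
def brier(p: float, outcome: int) -> float:
    """Squared error between the forecast probability of YES and the 0/1 outcome."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 1 (YES) or 0 (NO)")
    return (p - outcome) ** 2
```

For example, a 0.70 forecast on a market that resolves YES scores about 0.09, while the same forecast on a market that resolves NO scores about 0.49. A 0.50 forecast scores 0.25 either way.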

Log-loss

Log-loss = −[outcome·ln(p) + (1−outcome)·ln(1−p)], with p clamped away from 0 and 1 so that a confident-wrong call incurs a large but finite penalty. Lower is better. As with Brier, it is shown for Plumb and the market side by side.
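A short sketch of the clamped formula follows. The clamp width EPS is an assumption made for this example; the value Plumb actually uses is not stated here.

```python
import math

EPS = 1e-6  # hypothetical clamp width, not Plumb's documented value


def log_loss(p: float, outcome: int, eps: float = EPS) -> float:
    """Negative log-likelihood of the outcome, with p clamped to [eps, 1 - eps]."""
    if outcome not in (0, 1):
        raise ValueError("outcome must be 1 (YES) or 0 (NO)")
    p = min(max(p, eps), 1.0 - eps)  # keeps ln() away from 0
    return -(outcome * math.log(p) + (1 - outcome) * math.log(1.0 - p))
```

Without the clamp, a forecast of exactly 1.0 on a market that resolves NO would require ln(0), which is undefined. With the clamp, the penalty is roughly −ln(eps), which is large but finite.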

Closing-line value (CLV)

CLV measures how the market's line moved after Plumb committed, signed in the leaned direction (YES: closing − commit; NO: commit − closing). Positive CLV means the market drifted toward Plumb's view — a sign the call was early rather than late. CLV is a Plumb-only diagnostic; there is no market baseline for it.
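The sign convention can be sketched as below. The function assumes both prices are YES prices on a 0 to 1 scale and that the leaned side is passed in as "YES" or "NO"; how Plumb derives the leaned side is not covered by this sketch.

```python
def clv(side: str, commit_price: float, closing_price: float) -> float:
    """Closing-line value, signed in the leaned direction.

    Both prices are assumed to be YES prices in [0, 1].
    """
    if side == "YES":
        return closing_price - commit_price
    if side == "NO":
        return commit_price - closing_price
    raise ValueError("side must be 'YES' or 'NO'")
```

A YES lean committed at 0.40 that closes at 0.55 has a CLV of about +0.15. A NO lean committed at 0.40 that closes at 0.30 has a CLV of about +0.10, because the market moved toward NO.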

Calibration

Predictions are bucketed by forecast probability; for each bucket we plot the mean predicted probability against the empirical fraction that resolved YES. A perfectly calibrated forecaster sits on the 45° line. The curve has an accessible data table beneath it.
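The bucketing step might look like the following sketch. Ten equal-width buckets is an assumption made for illustration; the bucket scheme used on the site is not specified here.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    forecasts: Sequence[float],
    outcomes: Sequence[int],
    n_buckets: int = 10,  # assumed bucket count, equal width
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted p, fraction resolved YES, count) per non-empty bucket."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    buckets: List[List[Tuple[float, int]]] = [[] for _ in range(n_buckets)]
    for p, y in zip(forecasts, outcomes):
        # A forecast of exactly 1.0 goes into the last bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, y))
    rows = []
    for bucket in buckets:
        if not bucket:
            continue  # empty buckets have no point to plot
        mean_p = sum(p for p, _ in bucket) / len(bucket)
        frac_yes = sum(y for _, y in bucket) / len(bucket)
        rows.append((mean_p, frac_yes, len(bucket)))
    return rows
```

Each returned row is one point on the calibration curve: the first value is the x-coordinate, the second is the y-coordinate, and the count shows how much data stands behind the point.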

The market baseline

The baseline is the market's own YES price at the moment Plumb committed. Beating it on Brier and log-loss is the honest bar: it asks whether Plumb added information beyond what the market already priced in.
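A side-by-side comparison on the same graded calls could be sketched as follows. The GradedCall record, its field names, and the choice to average per-call Brier over the set are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class GradedCall:
    # Hypothetical record layout for this example.
    plumb_p: float   # Plumb's committed probability of YES
    market_p: float  # market YES price at the moment of commit
    outcome: int     # 1 (YES) or 0 (NO)


def brier_side_by_side(calls: Iterable[GradedCall]) -> Dict[str, float]:
    """Mean Brier for Plumb and for the market over the same graded calls."""
    calls = list(calls)
    if not calls:
        raise ValueError("need at least one graded call")
    n = len(calls)
    plumb = sum((c.plumb_p - c.outcome) ** 2 for c in calls) / n
    market = sum((c.market_p - c.outcome) ** 2 for c in calls) / n
    # edge > 0 means Plumb scored lower (better) than the market baseline.
    return {"plumb": plumb, "market": market, "edge": market - plumb}
```

Scoring both forecasters on an identical set of calls and outcomes is what keeps the comparison fair: any difference in the result comes from the probabilities alone.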

Honesty & limits

Plumb v1 does not trade and is not sold. It is a free, informational accuracy record. "Evidence-backed" is honestly partial in v1: the engine reasons over LLM analysis plus smart-money and news priors, and the deeper OSINT evidence trail arrives with Q1 Scout. "Powered by the τ engine" is a statement of provenance, not a performance claim. The record closes its external-verifiability loop once the first batch is anchored on Polygon (see Receipts).