The Changing North Star of AI Control

Inputs

Traffic

Total requests per day: OpenAI publicly reports ~2.5B/day

% routed through agentic model: subset of total traffic

Scheming behavior

Scheming attack rate (%): 0.01% = 1 attempt per 10,000 requests

Monitor performance

Monitor TPR (%): logit-ROC extrapolation at 10⁻⁵ step-FPR, slope 0.8 → ~1.1%

Monitor FPR (%): 10⁻⁵ = 0.001%, an operationally realistic low-FPR point

Detection timeline

Days from first attack to model pulled: enterprise incident response typically takes 6–8 weeks (42–56 days)

SOC cost model

Tokens per alert (agent trajectory): context window to review

Analyst reading speed (wpm): careful review rate

Trusted-monitor assist reduction (×): 0.01 = trusted monitor (TM) makes review 100× faster

Productive analyst hours per day: after meetings, breaks, context switching

Analyst fully-loaded cost ($/yr): salary + benefits + overhead

Manager fully-loaded cost ($/yr): one manager per N analysts

Analysts per manager: span of control

Damage model

Tokens per attack output: agent trajectory length

Chars per token: code tokenization ratio

Chars per LOC: average line length

Forensic cost per LOC ($): engineer time to review and classify

Affected customers: incident blast radius

Ransomware baseline per customer ($): IBM/Ponemon average data-breach cost, ~$4.5M (2023 report: $4.45M; 2024 report: $4.88M)

Rogue cloud investigation per customer ($): tracing unauthorized infrastructure

Results

Per-day breakdown

Agent requests/day

—

Attack attempts/day

—

Attacks caught/day

—

Attacks missed/day: successful and undetected

—

False positives/day

—

Audit queue/day (flagged)

—

Queue true-positive rate: real attacks as % of queue

—
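The per-day rows above can be sketched as simple arithmetic on the traffic, scheming, and monitor inputs. This is a minimal sketch, not the page's actual implementation: the formulas are inferred from the row labels, the function and key names are hypothetical, and the 10% agentic share is a placeholder because the page states no default for it.

```python
def per_day_breakdown(total_requests, agentic_share, attack_rate, tpr, fpr):
    """Per-day detection funnel. All rates are fractions (1e-4 means 0.01%)."""
    agent_requests = total_requests * agentic_share
    attempts = agent_requests * attack_rate
    caught = attempts * tpr                      # true positives
    missed = attempts - caught                   # successful and undetected
    benign = agent_requests - attempts
    false_positives = benign * fpr
    queue = caught + false_positives             # everything the monitor flags
    queue_tp_rate = caught / queue if queue else 0.0
    return {
        "agent_requests": agent_requests,
        "attack_attempts": attempts,
        "attacks_caught": caught,
        "attacks_missed": missed,
        "false_positives": false_positives,
        "audit_queue": queue,
        "queue_tp_rate": queue_tp_rate,
    }


day = per_day_breakdown(
    total_requests=2.5e9,   # from the Traffic input
    agentic_share=0.10,     # ASSUMPTION: placeholder, not given in the source
    attack_rate=1e-4,       # 0.01%
    tpr=0.011,              # ~1.1%
    fpr=1e-5,               # 0.001%
)
for name, value in day.items():
    print(f"{name}: {value:,.4f}")
```

Under these inputs the false positives outnumber the caught attacks in the queue, which is why the queue true-positive rate is tracked as its own row.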

Cumulative by model-pull decision

Attack attempts before pull

—

Attacks caught before pull

—

Successful undetected attacks before pull
—
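The cumulative rows plausibly scale the daily figures by the detection timeline. The sketch below assumes a constant daily attack rate over the whole window; the function name is hypothetical, the 25,000 attempts/day figure depends on a placeholder 10% agentic share, and 49 days is simply the midpoint of the 6–8 week range.

```python
def cumulative_before_pull(attempts_per_day, tpr, days_to_pull):
    """Totals accumulated between the first attack and the model being pulled."""
    attempts = attempts_per_day * days_to_pull
    caught = attempts * tpr
    undetected = attempts - caught
    return attempts, caught, undetected


attempts, caught, undetected = cumulative_before_pull(
    attempts_per_day=25_000,  # ASSUMPTION: follows from a placeholder 10% agentic share
    tpr=0.011,                # ~1.1% monitor TPR
    days_to_pull=49,          # ASSUMPTION: midpoint of 6-8 weeks
)
print(f"attempts={attempts:,.0f} caught={caught:,.0f} undetected={undetected:,.0f}")
```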

SOC cost (to operate detection)

Minutes per alert (manual)

—

Minutes per alert (TM-assisted)

—

Analysts needed (manual)

—

Analysts needed (TM-assisted)

—

SOC cost / year (manual)

—

SOC cost / year (TM-assisted)
—
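One way to read the SOC cost rows is as a staffing calculation: reading time per alert, then headcount, then payroll. The sketch below is an assumption-heavy illustration, not the page's formula. The tokens-to-words factor is not an input on the page, every numeric value except the 0.01 assist factor is a placeholder, and it ignores weekends, shifts, and leave.

```python
import math


def soc_cost(queue_per_day, tokens_per_alert, wpm, assist_factor,
             productive_hours, analyst_cost, manager_cost,
             analysts_per_manager, words_per_token=0.75):
    """Return (minutes per alert, analysts needed, annual cost) for one review mode.

    words_per_token is an ASSUMPTION; use assist_factor=1.0 for manual review.
    """
    minutes_per_alert = tokens_per_alert * words_per_token / wpm * assist_factor
    analyst_minutes_per_day = productive_hours * 60
    analysts = math.ceil(queue_per_day * minutes_per_alert / analyst_minutes_per_day)
    managers = math.ceil(analysts / analysts_per_manager)
    annual_cost = analysts * analyst_cost + managers * manager_cost
    return minutes_per_alert, analysts, annual_cost


# All values below are hypothetical placeholders except assist_factor=0.01.
common = dict(queue_per_day=2774.75, tokens_per_alert=20_000, wpm=200,
              productive_hours=6, analyst_cost=200_000, manager_cost=300_000,
              analysts_per_manager=8)
manual = soc_cost(assist_factor=1.0, **common)
assisted = soc_cost(assist_factor=0.01, **common)
print("manual:", manual)
print("TM-assisted:", assisted)
```

Rounding headcount up with `math.ceil` reflects that a partially loaded analyst or manager still costs a full salary.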

Damage from undetected attacks

Lines of malicious code written

—

LOC forensic cleanup cost

—

Customer-incident damage

—

Rogue cloud investigation cost

—

Total estimated damage: single incident, assuming the model is eventually pulled

—

By the time the decision to "pull the model" is executed, the forensic audit needed to untangle what was done takes months, because malicious artifacts are mixed with legitimate developer work across customer environments the lab has no visibility into.

Damage / annual SOC cost ratio: higher = more cost-effective to invest in detection

—
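The damage rows can be sketched by converting undetected attack output into lines of code and adding per-customer costs. The formulas are inferred from the input labels rather than taken from the page, the function and key names are hypothetical, and every numeric value in the example is a placeholder except the ~$4.5M per-customer baseline. Which SOC cost (manual or TM-assisted) the ratio uses is also an assumption, so it is passed in explicitly.

```python
def damage_estimate(undetected_attacks, tokens_per_attack, chars_per_token,
                    chars_per_loc, forensic_cost_per_loc, affected_customers,
                    ransomware_baseline, rogue_cloud_cost, annual_soc_cost):
    """Single-incident damage and its ratio to the annual SOC cost."""
    malicious_loc = (undetected_attacks * tokens_per_attack
                     * chars_per_token / chars_per_loc)
    forensic_cleanup = malicious_loc * forensic_cost_per_loc
    customer_damage = affected_customers * ransomware_baseline
    rogue_cloud = affected_customers * rogue_cloud_cost
    total = forensic_cleanup + customer_damage + rogue_cloud
    return {
        "malicious_loc": malicious_loc,
        "forensic_cleanup": forensic_cleanup,
        "customer_damage": customer_damage,
        "rogue_cloud": rogue_cloud,
        "total_damage": total,
        "damage_to_soc_ratio": total / annual_soc_cost,
    }


result = damage_estimate(
    undetected_attacks=1_211_525,   # ASSUMPTION: from placeholder upstream inputs
    tokens_per_attack=5_000,        # ASSUMPTION
    chars_per_token=3.5,            # ASSUMPTION
    chars_per_loc=40,               # ASSUMPTION
    forensic_cost_per_loc=1.0,      # ASSUMPTION
    affected_customers=10,          # ASSUMPTION
    ransomware_baseline=4.5e6,      # ~$4.5M per customer, from the inputs
    rogue_cloud_cost=100_000,       # ASSUMPTION
    annual_soc_cost=1_500_000,      # ASSUMPTION: a TM-assisted SOC budget
)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```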