Accountability

What happens when an AI call breaks.

Defendr turns measured failure into a clear loss report. Bring Your Own Keys gives visibility on your own provider bill; Managed can apply bounded service credits for eligible objective failures.

Two outcomes

Bring Your Own Keys

You keep provider accounts and provider bills. Defendr shows which billed calls failed and what they cost.

Managed

Defendr operates the provider relationship and can apply capped service credits for eligible objective failures.

Limits visible

Threshold based and review only signals stay in the report without pretending every quality issue is automatically creditable.

Bring Your Own Keys versus Managed

Same measurement, different remedy path.

Question	Bring Your Own Keys	Managed
Provider account owner	Customer owned provider accounts and keys.	Defendr operated provider relationship as part of the service.
Who pays providers	The customer pays providers directly.	Defendr manages upstream provider payment as part of the Managed service.
What Defendr reports	Failed calls, estimated cost, evidence, and remedy status on the customer's provider bill.	The same loss report, plus service credit eligibility status for objective failures.
Do service credits apply?	No. The mode is visibility on your provider bill.	Yes, for eligible objective failures, bounded and capped by agreed terms.
What remains review only	Thresholded and review only signals are still visible.	Thresholded and review only signals need the agreed threshold, baseline, or review outcome before credit status changes.

What can be credited in Managed

Objective, evidenced failures.

Managed service credits are bounded and capped. Eligibility depends on objective evidence, not a broad quality complaint.

Downtime or timeout

Provider side unavailability, timeouts, transport failures, or capacity failures on eligible calls.

Broken output

Empty, malformed, truncated, or unusable output when the failure is visible from the response or usage record.

Invalid structured output

Invalid JSON, failed schema validation, or invalid tool call arguments when a schema or contract exists.

Billing or cache anomaly

A material mismatch in observable usage, cache, or charge evidence, including duplicate charges.
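
To make "objective, evidenced" concrete, here is a minimal sketch of how an invalid structured output could be detected from the response alone. It is an illustration only, not Defendr's implementation: the function name `check_structured_output` and the `required_fields` contract are hypothetical, and a real contract would more likely be a full JSON Schema.

```python
import json


def check_structured_output(raw: str, required_fields: dict[str, type]) -> tuple[bool, str]:
    """Return (ok, reason) for a response that is expected to be a JSON object.

    `required_fields` is a hypothetical, simplified contract that maps
    field names to the Python type each value must have.
    """
    # Empty output is visible from the response itself.
    if not raw.strip():
        return False, "empty output"

    # Invalid or truncated JSON fails to parse.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"

    if not isinstance(data, dict):
        return False, "top-level value is not an object"

    # Simplified stand-in for schema validation.
    for name, expected in required_fields.items():
        if name not in data:
            return False, f"missing field: {name}"
        if not isinstance(data[name], expected):
            return False, f"wrong type for field: {name}"

    return True, "ok"
```

Each failing branch returns a reason that can be attached to the report as evidence, which is what separates this kind of check from a broad quality complaint.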

What is reported first

Useful signals stay visible even when they are not automatic service credit rows.

Latency

Needs a configured threshold or service expectation.

Model drift

Needs an agreed baseline and threshold.

Refusals

Some refusals are appropriate, so evidence that the prompt was benign matters.

Factuality

Needs ground truth or downstream verification.

Policy signals

Security and governance flags need review.
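
The split between creditable failures and reported signals can be sketched as a small lookup. This is a hedged illustration under stated assumptions: the signal keys, the `Mode` enum, and the "Visibility only" label for Bring Your Own Keys are hypothetical names chosen for the example, while the other status strings follow the wording used on this page.

```python
from enum import Enum


class Mode(Enum):
    BYOK = "Bring Your Own Keys"
    MANAGED = "Managed"


# Objective, evidenced failure types (hypothetical keys).
OBJECTIVE_FAILURES = {
    "downtime_or_timeout",
    "broken_output",
    "invalid_structured_output",
    "billing_or_cache_anomaly",
}

# Signals that are reported first and are not automatic credit rows.
REPORTED_SIGNALS = {
    "latency": "Needs threshold",
    "model_drift": "Needs baseline review",
    "refusal": "Needs review",
    "factuality": "Reported only",
    "policy_signal": "Needs review",
}


def remedy_status(mode: Mode, signal: str) -> str:
    """Map a signal to a remedy status for the given mode."""
    if signal in OBJECTIVE_FAILURES:
        # Only Managed can apply service credits; BYOK is visibility.
        if mode is Mode.MANAGED:
            return "Eligible for Managed service credit"
        return "Visibility only"
    if signal in REPORTED_SIGNALS:
        # Same status in both modes until a threshold, baseline,
        # or review outcome changes it.
        return REPORTED_SIGNALS[signal]
    raise ValueError(f"unknown signal: {signal}")
```

For example, `remedy_status(Mode.BYOK, "broken_output")` returns `"Visibility only"`, while the same signal in Managed returns `"Eligible for Managed service credit"`.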

Sample loss report

A sanitized scorecard format.

Sample only. Not customer data, not a promise, and not a performance benchmark. Dollar figures below are realistic placeholders to show format, not production results.

Sample calls observed

184,200

Sample failed calls

1,124

Sample priced loss

$412.80

Sample Managed service credits

$188.40

Sample failure type	Events	Estimated cost	Evidence	Remedy status
Provider downtime or timeout	42	$27.90	Provider error, timeout, timing, charge evidence	Eligible for Managed service credit
Empty or truncated output	118	$36.40	Delivered content, finish reason, usage	Eligible for Managed service credit
Invalid structured output or tool call	76	$58.25	Schema contract, parse error, validation result	Eligible for Managed service credit
Billing or cache anomaly	9	$65.85	Usage fields, cache fields, observed charge	Eligible for Managed service credit
Latency above default review line	344	$124.90	Gateway timer and route metadata	Needs threshold
Refusal requiring review	212	$99.50	Refusal field, content filter signal, prompt class	Needs review
Model drift probe breach	3 workflows	Not priced	Approved baseline, probe result, threshold	Needs baseline review
Factuality flag without ground truth	18	Not priced	Weak signal only; no reference answer	Reported only
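
The two dollar totals in the sample follow from the rows: priced loss is the sum of every priced row, and Managed service credits are the sum of the rows marked eligible. A minimal sketch of that arithmetic, using the sample placeholder figures from the table; the function names and the optional `cap` parameter are hypothetical, and real caps come from agreed terms.

```python
from decimal import Decimal

ELIGIBLE = "Eligible for Managed service credit"

# (failure type, estimated cost or None when not priced, remedy status)
ROWS = [
    ("Provider downtime or timeout", Decimal("27.90"), ELIGIBLE),
    ("Empty or truncated output", Decimal("36.40"), ELIGIBLE),
    ("Invalid structured output or tool call", Decimal("58.25"), ELIGIBLE),
    ("Billing or cache anomaly", Decimal("65.85"), ELIGIBLE),
    ("Latency above default review line", Decimal("124.90"), "Needs threshold"),
    ("Refusal requiring review", Decimal("99.50"), "Needs review"),
    ("Model drift probe breach", None, "Needs baseline review"),
    ("Factuality flag without ground truth", None, "Reported only"),
]


def priced_loss(rows):
    """Sum the estimated cost of every row that has a price."""
    return sum((cost for _, cost, _ in rows if cost is not None), Decimal("0"))


def eligible_credits(rows, cap=None):
    """Sum eligible rows, optionally bounded by a hypothetical cap."""
    total = sum(
        (cost for _, cost, status in rows if cost is not None and status == ELIGIBLE),
        Decimal("0"),
    )
    return total if cap is None else min(total, cap)


print(priced_loss(ROWS))       # 412.80
print(eligible_credits(ROWS))  # 188.40
```

`Decimal` is used instead of floats so that the sums match the two-decimal figures in the scorecard exactly.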

How to read the report

One artifact for finance, product, and engineering.

Finance

Sees cost exposure by failure type and the split between visibility only rows and eligible Managed service credit rows.

Product

Sees which workflows broke, which signals need thresholds, and where user experience is affected.

Engineering

Sees route, model, evidence, and next action for retries, failover, schema fixes, or threshold tuning.

Frequently asked questions

Accountability questions.

Does Bring Your Own Keys include service credits?

No. Bring Your Own Keys is visibility on the customer's own provider bill. The report shows failed calls, cost, evidence, and status.

Are Managed service credits unlimited?

No. Managed service credits are bounded and capped by agreed terms, and only eligible objective failures can receive that status.

Are all hallucinations credited?

No. Factuality requires ground truth. Without a reference answer, retrieval corpus, database result, human label, or downstream verifier, Defendr can only report weak signals.

Why keep limitations so prominent?

The limits are part of the product. They keep objective failures, thresholded signals, and review only flags from being mixed into one overbroad promise.