What makes AI auditable? See why regulated teams need explainable document AI that traces every output back to the source.

By Sam Gobrail, EVP, Solutions and Delivery, and Lucy Park, Co-founder & CPO, Upstage AI

Part 2 of a four-part series on what separates a vibe-coded document AI demo from a system an enterprise can run in production.

A health claim record goes into a document AI system on Monday. The system returns a decision, “approved,” with a clear, well-written summary of why. On Thursday, the same record goes in again. This time the system returns “denied,” with an equally clear, equally well-written summary of why.

Both explanations read perfectly. Only one can be right, and there is no way to tell which.

In a regulated enterprise, this is a serious problem. It points to the deeper issue that decides whether document AI can be deployed in high-stakes work at all: auditability.

Across insurance, healthcare, banking, and any field where decisions must withstand scrutiny, getting the answer right is only half of what the system has to do. The other half is being able to explain how it got there, and prove it later.

A confident AI explanation can still be invented

A frontier model can produce a fluent rationale for almost any output because the rationale is generated text, subject to the same variation and invention as the answer it explains. Two runs can yield two different answers and two convincing explanations. The explanation describes what the model produced, with no guarantee that it reflects how the value was actually derived or where it came from.

A source-grounded explanation connects the output back to the specific document, page, field, or evidence used to produce it.

In regulated work, an explanation that cannot be verified against the source document carries no evidentiary weight. It reads well and proves nothing. The Monday and Thursday problem is what this looks like in practice: a system that is articulate, inconsistent, and impossible to hold to account.
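To make the difference concrete, here is a minimal Python sketch of what checking a source-grounded value could look like. The `GroundedValue` record, its field names, and the `verify_against_source` helper are hypothetical and introduced only for illustration; they are not Upstage's API. The assumption is that each value carries a pointer to the page and character span it was read from, so anyone can re-check it against the document text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedValue:
    """A hypothetical extracted value plus a pointer to where it was read."""
    field: str
    value: str
    document_id: str
    page: int    # 1-based page number
    start: int   # character offset where the evidence starts on that page
    end: int     # character offset where the evidence ends (exclusive)


def verify_against_source(gv: GroundedValue, pages: dict[int, str]) -> bool:
    """Return True only if the cited span exists and matches the claimed value."""
    text = pages.get(gv.page)
    if text is None or not (0 <= gv.start < gv.end <= len(text)):
        return False
    return text[gv.start:gv.end] == gv.value


# Illustrative page text, not real claim data.
pages = {1: "Policy holder: J. Doe\nCoverage limit: $250,000\n"}
start = pages[1].index("$250,000")
grounded = GroundedValue("coverage_limit", "$250,000", "doc-001", 1, start, start + 8)
invented = GroundedValue("coverage_limit", "$500,000", "doc-001", 1, start, start + 8)

print(verify_against_source(grounded, pages))  # True
print(verify_against_source(invented, pages))  # False
```

A fluent rationale would read equally well for either value. Only the first survives a check against the page.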

Auditability begins with explainability

Auditability is the ability to reconstruct, after the fact, how a result was produced, and to defend it to someone who was not there. That is impossible without explainability in the literal sense: every decision must be traceable, step by step, back to the source it came from.

A system that cannot show how a value was derived cannot be audited, however accurate it is on average. The two properties are directly linked. Explainability is the mechanism, and auditability is the outcome a regulated enterprise actually needs.

Why a raw model pipeline cannot be audited

A raw frontier model pipeline is a black-box AI system: a document goes in and structured data comes out, with nothing in between that a person can inspect. When a value is wrong, there is no way to tell whether the parsing misread the page, the extraction picked the wrong field, or the classification sent the document down the wrong path.

There is no link from the output back to the place in the document it came from, and no record of who reviewed it, what they changed, or when it was approved. Knowing which step to trust, and by how much, is its own problem, which we cover in the piece on confidence.

When legal, compliance, or a regulator later asks how a particular figure entered the system of record, the honest answer is that no one can say. That gap is the deep, unglamorous part of the problem an internal build almost always underestimates.

Extraction is the visible work. The system that makes extraction auditable is the rabbit hole underneath it, and in regulated environments, it is the reason promising builds never reach production.

Regulation is catching up

Explainability and auditability are not just good ideas anymore. They are becoming legal and regulatory expectations.

In the last couple of years, two major regulatory frameworks have redrawn the lines for how insurers can use AI:

The NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers (adopted December 2023) makes it clear: insurers must maintain accountability, transparency, and explainability for any AI system influencing decisions. That means documenting data sources, reviewing for bias, and showing that no algorithm is introducing unfairly discriminatory outcomes.
The EU AI Act (2024) goes even further, classifying AI used for risk assessment and pricing in life and health insurance as “high-risk.” Insurers that deploy such systems must maintain human oversight, auditable decision logs, and a transparent record of how each model behaves.

In simple terms, every automated action in underwriting, from data extraction to pricing recommendations, must now be explainable and defensible.

Regulators are saying what underwriters have known all along. When you can’t explain your decision, you lose credibility.

How Studio makes document AI auditable

Upstage Studio is built around an explainable pipeline, where a document moves through distinct, visible stages: parse, extract, classify, and decide. Each stage produces a result that can be inspected on its own.
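The idea of inspectable stages can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Studio's implementation: the four stage functions, the `StageResult` type, and the 1000.0 approval threshold are all made up for the example. What it shows is the structural point, that every stage leaves behind its own result instead of only the final answer.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StageResult:
    stage: str
    output: Any


def parse(raw: str) -> dict[str, str]:
    # Toy parser: turn "key: value" lines into a dict.
    pairs = (line.split(":", 1) for line in raw.splitlines() if ":" in line)
    return {k.strip().lower(): v.strip() for k, v in pairs}


def extract(parsed: dict[str, str]) -> dict[str, Any]:
    # Toy extraction: pick the fields the decision needs.
    return {"doc_type": parsed.get("type", ""), "amount": float(parsed.get("amount", "0"))}


def classify(fields: dict[str, Any]) -> dict[str, Any]:
    # Toy classification: label the document from an extracted field.
    label = "claim" if fields["doc_type"] == "claim" else "other"
    return {**fields, "label": label}


def decide(fields: dict[str, Any]) -> dict[str, Any]:
    # Toy rule with a made-up threshold.
    approved = fields["label"] == "claim" and fields["amount"] <= 1000.0
    return {**fields, "decision": "approved" if approved else "denied"}


def run_pipeline(raw: str) -> list[StageResult]:
    """Run every stage in order and keep each intermediate result."""
    stages: list[tuple[str, Callable[[Any], Any]]] = [
        ("parse", parse), ("extract", extract), ("classify", classify), ("decide", decide),
    ]
    trace: list[StageResult] = []
    current: Any = raw
    for name, fn in stages:
        current = fn(current)
        trace.append(StageResult(name, current))
    return trace


trace = run_pipeline("Type: claim\nAmount: 420.50")
for step in trace:
    print(step.stage, "->", step.output)
```

When a final value looks wrong, the trace shows which stage first produced it, which is the question a black box cannot answer.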

Every extracted value is grounded to its exact location in the source document, so a reviewer or an auditor can see the specific place on the page each value came from, alongside the conclusion the system drew from it. Every human action is part of the record: who reviewed an extraction, what they changed, and when it was approved. The full path, from the source document through parsing, extraction, human correction, and final approval, can be reconstructed and exported as an audit trail.
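One common way to make such a record tamper-evident is to chain events by hash, so that each entry commits to the one before it. The sketch below assumes that approach; the `AuditEvent` fields, the `AuditTrail` class, and the sample actors and timestamps are hypothetical and do not describe how Studio stores its audit trail.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class AuditEvent:
    actor: str       # pipeline stage or human reviewer
    action: str      # e.g. "extracted", "corrected", "approved"
    field: str
    value: str
    timestamp: str   # ISO 8601 string supplied by the caller
    prev_hash: str   # hash of the previous event, "" for the first


def event_hash(event: AuditEvent) -> str:
    payload = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditTrail:
    def __init__(self) -> None:
        self.events: list[AuditEvent] = []

    def record(self, actor: str, action: str, field: str, value: str, timestamp: str) -> AuditEvent:
        prev = event_hash(self.events[-1]) if self.events else ""
        event = AuditEvent(actor, action, field, value, timestamp, prev)
        self.events.append(event)
        return event

    def verify(self) -> bool:
        """Check that each event still links to the hash of the one before it."""
        prev = ""
        for event in self.events:
            if event.prev_hash != prev:
                return False
            prev = event_hash(event)
        return True

    def export(self) -> str:
        return json.dumps([asdict(e) for e in self.events], indent=2)


trail = AuditTrail()
trail.record("extract-stage", "extracted", "coverage_limit", "$250,000", "2026-01-05T09:00:00Z")
trail.record("reviewer@example.com", "corrected", "coverage_limit", "$350,000", "2026-01-05T09:12:00Z")
trail.record("lead@example.com", "approved", "coverage_limit", "$350,000", "2026-01-05T09:30:00Z")

intact = trail.verify()
exported = trail.export()

# Silently rewriting an earlier event breaks the chain.
trail.events[0] = replace(trail.events[0], value="$999,999")
tampered = trail.verify()
print(intact, tampered)  # True False
```

In this sketch the newest event is not protected by anything after it, so a real system would also need to anchor the latest hash somewhere outside the log.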

Because the pipeline is structured and recorded, the same document produces the same traceable result, and any change to it is visible. The Monday and Thursday problem stops being possible.
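A simple way to see how a recorded pipeline catches the Monday and Thursday problem is to key each decision on the exact document bytes plus the pipeline version, then compare reruns against what was recorded. The `fingerprint` function and `DecisionLedger` class below are hypothetical names for this sketch, and the version strings are placeholders.

```python
import hashlib


def fingerprint(document: bytes, pipeline_version: str) -> str:
    """Identify a run by the exact document bytes and the pipeline version."""
    h = hashlib.sha256()
    h.update(pipeline_version.encode("utf-8"))
    h.update(b"\x00")  # separator so version and content cannot blur together
    h.update(document)
    return h.hexdigest()


class DecisionLedger:
    """Remembers the first recorded decision for each fingerprint."""

    def __init__(self) -> None:
        self._decisions: dict[str, str] = {}

    def check(self, document: bytes, pipeline_version: str, decision: str) -> str:
        key = fingerprint(document, pipeline_version)
        recorded = self._decisions.setdefault(key, decision)
        return "consistent" if recorded == decision else "MISMATCH"


ledger = DecisionLedger()
claim = b"claim-record-bytes"  # placeholder content

print(ledger.check(claim, "v1", "approved"))  # consistent
print(ledger.check(claim, "v1", "denied"))    # MISMATCH
print(ledger.check(claim, "v2", "denied"))    # consistent
```

The third call is treated as a new run because the pipeline version changed. A different answer there is a visible, attributable change, not a silent one.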

This is the system that sits underneath the extraction, and it is what a VP of underwriting or a head of claims is really asking about when they ask whether a tool is defensible. They may not run the technical evaluation themselves, but they know it has to satisfy the people who will: IT, legal, and auditors. An explainable, recorded pipeline is what lets those teams sign off.

In practice: The Tricura case study shows how the team automated application document extraction across inconsistent formats while holding accuracy above 95%, with classification and parsing handled regardless of how each document arrived.

The deployment gate for regulated AI

In high-stakes work, the deciding question is rarely whether a model can be accurate. It is whether a decision can be explained and defended months later, to someone who was not in the room.

A system that can trace a coverage limit from the source document through parsing, extraction, review, and approval can answer that question. A black box cannot.

Explainability turns an accurate model into an auditable system. Auditability is what makes AI deployable where the stakes are highest.

See how the explainable pipeline and audit trail work in Upstage Studio, or contact us to talk through the workflow on real documents.

The Black Box Problem: Why Auditable AI Has To Explain Every Decision

Upstage Team • Industry • June 16, 2026

Frequently Asked Questions

Why do in-house document AI projects stall before reaching production?

In-house document AI projects typically reach a working pilot, then lose momentum when real-world document variety exposes the limits of a tuned prompt. A tuned prompt climbs to roughly 60-70% accuracy quickly, then stalls around 75%, because production traffic is dominated by the long tail of document variety. New layouts, scanned pages, handwritten fields, and multi-page tables arrive after launch and expose every assumption baked into the original prompt or parsing logic. Internal teams then spend engineering cycles chasing individual failures instead of improving the system as a whole.

Upstage Studio addresses this with a production-ready agent library covering invoice extraction, loss run extraction, underwriting submission review, and claims document handling, so teams start from working templates rather than a blank prompt. Quick Tune lets a domain expert define outcomes and generate a working schema in minutes, with no prompt engineering required. Once in production, Studio measures accuracy at each pipeline stage and uses low-confidence signals to auto-tune the extraction schema as documents drift. Studio processes over 3 million pages daily across more than 100 enterprise customers.

Why isn't a frontier LLM alone sufficient for production document AI workflows?

A frontier LLM can generate a plausible extraction, but it lacks the structured confidence signaling and source grounding that production document workflows require. Generic LLMs are fast but unstable for document processing — a serious liability in regulated workflows where an incorrect value can affect a claim, policy, or financial record.

Production document AI requires three things a standalone LLM does not provide: a schema-driven extraction layer that handles checkboxes, page-break tables, and rotated pages; a per-step confidence score that routes uncertain outputs to human review; and a source-grounded audit trail that ties every value to its exact location in the original file. Upstage's Information Extract is schema-agnostic and high-precision, operating without templates or retraining while handling PDFs, scanned images, Office files, and documents exceeding 500 pages. As Sam Gobrail, EVP of Solutions and Delivery at Upstage, puts it: "A model has to know what it does not know, and say so." Best Option replaced a multi-tool stack with a single Upstage API and reached 95%+ entity extraction with document-to-data time under 60 seconds.

What regulations require explainable AI in insurance, and what do they actually demand?

Both the NAIC Model Bulletin adopted in December 2023 and the EU AI Act of 2024 impose specific obligations on insurers using AI systems, centering on documentation, human oversight, and the ability to explain decisions. These requirements make source-grounded, auditable AI infrastructure a compliance necessity rather than a technical preference.

The NAIC Model Bulletin directs insurers to maintain a written AI Systems program, ensure AI-driven decisions are accurate and not unfairly discriminatory, and keep documentation regulators may request at any time. By April 30, 2024, 10 states had adopted related actions. The EU AI Act classifies insurance underwriting and claims scoring as high-risk AI applications, requiring logging, accuracy validation, human oversight, and transparency measures — with transparency rules taking effect in August 2026. Upstage Studio addresses these demands through exportable audit trails, role-based access controls, and data retention settings, all backed by SOC 2 Type 1, ISO 27001/27701, and HIPAA certifications.

How does Upstage Studio handle document formats that vary across carriers or vendors?

Upstage Studio supports PDFs, scanned images, spreadsheets, slides, Office files, emails, rotated pages, handwritten elements, and documents exceeding 500 pages, all without requiring templates or retraining for each new format. Document Parse achieves a TEDS (Tree-Edit-Distance-based Similarity) score of 93.48 and a structure-only TEDS-S score of 94.16 on table extraction benchmarks, and processes 100 pages in about one minute, at roughly 0.6 seconds per page.

Amwins processed invoices across 80+ carrier formats, handling over 200 documents daily and 1,100+ in month one alone, without rebuilding extraction logic for each carrier's layout. Information Extract's schema-agnostic design means the extraction schema adapts to the document rather than requiring the document to conform to a predefined template. As Lucy Park, Co-founder and CPO of Upstage, explains: "Explainability turns an accurate model into an auditable system." When formats shift over time, Studio's automated correction loop updates the extraction schema without engineering intervention.

How does Upstage Studio route uncertain extractions to human review without overwhelming reviewers?

Studio attaches a confidence score to each stage of the extraction pipeline and surfaces only low-confidence outputs for human review, so reviewers focus their attention on the small share of outputs that genuinely need it. Per-step confidence scores are assigned at the parse, extract, and classify stages — not as a single pass/fail score on the entire document. High-confidence outputs flow through without interruption; low-confidence outputs are flagged for the review queue.
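The routing logic described above can be sketched as follows. The thresholds, the `StageScore` type, and the `route` function are assumptions made for this example; they are not Studio's actual values or API, and real thresholds would be tuned on labeled data.

```python
from dataclasses import dataclass

# Hypothetical per-stage thresholds, chosen only for illustration.
THRESHOLDS = {"parse": 0.90, "extract": 0.85, "classify": 0.80}


@dataclass(frozen=True)
class StageScore:
    document_id: str
    stage: str
    confidence: float  # 0.0 to 1.0


def route(scores: list[StageScore]) -> tuple[list[str], dict[str, list[str]]]:
    """Split documents into those that pass through and those needing review.

    A document goes to review if any stage falls below its threshold, and the
    review entry names the stages that triggered it.
    """
    seen: list[str] = []
    review: dict[str, list[str]] = {}
    for s in scores:
        if s.document_id not in seen:
            seen.append(s.document_id)
        if s.confidence < THRESHOLDS[s.stage]:
            review.setdefault(s.document_id, []).append(s.stage)
    passed = [d for d in seen if d not in review]
    return passed, review


scores = [
    StageScore("doc-1", "parse", 0.98),
    StageScore("doc-1", "extract", 0.95),
    StageScore("doc-1", "classify", 0.91),
    StageScore("doc-2", "parse", 0.97),
    StageScore("doc-2", "extract", 0.62),
    StageScore("doc-2", "classify", 0.88),
]

passed, review = route(scores)
print(passed)  # ['doc-1']
print(review)  # {'doc-2': ['extract']}
```

Because the flag names the stage, a reviewer looking at doc-2 knows to check the extracted fields and can skip re-reading a page that parsed cleanly.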

Tricura Insurance Group achieved over 95% accuracy with a review time under one minute per document using this model. The same low-confidence signals feed Studio's auto-tuning loop: corrections made by reviewers inform schema updates, so the system improves from human oversight. Every edit and approval is recorded in Studio's audit trail alongside the original extracted value and its source location — making human oversight a designed part of the pipeline that improves accuracy over time and satisfies documentation requirements in regulated industries.
