Rakesh Isaac
2025–2026
AI-native · Compliance · Auditability · GRC · Shipped

Making AI-edited assessments auditable

When an AI rewrites a compliance question, who is accountable? A question-level audit trail with a floating modal, designed around how auditors actually ask their questions.

ServiceNow · Smart Assessment Engine (SAE) · Shipped
Product designer · IA, interaction, visual
The hardest part of AI-assisted compliance is not the AI. It is leaving a trail clear enough that an external auditor can reconstruct what the AI did and why a human accepted it.

Authorship gets murky when AI helps

Smart Assessment Engine lets risk teams generate, edit, and tailor assessment questionnaires with AI assistance. That capability creates a problem most AI features quietly avoid: when something goes wrong six months later, an auditor needs to know which words were AI-generated, which were human-edited, who approved them, and when.

The product already had an audit log, but it sat at the wrong level of granularity. A 200-question questionnaire could change in dozens of small ways, and the log would tell you only that 'something' had changed, somewhere inside it.

Anchor the log at the question, not the section

The default proposal was section-level audit history: one log per group of related questions. It was the natural compromise, finer than assessment-level granularity (too coarse) but cheaper to engineer than going all the way down to the question. I pushed back and argued for question-level instead.

The argument that carried was about how auditors actually work. An external auditor does not show up and ask 'walk me through the changes to your vendor security section.' They ask specific questions about specific questions: this control, this wording, this version, this approver. The audit log has to match the shape of the question being asked of it. Anything coarser turns the log into a starting point for a search, not an answer.

Auditors do not want completeness. They want the smallest possible answer to a specific question.
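To make the anchoring concrete, here is a minimal sketch of what a question-keyed audit event could look like. Every name in it is hypothetical, not SAE's actual schema; the structural point is that the question ID is the primary key, so an auditor's lookup is a direct hit rather than a search.

```typescript
// Hypothetical question-level audit event. Names are illustrative only,
// not SAE's schema; the point is that questionId is the lookup key.
interface QuestionAuditEvent {
  questionId: string;                          // the unit of accountability
  assessmentId: string;                        // context, not the lookup key
  timestamp: string;                           // ISO 8601
  actor: { userId: string; source: "human" | "ai" };
  action: "generated" | "edited" | "approved";
  before: string;                              // wording prior to the change
  after: string;                               // wording after the change
}

// "Who approved this wording, and when?" becomes one filtered lookup,
// not a trawl through a section-level log.
function historyFor(
  events: QuestionAuditEvent[],
  questionId: string
): QuestionAuditEvent[] {
  return events.filter((e) => e.questionId === questionId);
}
```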

A floating modal pinned to one question at a time

Once the unit of accountability was settled, the rest of the design followed from it. The audit trail surfaces as a floating modal anchored to a single question, holding three things in tight composition: avatar stacks for everyone who touched the question, word-level diffs of every change, and a CSV export sized to satisfy the most common external audit ask.
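As a sketch of how little the modal needs once it is anchored this way: everything hangs off one question ID. These names are mine, for illustration; the shipped component's API may differ.

```typescript
// Illustrative props for a question-anchored audit modal. Hypothetical
// names, not the shipped component's API.
interface AuditModalProps {
  questionId: string;
  contributors: Contributor[];   // rendered as the avatar stack
  changes: RecordedChange[];     // word-level diffs, newest first
  onExportCsv: () => void;       // the most common external audit ask
}

interface Contributor {
  userId: string;
  source: "human" | "ai";        // provenance stated factually, no badge
}

interface RecordedChange {
  by: Contributor;
  at: string;                    // ISO 8601
  before: string;
  after: string;                 // diffed at render time, word by word
}
```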

The decision to anchor at the question level made the modal feel almost embarrassingly specific until you used it. Then specificity was the entire point.

Word-level diffs as a trust gesture

AI suggestions are rendered with the same diff treatment as human edits. No special highlighting, no apologetic 'AI' badge. The provenance is shown in the avatar stack, factually. The implicit message: AI-generated content is held to the same evidentiary standard as anything else in the system. We do not hide it. We do not flatter it.
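The sameness is enforceable at the code level. A minimal word-level diff, sketched below with a standard longest-common-subsequence walk, takes no 'source' argument at all: AI and human edits flow through the identical path, and provenance stays in the avatar stack as metadata. This is my illustration of the principle, not the shipped implementation.

```typescript
// Word-level diff via LCS. Note there is no "source" parameter:
// the renderer cannot style AI edits differently even if it wanted to.
type DiffOp = { kind: "same" | "added" | "removed"; word: string };

function wordDiff(before: string, after: string): DiffOp[] {
  const a = before.split(/\s+/).filter(Boolean);
  const b = after.split(/\s+/).filter(Boolean);

  // dp[i][j] = LCS length of a[i..] and b[j..]
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      dp[i][j] =
        a[i] === b[j]
          ? dp[i + 1][j + 1] + 1
          : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }

  // Walk the table to emit same / removed / added runs in order.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: "same", word: a[i] });
      i++; j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      ops.push({ kind: "removed", word: a[i] });
      i++;
    } else {
      ops.push({ kind: "added", word: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ kind: "removed", word: a[i++] });
  while (j < b.length) ops.push({ kind: "added", word: b[j++] });
  return ops;
}
```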

The pattern shipped. Internally, it has become reference material for other AI-assisted surfaces in the GRC suite that face the same provenance question. The qualitative validation has been the most useful kind so far — auditors and risk leads who use the modal and immediately understand what it is for, without explanation. That is the test I cared about.

  • 01 Shipped feature in production
  • 02 Question-level anchoring chosen over the default section-level proposal
  • 03 Now reference material for AI provenance across other GRC surfaces