AI product discovery and strategy

Unit 08 of 10

Unit 8: AI product metrics: the four-layer framework

Learning objectives

Apply the four-layer AI metrics framework (model quality, user trust, task completion, business impact). Design metric dashboards specific to AI features. Avoid common AI measurement pitfalls.

Reading material

Setting up the dashboard

For each AI feature, create a dashboard with one or two metrics per layer.

Layer 1 (model quality) examples: accuracy by category, precision/recall for the most important categories, error-rate trend.

Layer 2 (user trust) examples: weekly override rate, average review time before acceptance, adoption percentage among eligible users.

Layer 3 (task completion) examples: time-to-completion for tasks using AI vs. tasks without, user-reported satisfaction with the task outcome.

Layer 4 (business impact) example: the business metric most directly connected to the AI feature (retention, conversion, efficiency, etc.).

Review layers 1-2 weekly. Review layers 3-4 monthly. Investigate when any metric moves significantly in either direction.
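One way to make this concrete is to represent the dashboard as data, so targets and review cadences live next to the metrics themselves. The sketch below is illustrative: the feature (support-ticket triage), metric names, and all numbers are hypothetical, and the attention threshold is an assumption, not a standard.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    layer: int        # 1=model quality, 2=user trust, 3=task completion, 4=business impact
    name: str
    baseline: float   # current measured value (hypothetical here)
    target: float
    review: str       # "weekly" for layers 1-2, "monthly" for layers 3-4

# Hypothetical dashboard for a support-ticket triage feature.
dashboard = [
    Metric(1, "accuracy_by_category_min", baseline=0.78, target=0.90, review="weekly"),
    Metric(2, "weekly_override_rate",     baseline=0.12, target=0.10, review="weekly"),
    Metric(3, "time_to_completion_ratio", baseline=0.85, target=0.70, review="monthly"),
    Metric(4, "ticket_resolution_rate",   baseline=0.64, target=0.75, review="monthly"),
]

def needs_attention(m: Metric, tol: float = 0.05) -> bool:
    """Flag a metric that is more than `tol` away from its target, in either direction."""
    return abs(m.baseline - m.target) > tol

flagged = [m.name for m in dashboard if needs_attention(m)]
```

Note that `needs_attention` uses an absolute difference: per the review guidance above, a metric that moves significantly in either direction warrants investigation.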

Metrics pitfalls specific to AI

Aggregate accuracy hiding segmented failure. Overall 94% accuracy can mask 78% accuracy in the categories that matter most. Always break metrics down by segment.
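The pitfall is easy to reproduce with a few lines of arithmetic. In this sketch (category names and counts are made up), a high-volume easy category pulls overall accuracy above 93% while the category that matters most sits at 78%:

```python
from collections import defaultdict

# Hypothetical per-prediction log: (category, was_correct)
predictions = (
    [("billing", True)] * 390 + [("billing", False)] * 10 +  # 97.5% on the easy, high-volume category
    [("refund",  True)] * 78  + [("refund",  False)] * 22    # 78% on the category that matters most
)

def accuracy(rows):
    return sum(correct for _, correct in rows) / len(rows)

overall = accuracy(predictions)  # aggregate accuracy looks healthy

# The same data, broken down by segment, tells a different story.
by_category = defaultdict(list)
for row in predictions:
    by_category[row[0]].append(row)

segmented = {category: accuracy(rows) for category, rows in by_category.items()}
```

The aggregate number (93.6% here) is not wrong, it is just answering a different question than "does the feature work where it matters?"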

Measuring usage frequency as success. High usage of an AI feature might mean it works well, or it might mean users have to retry multiple times because the output isn't good enough. Pair usage metrics with quality and trust metrics.
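A small sketch of why the pairing matters. The event log below is invented: the raw usage count looks healthy, but the acceptance rate reveals that most generations are retries or abandoned attempts.

```python
# Hypothetical event log: each tuple is (user_id, output_was_accepted)
events = [
    ("u1", False), ("u1", False), ("u1", True),   # u1 retried twice before accepting
    ("u2", True),                                  # u2 accepted on the first try
    ("u3", False), ("u3", False), ("u3", False),  # u3 retried repeatedly and gave up
]

usage = len(events)  # 7 generations: "high usage" on its own looks like success
accepted = sum(a for _, a in events)
acceptance_rate = accepted / usage          # how often the output was good enough
retries_per_acceptance = usage / accepted   # effort users spent per accepted output
```

Reported alone, `usage` tells a success story; paired with `acceptance_rate`, the same log tells a quality story.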

Ignoring the counterfactual. The right comparison isn't "how does the AI perform?" but "how does the workflow perform with AI versus without AI?" Sometimes the AI makes things worse even if its individual outputs are decent, because it introduces a new step in the workflow that costs more time than it saves.
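The counterfactual comparison can be sketched in a few lines. The timings below are hypothetical: each AI-assisted task is generation time plus a new review/fix step, and end-to-end the AI workflow is slower even though its drafts are decent.

```python
from statistics import mean

# Hypothetical end-to-end task timings in minutes.
without_ai = [12, 14, 11, 13, 15]             # write from scratch
with_ai    = [4 + 10, 3 + 11, 5 + 9, 4 + 12]  # generate + review/fix (the new step)

# The right comparison: workflow with AI vs. workflow without AI.
saves_time = mean(with_ai) < mean(without_ai)
```

In this made-up example the review step costs more than generation saves, so `saves_time` is false, a result invisible to any metric that only scores the AI's outputs.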

Not tracking trust over time. A one-time trust measurement is nearly useless. Trust is dynamic. Track it longitudinally and look for trends. A slowly declining trust curve is a warning that model degradation or edge cases are accumulating.
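A longitudinal trust check can be as simple as fitting a trend line to a weekly proxy such as the override rate. The weekly values below are invented; the point is that each week's level looks acceptable while the slope shows trust slowly eroding.

```python
# Hypothetical weekly override rates: level looks fine, trend does not.
weeks = list(range(8))
override_rate = [0.10, 0.11, 0.11, 0.12, 0.13, 0.13, 0.14, 0.15]

def slope(xs, ys):
    """Ordinary least-squares slope: change in y per unit of x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

trend = slope(weeks, override_rate)  # positive slope: overrides rising week over week
```

A positive slope here (roughly +0.7 percentage points per week) is exactly the slowly declining trust curve described above, and a signal to start looking for model degradation or accumulating edge cases.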

Practical exercise

Exercise: Design an AI metrics dashboard

Choose an AI feature (real or hypothetical). Design a metrics dashboard using the four-layer framework.

For each layer, specify: the metric name, how it's measured, the current baseline (estimated if needed), the target, and the review frequency.

Then describe one scenario where the layers tell conflicting stories (e.g., high accuracy but declining trust) and what you'd investigate to understand why.

Write this up as a metrics plan document.