
What FDA's AI Guidance Really Demands


A deep-dive into the credibility assessment framework, GxP intersection, and the organizational steps life sciences companies must take — now.



On January 6, 2025, the U.S. Food and Drug Administration (FDA) released its first-ever draft guidance on the use of AI to support regulatory decision-making for drug and biological products. The following day, it issued parallel draft guidance for AI-enabled medical devices. Taken together with the February 2025 Exer Labs warning letter and the October 2024 European Medicines Agency (EMA) Reflection Paper, these documents signal a decisive shift: AI in life sciences has moved from tolerated experiment to formally regulated activity.


This article goes beyond the high-level compliance checklists circulating in the industry. We unpack the FDA's seven-step credibility assessment framework, examine how traditional Good Practice (GxP) validation principles strain under AI's adaptive behavior, and set out what a genuinely AI-ready governance model looks like — one that satisfies the FDA, the EMA, and the incoming EU AI Act simultaneously.


01 — Context: From Guidance to Enforcement, the Era Has Changed


For most of the last decade, AI in pharmaceutical and medical device contexts existed in a comfortable regulatory grey zone. The FDA had issued discussion papers, hosted workshops, and reviewed hundreds of submissions containing AI components — but formal, binding guidance was largely absent. Companies ran AI-powered deviation detection, predictive quality analytics, and clinical trial optimization tools with minimal regulatory oversight, often reasoning that internal "non-GxP" tools sat outside the compliance perimeter.

That reasoning is now demonstrably wrong.

When AI informs a regulated decision — on labelling, dosing, safety, batch release, or quality — the entire system is subject to device-level quality and lifecycle controls.

The FDA warning letter to Exer Labs made this concrete. The company deployed an AI motion-analysis system for musculoskeletal assessment without 510(k) clearance and with material gaps in its quality management system: no design controls, absent CAPA procedures, insufficient audit trails, unqualified suppliers, and training deficiencies. The FDA's message was unambiguous. AI-enabled functionality with diagnostic intent is a medical device — and must be governed as one, regardless of how the company chose to categorize it internally. The Exer Labs case establishes a precedent that extends well beyond medical devices. Any AI system that influences GxP decisions — manufacturing parameters, batch release, pharmacovigilance signal detection, labelling claims — is now exposed to the same scrutiny. Internal classification as a "productivity tool" is not a regulatory defense.

02 — Anatomy of the FDA's Seven-Step Credibility Assessment Framework


The January 2025 draft guidance (FDA-2024-D-4689) introduces a structured, risk-proportionate methodology for evaluating AI models that produce data or information intended to support regulatory submissions. The framework was informed by an FDA-sponsored expert workshop at Duke's Margolis Institute (December 2022), more than 800 stakeholder comments on two 2023 discussion papers, and the agency's own experience reviewing more than 500 AI-containing drug and biologics submissions since 2016.


The core concept is credibility — not just validation. Credibility, in the FDA's definition, is trust established through the systematic collection of evidence that an AI model performs adequately for a specific, documented purpose. This is an important distinction: it shifts the burden from proving a system works generically to proving it works for the exact context in which it will be used.

The Seven Steps in Practice

  1. Define the Question of Interest. What regulatory question is the AI model intended to answer? This must be stated with specificity — e.g., "predict which patients in a Phase III oncology trial are at elevated risk of Grade 3+ cardiotoxicity, to determine monitoring protocol." Vague scope is a disqualifying gap.


  2. Define the Context of Use (COU). The COU specifies the precise role and scope of the model within a development program. It defines the patient population, data inputs, operating environment, and how model outputs will be used in decision-making. Everything downstream — validation scope, bias assessment, monitoring requirements — is anchored here. A change in COU effectively requires a new credibility assessment.

  3. Assess Model Risk. Risk is evaluated on two axes: model influence (how directly does the model output affect a regulatory decision?) and decision consequence (how severe are the downstream effects of an incorrect output?). A model that autonomously stratifies patients for inpatient versus outpatient monitoring of a life-threatening adverse event sits at the maximum risk tier; a model that flags potential transcription errors in a non-critical data field does not. The FDA provides worked hypothetical examples in the draft to illustrate this tiering; a minimal sketch of one way to encode it follows this list.

  4. Develop a Credibility Assessment Plan. The plan documents how credibility evidence will be collected, organized, and evaluated — proportionate to the risk tier established in Step 3. Higher-risk models demand broader training-data documentation, more rigorous out-of-distribution testing, prospective performance benchmarks, and detailed uncertainty quantification. The plan should be developed before execution, not retrospectively.

  5. Execute the Plan. This includes model development, testing, and validation activities — covering training/test set construction and independence, feature engineering rationale, performance metrics appropriate to the COU, bias and fairness assessments, sensitivity analyses, and benchmarking against alternative methods.

  6. Document Results and Deviations. Full documentation of outcomes, including any deviations from the plan and how they were managed. This is the audit trail that a regulator will examine. The guidance explicitly calls out the need for reproducibility — results must be replicable given the same inputs and model version.

  7. Evaluate Model Adequacy for the COU. A holistic judgement — supported by the accumulated evidence — that the model is fit for the specific regulatory purpose for which it is being deployed. This evaluation should address known limitations and propose mitigations. The adequacy finding must be formally documented and signed off.
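The two-axis risk assessment in Step 3 is easiest to keep consistent across an AI inventory when it is encoded once and reused. The Python sketch below is a minimal illustration: the three-level scales, the scoring rule, and the tier cut-offs are assumptions made for the example, since the draft guidance describes the axes but leaves the exact tiering methodology to the sponsor.

```python
from enum import Enum

class Influence(Enum):
    """How directly the model output affects a regulatory decision."""
    LOW = 1      # one of several inputs to a human decision
    MEDIUM = 2   # primary driver of the decision
    HIGH = 3     # determines the decision with no routine human review

class Consequence(Enum):
    """Severity of the downstream effect of an incorrect output."""
    LOW = 1      # minor, recoverable effects
    MEDIUM = 2   # could affect data quality or study integrity
    HIGH = 3     # could directly affect patient safety

def model_risk_tier(influence: Influence, consequence: Consequence) -> str:
    """Combine the two axes into a single documented risk tier.

    The 3x3 scoring and cut-offs are illustrative assumptions; the draft
    guidance describes the axes but leaves the tiering to the sponsor.
    """
    score = influence.value * consequence.value
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"

# Autonomous patient stratification for a life-threatening adverse event:
print(model_risk_tier(Influence.HIGH, Consequence.HIGH))   # -> high
# Flagging possible transcription errors in a non-critical data field:
print(model_risk_tier(Influence.LOW, Consequence.LOW))     # -> low
```

A documented function like this also gives auditors a single place to see how every system in the inventory was classified.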

03 — GxP Principles Under Strain: Where Traditional Compliance Meets AI's Adaptive Nature


GxP has long required that all data be Attributable, Legible, Contemporaneous, Original, and Accurate — expanded to ALCOA+ with Complete, Consistent, Enduring, and Available. These principles were designed for human-generated records and static computerized systems. AI introduces a set of challenges that existing ALCOA+ interpretations were never designed to handle:


Attributability:

Who "authored" an AI-generated record? The model, the data scientist who built it, the vendor who supplied it, or the QA team that validated it? Regulatory frameworks require a named, accountable human — a gap that must be addressed in governance design.


Contemporaneity:

AI models often generate outputs in real time from patterns learned on historical training data. When exactly was a prediction "made"? Timestamp strategies for AI outputs need explicit design.


Originality:

AI outputs are derived, not original — they are transformations of training data. Preserving data lineage from raw training sources through to final model output is a new category of documentation requirement.


Accuracy over time:

Unlike static software, AI accuracy degrades as the operational environment drifts away from the training distribution. A model accurate at deployment may be materially inaccurate two years later. Static validation does not capture this.
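Several of these gaps (attributability, contemporaneity, originality) come down to what is captured at the moment an AI output is generated. The sketch below shows one way a per-output record could carry a named accountable owner, an explicit UTC timestamp, the model version, and a lineage reference back to the training-data snapshot, with a content hash to make silent edits detectable. Every field name here is an illustrative assumption, not a prescribed schema.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class AIOutputRecord:
    """One ALCOA+-oriented record for a single AI model output (illustrative schema)."""
    model_id: str
    model_version: str
    training_data_snapshot: str   # originality: lineage back to the training sources
    input_payload: dict           # the exact inputs that produced this output
    output_payload: dict
    generated_at_utc: str         # contemporaneity: when the prediction was made
    accountable_owner: str        # attributability: a named, accountable human
    record_hash: str = field(default="")

def make_record(model_id, model_version, snapshot, inputs, outputs, owner):
    body = {
        "model_id": model_id,
        "model_version": model_version,
        "training_data_snapshot": snapshot,
        "input_payload": inputs,
        "output_payload": outputs,
        "generated_at_utc": datetime.now(timezone.utc).isoformat(),
        "accountable_owner": owner,
    }
    # A content hash makes tampering detectable; true immutability still needs
    # an append-only (e.g. WORM) store behind records like this.
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return AIOutputRecord(record_hash=digest, **body)

record = make_record(
    "deviation-classifier", "2.3.1", "training-set-2024-11-30",
    {"batch_id": "B-1042", "deviation_text": "filter integrity test delayed"},
    {"classification": "minor", "confidence": 0.91},
    "jane.doe@example.com",
)
print(record.record_hash)
```

The Enduring and Available principles then become storage and retention questions layered on top of this record, where existing data-integrity controls already apply.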


GAMP 5 Second Edition and Appendix D11: The Emerging Standard


ISPE's GAMP 5 Second Edition (2022) introduced significant updates to accommodate data-driven systems, and its Appendix D11 — dedicated to AI/ML systems — is the industry's most detailed available framework for GxP-compliant AI validation. The key shift from traditional CSV: validation is no longer a one-time event. Model performance is monitored continuously. Data governance, explainability, and drift detection are formal requirements, not optional enhancements. The ISPE D/A/CH working group's AI Maturity Model adds a further dimension, defining AI maturity on two axes — degree of autonomy and robustness of control design — with the principle that higher autonomy demands correspondingly higher control rigor.


GAMP 5 2nd Edition formally endorses Computer Software Assurance (CSA) over traditional scripted testing for many AI applications. CSA is risk-based, documentation-light, and focuses on evidence of fitness for purpose rather than line-by-line test execution. For adaptive AI systems where behavior changes between deployments, this is the only tenable approach — but it demands a far more sophisticated risk taxonomy upfront.


Predetermined Change Control Plans: The PCCP Analogy


For medical devices, the FDA has established the concept of the Predetermined Change Control Plan (PCCP) — a pre-approved roadmap for anticipated model updates that allows manufacturers to retrain and redeploy models within defined parameters without triggering full re-submission. The 2025 draft guidance for drugs and biologics introduces an analogous concept: a lifecycle management plan that must be maintained as part of the pharmaceutical quality system and summarized in marketing applications.


This is a significant practical development. Companies that invest in building a rigorous, well-documented PCCP-equivalent now will have a materially faster regulatory pathway when models are updated — which, for production AI systems, will happen repeatedly.
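What a PCCP-equivalent lifecycle management plan pins down can be expressed in a machine-checkable form as well as in prose. The sketch below is a hypothetical example: the allowed change types, the performance envelope, and the sign-off role are invented for illustration and are not drawn from the guidance.

```python
# Pre-approved change envelope for one model (all values are illustrative assumptions)
LIFECYCLE_PLAN = {
    "model_id": "deviation-classifier",
    "allowed_changes": {
        "retraining_on_new_data": True,        # same architecture, new data only
        "architecture_changes": False,         # would require a new submission
        "new_input_features": False,
    },
    "performance_envelope": {
        "min_sensitivity": 0.90,               # must hold on the locked test set
        "min_specificity": 0.85,
        "max_subgroup_sensitivity_gap": 0.05,  # bias check across key subgroups
    },
    "revalidation": {
        "required_before_deployment": True,
        "qa_signoff_role": "QA Head, Data Integrity",
    },
}

def update_within_envelope(change_type: str, metrics: dict) -> bool:
    """Return True if a proposed model update stays inside the pre-approved plan."""
    if not LIFECYCLE_PLAN["allowed_changes"].get(change_type, False):
        return False
    env = LIFECYCLE_PLAN["performance_envelope"]
    return (
        metrics["sensitivity"] >= env["min_sensitivity"]
        and metrics["specificity"] >= env["min_specificity"]
        and metrics["subgroup_sensitivity_gap"] <= env["max_subgroup_sensitivity_gap"]
    )

print(update_within_envelope(
    "retraining_on_new_data",
    {"sensitivity": 0.93, "specificity": 0.88, "subgroup_sensitivity_gap": 0.03},
))  # -> True: update falls inside the pre-approved envelope
```

Keeping the envelope machine-readable means every retraining run can be checked against the pre-approved boundaries automatically before it ever reaches QA review.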


04 — The Black Box Problem: Explainability as a Compliance Requirement

GxP's foundational audit principle — that any decision affecting product quality or patient safety must be explainable and defensible to an inspector — collides head-on with the opacity of modern machine learning architectures. Deep neural networks, gradient boosting ensembles, and large language models can produce outputs that no human can fully trace, even in principle.

The regulatory response is not to ban such models, but to mandate explainability as a proportionate overlay on top of black-box systems — what the pharmaceutical engineering community has called explainable AI (XAI) in GxP contexts.

"A black-box AI that cannot show its reasoning cannot be audited. A system that cannot be audited cannot be trusted in a GxP environment."

What this means practically:

Feature importance documentation: For models used in high-consequence decisions, SHAP values, LIME explanations, or equivalent post-hoc attribution methods must be documented and preserved alongside model outputs.

Decision audit trails: The inputs that generated a specific output, and the model version at the time of that output, must be immutably recorded. This is a data engineering requirement, not just a QA policy.

Human-AI teaming architecture: The FDA's guidance references the concept of a "Human-AI Team" — a model where AI augments rather than replaces qualified human judgement, with clear escalation paths when model confidence is low or output is out of distribution. This must be designed into the system, not bolted on.

Uncertainty quantification: The 2025 draft guidance specifically references Bayesian approaches as a positive example for communicating prediction uncertainty. Models that output only point predictions without confidence bounds are at a compliance disadvantage.
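Uncertainty quantification and human-AI teaming meet at a concrete design decision: when does the system hand a prediction back to a qualified human? The sketch below uses the spread of per-tree predictions in a random forest as a simple uncertainty band and escalates when the band is too wide. The model, the 90 percent band, and the escalation threshold are all illustrative assumptions; SHAP or LIME attributions would be stored alongside the same output record.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data; in practice this is the validated training set.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))
y_train = X_train @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

def predict_with_review(model, x_row, interval_width_limit=1.0):
    """Point prediction plus a simple uncertainty band from per-tree spread.

    The escalation rule (route to a qualified human when the band is too wide)
    is an illustrative assumption about where a Human-AI Team boundary could sit.
    """
    per_tree = np.array([t.predict(x_row.reshape(1, -1))[0] for t in model.estimators_])
    lo, hi = np.percentile(per_tree, [5, 95])
    return {
        "prediction": float(per_tree.mean()),
        "interval_90pct": (float(lo), float(hi)),
        "needs_human_review": bool((hi - lo) > interval_width_limit),
    }

print(predict_with_review(model, X_train[0]))
```

The key design choice is that the escalation path is part of the system specification, with its threshold documented and version-controlled, rather than a convention individual users are expected to remember.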


05 — Global Alignment: Beyond FDA — EMA, the EU AI Act, and the Convergence You Cannot Ignore

EMA Reflection Paper on AI (October 2024)

The European Medicines Agency published its Reflection Paper on the Use of Artificial Intelligence in the Medicinal Product Lifecycle in October 2024. The paper reinforces data integrity, traceability, and human oversight expectations — broadly consistent with the FDA's framework, but with some distinctive emphases:

  • Stronger emphasis on diversity and representativeness of training data, particularly for models applied to EU patient populations that may differ demographically from US training cohorts.

  • Explicit expectations for model cards — standardised documentation of model purpose, performance characteristics, known limitations, and appropriate use cases — as a condition of regulatory acceptance (a minimal sketch follows this list).

  • Requirements for ongoing post-market surveillance of AI-generated signals in pharmacovigilance, with defined thresholds that trigger formal model review.
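A model card does not need to be elaborate to be useful. The sketch below shows a minimal, machine-readable version covering the elements the EMA paper emphasises: purpose, performance, known limitations, and appropriate use. Every field name and value here is a hypothetical example, not a template from the Reflection Paper.

```python
MODEL_CARD = {
    "model_name": "pv-signal-ranker",            # hypothetical system
    "intended_purpose": "Rank incoming adverse event reports for priority human review",
    "context_of_use": "Decision support only; no autonomous case closure",
    "training_data": {
        "description": "Spontaneous report extract, 2018-2023, EU and US sources",
        "known_gaps": ["paediatric cases under-represented"],
    },
    "performance": {"recall_at_top_100": 0.87, "evaluated": "2024-11-30"},
    "known_limitations": ["not evaluated on literature-derived reports"],
    "appropriate_users": ["qualified pharmacovigilance scientists"],
    "review_cycle_months": 6,
}
```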


EU AI Act: The Intersection with GxP


The EU AI Act, whose obligations phase in from 2025 with most applying from August 2026, classifies AI systems used as safety components of medical devices, along with certain healthcare uses such as emergency patient triage, as high-risk. High-risk AI systems face a mandatory pre-market conformity assessment, ongoing post-market monitoring, mandatory human oversight mechanisms, and requirements for technical documentation — obligations that map closely onto existing GxP requirements but add a parallel, legally binding layer.

Strategic Risk: Companies that treat FDA compliance and EU AI Act compliance as separate programs will face duplicative overhead and potential inconsistencies. The organisations that build a unified AI governance model — one that satisfies both frameworks simultaneously — will have a significant structural advantage.


The ICH Framework Connections


Two International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) guidelines deserve explicit attention in any AI-in-GxP governance design:

ICH Q9(R1) — Quality Risk Management: The revised guideline (2023) explicitly addresses the risk of over-reliance on algorithmic decision-making without adequate understanding of model uncertainty. AI risk assessments must now explicitly account for model failure modes, not just process failure modes.


ICH E6(R3) — Good Clinical Practice: The updated GCP guideline introduces specific provisions for data generated by adaptive algorithms in clinical trials — including requirements for algorithm transparency in the protocol and statistical analysis plan, and predefined criteria for algorithm retraining or suspension during an ongoing trial.


06 — Lifecycle: Why AI Is Never "Validated Once"

Perhaps the most operationally significant shift in the 2025 guidance is the explicit rejection of one-time validation as sufficient for AI systems. The FDA expects continuous performance monitoring across the entire operational lifecycle — a requirement that demands new technical infrastructure and new organizational capabilities.


Model Drift vs. Model Decay: An Important Distinction


Model drift refers to a shift over time in the data the model sees in production (data or covariate drift), or in the underlying relationship between inputs and outputs (concept drift), relative to what the model was trained on. A model trained on pre-COVID clinical data may perform differently on post-COVID patient populations, not because the model changed, but because the world did. Drift monitoring is a statistical engineering problem.

Model decay refers to the degradation of a model's predictive performance relative to a defined benchmark, regardless of the cause. Decay triggers the formal question of whether the model remains fit for its documented COU — and if not, whether retraining, replacement, or suspension of the regulated application is required.

Both require formal monitoring programmes, defined performance thresholds, and change control workflows. These are not IT projects — they require QA ownership and regulatory sign-off.
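The distinction maps cleanly onto two different checks. The sketch below flags drift with a two-sample Kolmogorov-Smirnov test on one input feature and flags decay with a simple tolerance band around the performance established at validation. The alpha level and the tolerated drop are illustrative assumptions that belong in the monitoring SOP, not in code.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_has_drifted(train_col: np.ndarray, recent_col: np.ndarray,
                        alpha: float = 0.01) -> bool:
    """Drift check for one input feature: two-sample Kolmogorov-Smirnov test
    between the training distribution and recent production inputs.
    The alpha level is an illustrative assumption set in the monitoring SOP."""
    result = ks_2samp(train_col, recent_col)
    return result.pvalue < alpha          # True -> distribution has shifted

def model_has_decayed(current_sensitivity: float, validated_sensitivity: float,
                      tolerated_drop: float = 0.05) -> bool:
    """Decay check: performance has fallen more than the tolerated margin below
    the level established at validation, whatever the cause."""
    return current_sensitivity < validated_sensitivity - tolerated_drop

rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, 5000)
recent = rng.normal(0.4, 1.0, 1000)       # the world shifted, not the model
print(feature_has_drifted(train, recent))                   # -> True
print(model_has_decayed(0.82, validated_sensitivity=0.90))  # -> True
```

Either flag returning True is the event that should open the change control workflow and QA review described above, not merely log a warning for the data science team.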


The NIST AI Risk Management Framework as a Practical Tool


The NIST AI RMF (2023) — while not a regulatory requirement — provides a well-structured vocabulary and process architecture for AI lifecycle governance that maps neatly onto both FDA expectations and GxP principles. Its four functions — Govern, Map, Measure, Manage — translate directly into the organizational capabilities required for compliant AI operations, and are increasingly referenced in FDA engagement conversations as credible evidence of systematic AI governance.


07 — Action: A 12-Month Compliance Roadmap

Foundation (0–90 days): Complete AI system inventory. Classify all systems by risk tier (high / medium / low) using a documented, defensible methodology. Assign QA ownership to each high-risk system. Identify vendor-supplied AI components requiring qualification.


Governance (30–120 days): Establish an AI Governance Board with cross-functional membership (QA, Regulatory, IT, Data Science, Legal). Draft Responsible AI Policy. Create an AI Risk Register. Define escalation and suspension criteria for regulated AI systems.


Validation (60–180 days): Develop AI-specific validation SOPs aligned to GAMP 5 Appendix D11 and CSA principles. Retroactively assess validation gaps for currently deployed AI in GxP contexts. Prioritize high-risk systems for credibility assessment plan development.


Data Integrity (60–180 days): Map ALCOA+ requirements onto all AI data pipelines. Implement immutable audit trails for AI inputs and outputs. Establish data lineage documentation from training sources to production model.


Monitoring (120–270 days): Deploy model performance monitoring for all high-risk AI systems. Define drift and decay thresholds. Integrate monitoring alerts into CAPA workflow. Build lifecycle management plan templates aligned to the FDA's 2025 guidance.


Vendor Oversight (90–270 days): Audit key AI vendors against a defined qualification questionnaire covering model documentation, change control, bias assessments, and SLA commitments. Update supplier agreements to include AI-specific quality provisions.


EU Readiness (180–365 days): Map EU AI Act high-risk classifications against existing AI inventory. Identify conformity assessment gaps. Harmonize FDA and EMA documentation requirements into a single unified framework to avoid parallel documentation overhead.


08 — Perspective: The Strategic Opportunity Inside the Compliance Burden

It would be easy — and wrong — to read this regulatory shift purely as a compliance burden. The companies that build genuinely robust AI governance models are building something more valuable than audit-readiness: they are building the organizational infrastructure for trustworthy AI at scale.


The credibility assessment framework, applied rigorously, forces the kind of disciplined thinking about model purpose, performance, and failure modes that distinguishes AI systems that actually improve patient outcomes from those that merely look impressive in a pilot. The lifecycle monitoring requirements create the feedback loops necessary to catch model failure before it becomes a patient safety event. The explainability requirements drive the human-AI collaboration architectures that make AI genuinely useful in complex, high-stakes regulatory environments.


Done well, GxP-compliant AI is not slower AI or worse AI. It is AI that can be trusted — by regulators, by clinicians, by patients, and by the organizations that deploy it.

Is your AI programme ready for FDA and EMA scrutiny?


Alignmt AI helps life sciences organizations design GxP-compliant AI governance frameworks — from risk classification and credibility assessment planning to lifecycle monitoring and vendor qualification. Book a demo


Key References

  1. FDA — Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products, Draft Guidance, January 6, 2025. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/considerations-use-artificial-intelligence-support-regulatory-decision-making-drug-and-biological

  2. FDA — Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations, Draft Guidance, January 7, 2025. https://www.federalregister.gov/documents/2025/01/07/2024-31543/artificial-intelligence-enabled-device-software-functions-lifecycle-management-and-marketing

  3. FDA — Warning Letter to Exer Labs, Inc. (CMS 699218), February 10, 2025. https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/warning-letters/exer-labs-inc-699218-02102025

  4. EMA — Reflection Paper on the Use of Artificial Intelligence in the Medicinal Product Lifecycle, October 2024. https://www.ema.europa.eu/en/documents/scientific-guideline/reflection-paper-use-artificial-intelligence-ai-medicinal-product-lifecycle_en.pdf

  5. EU — Regulation (EU) 2024/1689 — Artificial Intelligence Act, Official Journal of the European Union, 2024. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689

  6. ISPE — GAMP 5: A Risk-Based Approach to Compliant GxP Computerized Systems, Second Edition (including Appendix D11 on AI/ML), 2022. https://ispe.org/publications/guidance-documents/gamp-5-second-edition

  7. NIST — Artificial Intelligence Risk Management Framework (AI RMF 1.0), January 2023. https://www.nist.gov/artificial-intelligence/ai-risk-management-framework

  8. ICH — Q9(R1): Quality Risk Management, adopted January 18, 2023. https://database.ich.org/sites/default/files/ICH_Q9(R1)_Guideline_Step4_2022_1219.pdf

  9. ICH — E6(R3): Guideline for Good Clinical Practice, 2023. https://database.ich.org/sites/default/files/ICH_E6-R3_Guideline_2023_Step4.pdf

  10. ISPE — Erdmann, N. et al., "AI Maturity Model for GxP Application: A Foundation for AI Validation," Pharmaceutical Engineering, March–April 2022. https://ispe.org/pharmaceutical-engineering/march-april-2022/ai-maturity-model-gxp-application-foundation-ai

  11. ISPE — Subramanian, T. & Henrichmann, F., "Artificial Intelligence Governance in GxP Environments," Pharmaceutical Engineering, July–August 2024. https://ispe.org/pharmaceutical-engineering/july-august-2024/artificial-intelligence-governance-gxp-environments

  12. ISPE — Subramanian, T. & Henrichmann, F., "The Road to Explainable AI in GxP-Regulated Areas," Pharmaceutical Engineering, January–February 2023. https://ispe.org/pharmaceutical-engineering/january-february-2023/road-explainable-ai-gxp-regulated-areas

  13. FDA — Artificial Intelligence and Medical Products: How CBER, CDER, CDRH, and OCP are Working Together, March 2024. https://www.fda.gov/media/167973/download

  14. MHRA — GxP Data Integrity Guidance and Definitions, UK Medicines and Healthcare products Regulatory Agency. https://www.gov.uk/government/publications/regulatory-guidance-gxp-data-integrity-definitions-and-guidance-for-industry

  15. WHO — Ethics and Governance of Artificial Intelligence for Health, World Health Organization, 2021. https://www.who.int/publications/i/item/9789240029200

 
 
 
