GxP Evidence Architecture

GMP is a time, context, and decision problem – not a documentation problem.

• Cloud – Can trigger, never decide.
• Line – Decides fast. Irreversible.
• Site – Decides formally. Approval required.
• Audit – Built from snapshots, not logs.

        The GMP Gradient

        GMP requirements scale with proximity to the process. The closer to the physical process, the higher the time resolution and the stricter the requirements.

        GMP is not binary.

        It scales with proximity. Time resolution increases toward the machine. Causality becomes critical.

        ⚙️ Line (OT): Real-Time Truth

        Decisions are irreversible. When a PLC stops a line or marks a product NOK, there's no "undo". Event time and causal ordering are non-negotiable. Operator has authority. Cloud may never close loops directly.

        🏭 Site (MES): Batch Orchestration

        Context completeness is mandatory. Every event must be correlated to its batch. Recipe changes require approval. QA has final authority. Deviations trigger workflows, not just logs.

        ☁️ Cloud: Decision Support, Not Authority

        Cloud analyzes the past, never current state. Provides recommendations, never executes. All changes must flow through site/line gates. OEE definitions are versioned standards, not code snippets.

        Layer    | Time Horizon      | GxP Level | Time Critical | Causality
        📊 ERP   | Days / Weeks      | Low       | No            | Aggregated
        🏭 MES   | Hours / Shifts    | Medium    | Moderate      | Batch-level
        🖥️ SCADA | Seconds / Minutes | High      | Yes           | Time-series
        ⚙️ PLC   | Milliseconds      | Critical  | Essential     | Real-time

        💡 Cloud = MES Level

        Cloud typically operates at MES level (hours/days). This means it's suitable for aggregation and analytics, but critical control loops stay at the edge. GxP relevance only emerges when cloud decisions flow back to the line.

        📋 Regulatory Implications

        • 21 CFR Part 11: Applies only at decision points, not at every layer
        • EU Annex 11: System boundary must be explicit – cloud vs edge distinction matters
        • Validation: Risk-based approach scales with proximity

        Time Is Architecture

        GMP demands event time, not file time. Context must exist at decision time, not be reconstructed later. Order and causality are non-negotiable.

        Late-aggregated data cannot explain early decisions.

        GMP is fundamentally a time and order problem.

        ⚙️ Line: Monotonic Time & Causal Ordering

        Sequence is everything. Events must preserve causal order. Timestamps must be monotonic (no NTP jumps backward). Jitter budget: milliseconds. Late events cannot be reinserted into history.
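A minimal sketch of what this admission rule could look like at the edge, assuming a per-producer monotonic counter; the class and field names (`EventAdmitter`, `LineEvent`) are illustrative, not a real API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineEvent:
    event_time_ns: int   # source timestamp, nanoseconds
    sequence_no: int     # per-producer monotonic counter

class EventAdmitter:
    """Rejects events that would violate monotonic time or causal order."""
    def __init__(self) -> None:
        self._last_time_ns = -1
        self._last_seq = -1

    def admit(self, ev: LineEvent) -> bool:
        # A wall-clock jump backwards (e.g. an NTP step) must not reorder
        # history: the event is rejected and flagged, never silently
        # reinserted.
        if ev.event_time_ns < self._last_time_ns:
            return False
        # sequence_no must advance by exactly 1 -- a gap means lost causality.
        if ev.sequence_no != self._last_seq + 1:
            return False
        self._last_time_ns = ev.event_time_ns
        self._last_seq = ev.sequence_no
        return True
```

The key design choice: rejection is explicit. A late or out-of-order event becomes its own reviewable incident instead of quietly corrupting the timeline.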

        🏭 Site: Batch Context Stitching

        Context binds immediately. Phase/Recipe/Batch must exist at event time. Reconciliation windows for human events (shift handover, manual entry). Correlation IDs link all events in a batch lifecycle.
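A sketch of immediate context binding, assuming a registry that holds the currently active batch context; `ContextRegistry` and `contextualize` are hypothetical names for illustration:

```python
import uuid

class ContextRegistry:
    """Holds the currently active batch context for one line."""
    def __init__(self, batch_id: str, recipe_version: str, phase: str):
        self.batch_id = batch_id
        self.recipe_version = recipe_version
        self.phase = phase

def contextualize(raw_event: dict, ctx: ContextRegistry,
                  correlation_id: str) -> dict:
    # Refuse to publish if context is incomplete: an uncontextualized
    # event cannot be repaired later.
    if not (ctx.batch_id and ctx.recipe_version and ctx.phase):
        raise ValueError("incomplete context at event time")
    return {
        **raw_event,
        "event_id": str(uuid.uuid4()),
        "batch_id": ctx.batch_id,
        "recipe_version": ctx.recipe_version,  # immutable at event time
        "phase": ctx.phase,
        "correlation_id": correlation_id,      # links the batch lifecycle
    }
```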

        ☁️ Cloud: Event-Time Windows & Audit Storage

        Event time, not processing time. Analytics use event_time for windows, not ingest_time. Late arrivals handled via watermarks. Idempotency required. Audit storage optimized for retrieval, not insertion speed.
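A minimal sketch of event-time windowing with a watermark: aggregation keys off `event_time`, and events older than the allowed lateness are diverted to reconciliation rather than silently dropped. The window length and lateness bound are illustrative values, not a standard:

```python
from collections import defaultdict

WINDOW_S = 60            # tumbling window length (illustrative)
ALLOWED_LATENESS_S = 30  # lateness bound behind the watermark (illustrative)

def assign_windows(events, watermark_s):
    """events: iterable of (event_time_s, value).

    Returns (windows keyed by window start, late events for reconciliation).
    """
    windows = defaultdict(list)
    late = []
    for event_time_s, value in events:
        if event_time_s < watermark_s - ALLOWED_LATENESS_S:
            # Too late for the window: route to a reconciliation path,
            # never reinsert into already-closed history.
            late.append((event_time_s, value))
            continue
        windows[event_time_s // WINDOW_S * WINDOW_S].append(value)
    return dict(windows), late
```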

        14:35:42.123 – ⚙️ Event Occurs: Sensor measurement at PLC level. Event time captured at source.
        14:35:42.125 – 🎯 Context Added: Phase, Recipe, Batch context attached immediately, not later.
        14:35:42.234 – 📡 Event Published: Contextualized event sent via message broker. Correlation ID links to process chain.
        14:35:43.012 – 💾 Evidence Stored: Immutable storage. Event time preserved. File time (14:35:43) is irrelevant for GMP.
        15:00:00.000 – 📊 Analytics: Cloud processes event. Uses event time for aggregation, not processing time.

        ⚠️ Anti-Pattern: Reconstructed Context

        Many systems collect raw data and "add context later" in a data warehouse. This is regulatorily worthless. By the time you add context, the decision moment has passed. You're guessing, not observing.

        ⚠️ Anti-Pattern: Logs ≠ Audit Trails

        "Everything logged, nothing correlated" – the most common IT/OT failure pattern. Logs are technical (debugging, isolated, days retention). Audit trails are regulatory (business context, correlated, years retention). Collecting logs does not generate an audit trail.

        ✓ Controls: Time-First Architecture

        Event time as primary timestamp (not system time) · Monotonic timestamps (no NTP time jumps) · Causal ordering (correlation IDs, event chains) · Context at event time, not processing time

        📋 Regulatory Requirements

        • 21 CFR Part 11: Requires complete audit trails with time stamping
        • ALCOA++: Contemporaneous = captured at time of event
        • Validation: Prove that time is not manipulable

        Evidence ≠ Data

        Collecting data doesn't automatically generate evidence. Evidence requires Data + Context + Intent + Traceability aligned with ALCOA++ principles.

        Evidence is Data + Context + Intent + Traceability

        The difference determines if your system is audit-ready or just data-rich.

        ⚙️ Line: Sequence_no Mandatory

        Attributable, Legible, Contemporaneous. Every event needs: producer_id + version, event_time (source!), sequence_no for causality. Original = first write to immutable store. State changes are decisions, not logs.

        🏭 Site: Correlation_id Mandatory

        Complete, Consistent, Enduring. Every event needs: batch_id, recipe_version (immutable), correlation_id, phase. Deviations must be classifiable. Completeness = all batch phases documented.

        ☁️ Cloud: Dataset Fingerprints & Lineage

        Available, Traceable, Accurate. Every aggregation needs: oee_definition_id + version, dataset_fingerprint, lineage to source events. Model registry for AI. Analysis runs must be reproducible.

        ALCOA Check         | 📊 Just Data                               | ⚖️ Evidence
        A – Attributable    | Who/what generated it? Unknown.            | Service v2.1, Sensor-42, Line-3
        C – Contemporaneous | When exactly? Unclear.                     | Event time: 14:35:42.123
        O – Original        | Copy or original? Unknown.                 | Append-only, immutable
        Complete            | Process context? Missing.                  | Phase, Recipe, Batch, Correlation ID
        Result              | Numbers exist, but can't explain decisions | Can explain every decision

        📋 Minimum Evidence Set

        Every audit-ready event must contain these fields:

        • event_id – Unique identifier (UUID)
        • event_type – Category (measurement, state_change, decision)
        • event_time – When it happened (source time, not system time)
        • producer_id – Service/sensor that generated it + version
        • batch_id – Manufacturing batch or order ID
        • recipe_id – Recipe + version (immutable)
        • phase – Process phase/state at event time
        • correlation_id – Links all events in the same process chain
        • sequence_no – Order within the process (for causality)
        • hash – Content fingerprint (optional: signature)
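The minimum evidence set above could be expressed as an immutable record with a content fingerprint. A sketch, assuming SHA-256 over a canonical JSON form as the hashing scheme (an illustrative choice, not mandated by any regulation):

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class EvidenceEvent:
    event_id: str        # unique identifier (UUID)
    event_type: str      # measurement | state_change | decision
    event_time: str      # source time, ISO 8601, not system time
    producer_id: str     # service/sensor + version
    batch_id: str
    recipe_id: str       # recipe + version, immutable
    phase: str
    correlation_id: str
    sequence_no: int

    def fingerprint(self) -> str:
        # Canonical serialization (sorted keys) so the hash is reproducible
        # regardless of field order or runtime.
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()
```

Two events with identical content always yield the same fingerprint; any changed field, including `sequence_no`, yields a different one.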

        📋 Regulatory Mapping

        • ALCOA++: All 9 principles directly map to these fields
        • 21 CFR Part 11: Audit trail requirement = these fields + immutable storage
        • EU Annex 11: Data integrity = completeness + traceability

        Gates & Decisions

        GMP doesn't apply to systems – it applies to decisions. Gates make responsibility explicit and decisions auditable.

        Only decisions are GxP-critical.

        Data, analytics, AI – none of this matters until someone makes a decision that affects the product.

        ⚙️ Line: Inline Gates & Auto-Rules

        Gates must be low-latency. Rule-based approval for speed (threshold checks). Operator override always available. Decision snapshot captures: who (rule/operator), what (OK/NOK), when (event_time), based_on (inference_result + model_version). Handshakes between PLC and Edge are "mini-gates" – state agreement protocols to avoid silent mismatches.

        🏭 Site: QA Gates & E-Signatures

        Formal approval workflow. Batch close requires QA e-signature. Recipe changes require supervisor approval. Decision snapshot captures: approval_chain, justification, deviation_refs, completeness_check_passed.

        ☁️ Cloud: Change Proposals Only

        Never auto-execute. Cloud generates change.requested events, never change.executed. Site/Line gates review and approve. Decision snapshot captures: analysis_run_id, model_version, confidence, hypothesis, expected_effect.

        🎯 What is a Gate?

        A Gate is not a workflow step. A Gate is the creation of a Decision Snapshot that proves: what was known at decision time. The system cannot proceed until criteria are checked, documented, and approved. Gates make responsibility explicit.

        Gate Process Flow

        Propose → Gather Evidence → Snapshot (📸) → Approve → Execute → Verify
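The flow above can be sketched as an explicit state machine that refuses to skip steps. A hypothetical illustration, not a prescribed implementation; step names mirror the flow:

```python
# Ordered gate steps -- a gate may only advance to the immediately next step.
GATE_STEPS = ["proposed", "evidence_gathered", "snapshotted",
              "approved", "executed", "verified"]

class Gate:
    def __init__(self) -> None:
        self.state = "proposed"

    def advance(self, to: str) -> None:
        i = GATE_STEPS.index(self.state)
        # Skipping a step (e.g. executing without approval) is a hard error,
        # not a warning: the system cannot proceed past an unchecked gate.
        if GATE_STEPS.index(to) != i + 1:
            raise ValueError(f"cannot skip from {self.state} to {to}")
        self.state = to
```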

        Decision Snapshot Anatomy

        Every gate produces an immutable snapshot proving what was known at the moment of decision. This is where ALCOA++ becomes concrete.

        Field                | Meaning                                            | ALCOA++
        dataset_fingerprint  | Hash of all data visible at decision time          | Accurate
        query_definition     | What was queried, which time window, which sources | Complete
        source_versions      | Included sources and their exact versions          | Traceable
        state_before         | System/batch/line state before the decision        | Original
        proposed_state_after | What the decision intends to change                | Legible
        risk_classification  | Severity and approval level required               | Consistent
        approver_id + role   | Who decided, with what authority                   | Attributable
        snapshot_time        | Exact time the snapshot was frozen                 | Contemporaneous
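A sketch of freezing such a snapshot, assuming the dataset fingerprint is a SHA-256 hash over a canonical serialization of all evidence visible at decision time. Function and field names follow the anatomy above but are illustrative, not a fixed schema:

```python
import hashlib
import json
import time

def freeze_snapshot(evidence_events, source_versions, query_definition,
                    state_before, proposed_state_after,
                    risk_classification, approver_id, approver_role):
    """Freeze everything known at decision time into one immutable record."""
    # Hash the full visible dataset so the snapshot can later prove
    # exactly what the decision was based on.
    visible = json.dumps(evidence_events, sort_keys=True).encode()
    return {
        "dataset_fingerprint": "sha256:"
                               + hashlib.sha256(visible).hexdigest(),
        "query_definition": query_definition,
        "source_versions": source_versions,
        "state_before": state_before,
        "proposed_state_after": proposed_state_after,
        "risk_classification": risk_classification,
        "approver_id": approver_id,
        "approver_role": approver_role,
        "snapshot_time": time.time(),  # frozen at creation
        # In a real system this record would go to append-only storage.
    }
```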
        🤖 AI / Analytics – Role: Decision Support
        • Recommends – calculates, predicts, suggests
        • Classifies – OK/NOK, anomaly detection
        • Never decides – no authority over GxP-relevant state

        🎯 Gate Process – Role: Decision Authority
        • Reviews – checks AI/analytics output
        • Snapshots – freezes evidence state at decision time
        • Decides – approved state change with e-signature

        📋 Decision Objects – What is Gateable?

        Not every event needs a gate. Only these state changes require formal decision authority:

        • Batch Status Change (Site) – Start, hold, resume, close: every batch transition is a formal decision with QA involvement.
        • Product Disposition OK/NOK (Line, Site) – Inline auto-rules at line, final disposition at site. Always requires a snapshot.
        • Recipe Parameter Change (Site) – Any setpoint or recipe change during production. Cloud may propose, site must approve.
        • Model Version Change (Site, Cloud) – AI/ML model updates. Model = Recipe: must be approved, versioned, frozen during execution.
        • OEE Definition Change (Site, Cloud) – How OEE is calculated is a versioned standard. Changing it changes what "good" means.
        • Limit / Threshold Change (Site) – Alert thresholds, spec limits. These determine when gates trigger.
        • Line Release Decision (Line, Site) – Restarting a line after a stop. Requires verification that the root cause is resolved.
        • Deviation Closure (Site) – Closing a deviation requires evidence that the CAPA is effective. Part 11 e-signature.

        🔍 What to Audit: Event vs. Snapshot vs. Change

        Three fundamentally different audit artifacts. Most systems confuse them.

        Aspect     | Event                                | Snapshot                                               | Change
        What       | Something happened                   | What was known at decision time                        | State was modified
        Example    | oee.computed: 72%                    | All data visible when gate opened                      | recipe.param.updated
        Part 11?   | Rarely (only if state_change)        | Yes – core of the audit trail                          | Always – e-signature required
        Immutable? | Yes (append-only)                    | Yes (frozen at creation)                               | Yes (before + after captured)
        Key Fields | event_time, producer_id, sequence_no | dataset_fingerprint, source_versions, query_definition | state_before, state_after, approver_id, e_signature
        Retention  | Months–Years (depends on type)       | Years (tied to batch lifecycle)                        | Years (regulatory minimum)

        ⚠️ Common Mistake: AI as Decision Authority

        Many systems let AI "automatically approve" or "auto-execute" when confidence is high. This breaks GMP. AI can recommend with 99% confidence, but the gate must still formally approve. Even if approval is "one-click", the gate makes responsibility explicit.
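The separation can be sketched in two functions: the AI side may only ever emit a request, and only the gate may turn it into an executed change, regardless of confidence. Event names mirror the document (`change.requested` vs `change.executed`); the functions themselves are illustrative:

```python
from typing import Optional

def ai_recommend(confidence: float, proposal: dict) -> dict:
    # Cloud/AI may only ever request a change, never execute one --
    # even at 99% confidence.
    return {"type": "change.requested", "confidence": confidence,
            "proposal": proposal}

def gate_decide(request: dict, approver_id: Optional[str]) -> dict:
    # The gate is the only place a request becomes an executed change,
    # and only with a named, accountable approver.
    if request["type"] != "change.requested":
        raise ValueError("gate only consumes change.requested events")
    if approver_id is None:
        return {"type": "change.rejected", "reason": "no approver"}
    return {"type": "change.executed", "approver_id": approver_id,
            "proposal": request["proposal"]}
```

Note that `confidence` never appears in `gate_decide`: the approval path is identical whether the AI was 60% or 99% sure.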

        🤝 Handshakes Are Mini-Gates

        PLC↔Edge handshakes in production data are a GxP pattern: a state agreement protocol to avoid silent mismatches and make state transitions auditable. They ensure PLC and Edge agree on the same product identity at the same time. Handshakes are inline gates at line level – same principle, lower latency.
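A minimal sketch of such a state agreement, assuming both sides assert the product identity they believe they are handling; message shapes and the `handshake` function are hypothetical:

```python
def handshake(plc_claim: dict, edge_claim: dict) -> dict:
    """Both sides assert a product_id; agreement must be explicit."""
    if plc_claim["product_id"] != edge_claim["product_id"]:
        # Silent mismatch avoided: disagreement is surfaced as its own
        # auditable event instead of one side's view winning quietly.
        return {"agreed": False, "event": "handshake.mismatch",
                "plc": plc_claim, "edge": edge_claim}
    return {"agreed": True, "event": "handshake.confirmed",
            "product_id": plc_claim["product_id"]}
```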

        📋 Regulatory Context

        • 21 CFR Part 11: E-signatures required for decisions (meaning of signature must be clear)
        • EU Annex 11: Human oversight for critical decisions, even with automation
        • AI/ML Guidance: Model = Recipe → must be approved, versioned, frozen during execution

        OEE Drop: Signal to Gate to Change

        A concrete walkthrough. OEE drops below threshold on Line 3. Cloud AI detects the anomaly. A gate process starts. A parameter change is proposed, approved, executed, and verified.

        Cloud can trigger, but never decide. Line decides fast. Site decides formally.

        This is the same model everywhere – from simple threshold alerts to AI-driven parameter optimization.

        Step 0 – Pre-Condition: Baseline Definitions
        Before anything can be "GxP evidence-ready", these baselines must exist as versioned, approved artifacts. They are already evidence-relevant because they determine how decisions are made.
        • Versioned: OEE Definition v3.1 (formula + state model)
        • Versioned: Threshold: OEE < 78% → alert, < 65% → gate
        • Versioned: AI Model v2.4 (approved for recommendation)
        • Versioned: Line State Model (Run, Stop, Microstop, Setup)

        Step 1 – Line / Edge: Event & Raw Computation
        OEE is computed at the edge from Run/Stop/Microstop counts. Two event classes emerge: telemetry (data basis, not yet Part 11) and decision candidates (threshold violations, classifications).
        • Telemetry: cycle_count: 847, runtime_s: 3420, scrap: 12
        • Telemetry: stop_reason: "material_jam", duration_s: 180
        • Decision Candidate: oee.computed: 63.2% (threshold: 65%)
        • Evidence Min Set: event_time + seq_no + producer_id + sw_version + algo_id

        Step 2 – Site: Gate Initiated
        Threshold violation triggers a gate – logically at the site, not in the cloud. A gate.requested event is created. Evidence gathering begins: not random data loading, but a defined evidence pull against known sources.
        • Evidence: gate.requested: trigger=oee_below_limit, severity=high
        • Evidence Pull: last 30 min raw events, current recipe v4.2, batch B-2026-0281
        • Evidence Pull: maintenance log refs, camera inspection summary, operator notes

        Step 3 – Site + Cloud: Decision Snapshot Created
        This is the core artifact. The snapshot freezes everything that was known at decision time. It is immutable once created. Cloud AI contributes its recommendation as input to the snapshot, but the snapshot belongs to the site.
        • Snapshot: dataset_fingerprint: sha256:a4f8c...
        • Snapshot: sources: [edge_events, recipe_store, maint_log, ai_model_v2.4]
        • AI Recommendation: "Increase feed rate by 8%. Confidence: 87%. Expected OEE: 81%."
        • Snapshot: state_before: feed_rate=120, state_proposed: feed_rate=130

        Step 4 – Site: Part 11 Approval
        Part 11 applies here because a recipe parameter changes. The e-signature links the approver's identity to the exact snapshot content. The meaning of the signature is explicit.
        • Part 11: e_signature: supervisor_id=SUP-042, meaning="approved_change"
        • Part 11: reason: "AI recommendation reviewed, root cause plausible"
        • Evidence: linked_to: snapshot_id=SNAP-2026-04812

        Step 5 – Line: Execution with Local Authority
        Even though the cloud proposed "set feed_rate to 130", execution happens at the line/site with local authority. The operator confirms via HMI, or the site service applies the approved change – always with a reference to the decision snapshot.
        • Change: change.executed: feed_rate 120 → 130, by site_service
        • Evidence: gate_approval_ref: GATE-2026-00412, snapshot_ref: SNAP-2026-04812
        • Evidence: execution_authority: local (not cloud-direct)

        Step 6 – All Layers: Verification & Closed Loop
        After execution, verify: did the change have the expected effect? OEE recovery is measured. The gate is formally closed with verification evidence. If the effect was unexpected, a deviation is opened – not silently ignored.
        • Evidence: oee.computed: 79.4% (expected: 81%, within tolerance)
        • Evidence: gate.closed: outcome=effective, anomaly=resolved
        • Evidence: verification_evidence: [post_change_events, oee_trend, no_deviation]

        🔍 In This Scenario: What Was Audited?

        Three distinct artifact types were produced – each with different retention and Part 11 requirements:

        Artifact              | Type     | Part 11?                   | Example from Scenario
        Telemetry Events      | Event    | No                         | cycle_count, stop_reason, runtime_s
        OEE Computation       | Event    | Only if it triggers a gate | oee.computed: 63.2% with algorithm version
        Gate Request          | Evidence | Yes                        | gate.requested with trigger, severity, proposal_id
        Decision Snapshot     | Snapshot | Yes – core artifact        | Frozen dataset with fingerprint, all source versions, AI recommendation
        Parameter Change      | Change   | Yes – e-signature          | feed_rate: 120 → 130 with approval chain
        Verification          | Evidence | Yes                        | gate.closed with outcome, OEE recovery, deviation status

        ⚠️ Anti-Pattern: Cloud Closes the Loop Directly

        "Cloud detects OEE drop → Cloud sets parameter → Done." This is not GMP-compatible. The cloud can send change.proposed, but never change.executed. Execution requires local authority, a gate approval, and a decision snapshot. Without this, there is no audit trail for the parameter change.

        📋 Where Part 11 Applies in This Flow

        • Step 0 (Baselines): Version control, change history – but not e-signature per se
        • Step 1 (Telemetry): Not Part 11. Data basis only.
        • Step 2–3 (Gate + Snapshot): Evidence generation – immutable, traceable
        • Step 4 (Approval): Full Part 11 – e-signature, meaning, identity, linked to snapshot
        • Step 5 (Execution): Part 11 – state change with before/after and authority
        • Step 6 (Verification): Evidence – closed loop documentation