Make crash recovery validating, phase-tracked, and non-double-spending

A review of the first recovery cut found that it could commit envelopes that
were never validated and could double-spend: recover() force-completed blindly,
skipping validation and ignoring the affected-row counts from reserve/deactivate.
It also consumed the plan with take() (breaking finalize retry) and deleted the
pending record even after a mid-finalize failure (losing the roll-forward record).

Rework recovery around a persisted phase and a single verified finalize path.
The write-ahead PendingSaga now carries a phase: Reserving (before reserve) is
bumped to Finalizing at the point of no return, after validation passes and just
before the consumed postings start turning Inactive. recover() branches on it: a
Reserving saga is re-run through the real saga, which re-reserves and
re-validates against current state (aborting cleanly if a posting was taken or an
account frozen); a Finalizing saga is rolled forward through finalize_envelope.
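
The branching described above can be sketched as a pure decision function. This is a minimal illustration, not the code in this commit: `RecoveryAction` and `recovery_action` are hypothetical names, and the real `recover()` performs store IO around each branch. The `already_committed` short-circuit mirrors the check in the ledger.rs diff, where a pending record whose transfer is already stored is simply dropped.

```rust
/// Phase persisted with the write-ahead record (variant names follow the commit).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SagaPhase {
    Reserving,
    Finalizing,
}

/// What recovery does with one pending record (hypothetical type for this sketch).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum RecoveryAction {
    /// The transfer record already exists: only the pending record is dropped.
    DropRecord,
    /// Re-run reserve -> finalize, which re-validates against current state.
    RerunSaga,
    /// Past the point of no return: roll forward through the verified finalize.
    RollForward,
}

/// Hypothetical pure helper: picks the recovery path from the persisted phase.
fn recovery_action(phase: SagaPhase, already_committed: bool) -> RecoveryAction {
    if already_committed {
        return RecoveryAction::DropRecord;
    }
    match phase {
        SagaPhase::Reserving => RecoveryAction::RerunSaga,
        SagaPhase::Finalizing => RecoveryAction::RollForward,
    }
}
```

Keeping the decision separate from the IO makes each crash phase testable without a store, which is roughly what the new recovery tests exercise end to end.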

- Add Ledger::finalize_envelope: one idempotent, end-state-verified commit used
  by both the saga's finalize step and recovery. It re-validates while the
  consumed postings are still pre-deactivation (the last-step floor/freeze-close
  guard), then never creates or stores unless ALL consumed postings are confirmed
  Inactive — the double-spend guard. No plan take(), so finalize is retry-safe.
- commit_envelope keeps the pending record on a mid-finalize failure (roll
  forward) and deletes only on commit or a clean pre-finalize abort.
- Collapse the pipeline: validation moves into finalize as its last-step check,
  so the saga is reserve -> finalize; remove ValidateTransferStep and the unused
  ResolveStep.
- Tests cover each crash phase: re-drive Reserving, roll forward a partial
  finalize, abort+release when an account is frozen, and refuse to double-spend a
  taken posting.
- Floor/freeze guards are now as tight as a best-effort check can be (re-checked
  just before the writes, on the recovery path too) but still not strictly
  atomic; documented as such.
- Sync all docs, READMEs, module docs, and the ADR to the phase-tracked model.
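
Two of the rules in the list above reduce to small predicates, sketched here under stated assumptions. `consumed_all_inactive` and `safe_to_delete` are hypothetical helper names; in the commit the same checks are written inline in `finalize_envelope`, `commit_envelope`, and `recover()`. The sketch models only the end-state check and the record-retention rule, and assumes ownership of the postings was already established by the reserve step.

```rust
/// Posting lifecycle states named in the commit message.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum PostingStatus {
    Active,
    PendingInactive,
    Inactive,
}

/// Phase persisted with the write-ahead record.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SagaPhase {
    Reserving,
    Finalizing,
}

/// Double-spend guard (hypothetical helper): after deactivation, every consumed
/// posting must be found and must be `Inactive` before anything is created or
/// stored. An empty consumed set passes trivially.
fn consumed_all_inactive(expected: usize, after: &[PostingStatus]) -> bool {
    after.len() == expected && after.iter().all(|s| *s == PostingStatus::Inactive)
}

/// Retention rule (hypothetical helper): the pending record may be deleted on
/// success, or on a failure that never reached finalize. A failure with the
/// persisted phase at `Finalizing` keeps the record so recovery can roll forward.
fn safe_to_delete(saga_ok: bool, persisted_phase: Option<SagaPhase>) -> bool {
    saga_ok || persisted_phase != Some(SagaPhase::Finalizing)
}
```

For example, `consumed_all_inactive(2, &[PostingStatus::Inactive, PostingStatus::PendingInactive])` is false, so finalize would stop before inserting created postings, and `safe_to_delete(false, Some(SagaPhase::Finalizing))` is false, so the write-ahead record survives for the next `recover()` run.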

Cesar Rodas, 5 days ago
parent
commit
4f8f528196
10 changed files with 420 additions and 424 deletions
  1. CLAUDE.md (+3 -3)
  2. crates/kuatia/README.md (+9 -9)
  3. crates/kuatia/src/ledger.rs (+257 -110)
  4. crates/kuatia/src/saga.rs (+29 -197)
  5. doc/accounts.md (+10 -2)
  6. doc/adr/0001-dumb-storage-saga-recovery.md (+22 -14)
  7. doc/architecture.md (+61 -69)
  8. doc/crates.md (+14 -9)
  9. doc/glossary.md (+1 -1)
  10. doc/transfers.md (+14 -10)

+ 3 - 3
CLAUDE.md

@@ -34,9 +34,9 @@ doc/
 ## Architecture
 
 - **Pure core / async layer separation**: kuatia-core has zero IO, fully deterministic, testable with golden vectors. kuatia adds async Store trait and saga pipeline.
-- **Saga commit pipeline**: every commit is the envelope saga `reserve → validate → finalize`, with automatic retry and LIFO compensation via the `legend` crate. `commit(transfer)` = resolve (read-only) then `commit_envelope`; `reverse()` builds a reversal envelope and runs the same path. There is one commit path, not a separate "atomic" one.
-- **Count interpretation**: the saga reads each primitive's affected-row count — full = continue; partial = error → compensate; zero = read state and continue only if this same envelope/reservation already applied it (idempotency).
-- **Durable recovery**: a write-ahead `PendingSaga {envelope, reservation}` is persisted via `SagaStore` before the saga mutates anything. `Ledger::recover()` (call on startup) force-completes any pending saga through the idempotent primitives — converging from a crash at any point (pre-reserve, reserved, or mid-finalize). Roll-forward, not rollback, so there are no orphaned `PendingInactive` postings to reconcile.
+- **Saga commit pipeline**: every commit is the **two-step** envelope saga `reserve → finalize` (validation runs inside the finalize step, as the last thing before the writes), with automatic retry and LIFO compensation via the `legend` crate. `commit(transfer)` = resolve (read-only) then `commit_envelope`; `reverse()` builds a reversal envelope and runs the same path. There is one commit path, not a separate "atomic" one.
+- **Count interpretation**: the saga reads each primitive's affected-row count — full = continue; partial = error → compensate; zero = read state and continue only if this same envelope/reservation already applied it (idempotency). `finalize_envelope` additionally verifies every end-state (all consumed postings `Inactive`, created exist, transfer stored).
+- **Durable recovery**: a phase-tracked write-ahead `PendingSaga {envelope, reservation, phase}` is persisted via `SagaStore` before the saga mutates anything (`Reserving`), bumped to `Finalizing` once validation passed and the consumed postings are about to turn `Inactive`. `Ledger::recover()` (call on startup) branches on phase: a `Reserving` saga is **re-run and re-validated** (aborting cleanly if a posting was taken or an account frozen); a `Finalizing` saga is rolled forward through the verified `finalize_envelope`. Roll-forward, not rollback, so there are no orphaned `PendingInactive` postings to reconcile.
 - **Content-addressed transfers**: EnvelopeId = double-SHA-256 of canonical bytes. Provides idempotency and tamper evidence.
 - **Append-only accounts**: versioned, never modified in place. Snapshot pinning (validate-time) prevents TOCTOU races; under the dumb-storage model the overdraft-floor and freeze/close guards are validate-time and best-effort under concurrency.
 - **Store uses `Arc<dyn Store>`**: Ledger is non-generic, enabling concrete saga types.

+ 9 - 9
crates/kuatia/README.md

@@ -27,16 +27,15 @@ let receipt = ledger.commit(transfer).await?;
 
 ### Commit
 
-Every commit is the **envelope saga** (reserve → validate → finalize), driven by
-`legend` with automatic retry and LIFO compensation:
+Every commit is the **envelope saga** — two steps driven by `legend` with
+automatic retry and LIFO compensation:
 
 - `commit(transfer)` — resolves the intent into a concrete envelope (read-only),
   then runs `commit_envelope`.
 - `commit_envelope(envelope)` — the one commit path. Persists a write-ahead
-  `PendingSaga` record, then:
+  `PendingSaga` record (phase `Reserving`), then:
   1. **Reserve** — `reserve_postings`: Active → PendingInactive, stamped with this saga's `ReservationId`
-  2. **Validate** — pure `validate_and_plan()`
-  3. **Finalize** — the dumb, idempotent primitives in sequence: `deactivate_postings` → `insert_postings` → `store_transfer` → `append_event`
+  2. **Finalize** — re-validates against current state (the last-step floor / freeze-close guard), marks the saga `Finalizing`, then runs the dumb primitives `deactivate_postings` → `insert_postings` → `store_transfer` → `append_event`, verifying every end-state
 - `reverse(id)` — builds a reversal envelope and runs the same path.
 
 The store reports an **affected-row count** for each primitive; the saga
@@ -46,10 +45,11 @@ monolithic `commit_transfer` and no separate "atomic" path.
 
 ### Crash recovery
 
-`recover()` — call on startup. It force-completes any `PendingSaga` left by a
-crash, pushing the envelope through the idempotent primitives so a commit
-interrupted at any point (pre-reserve, reserved, or mid-finalize) converges to
-the committed state. Roll-forward, not rollback.
+`recover()` — call on startup. It completes any `PendingSaga` left by a crash,
+branching on the persisted phase: a `Reserving` saga is re-run (re-validating,
+aborting cleanly if a posting was taken or an account frozen); a `Finalizing`
+saga is rolled forward through the verified `finalize_envelope`. Roll-forward,
+not rollback.
 
 ### Account lifecycle
 

+ 257 - 110
crates/kuatia/src/ledger.rs

@@ -23,7 +23,6 @@ pub(crate) fn now_millis() -> Result<i64, LedgerError> {
 }
 use crate::saga::{
     FinalizeInput, FinalizeTransferStep, LedgerCtx, ReserveInput, ReservePostingsStep, SagaError,
-    ValidateInput, ValidateTransferStep,
 };
 use kuatia_storage::error::StoreError;
 use kuatia_storage::events::{LedgerEvent, LedgerEventKind};
@@ -35,20 +34,33 @@ mod envelope_saga {
     legend! {
         EnvelopeSaga<LedgerCtx, SagaError> {
             reserve: ReservePostingsStep,
-            validate: ValidateTransferStep,
             finalize: FinalizeTransferStep,
         }
     }
 }
 use envelope_saga::*;
 
+/// Phase of an in-flight commit, persisted with the write-ahead record so
+/// recovery knows whether validation has completed.
+#[derive(Clone, Copy, PartialEq, Eq, Debug, serde::Serialize, serde::Deserialize)]
+enum SagaPhase {
+    /// Saved before reserve. Validation has not necessarily run, so recovery must
+    /// re-reserve and re-validate before it can commit.
+    Reserving,
+    /// Saved at the start of finalize — after validation passed and just before
+    /// the consumed postings begin turning `Inactive` (the point of no return).
+    /// Recovery rolls forward without re-validating.
+    Finalizing,
+}
+
 /// Write-ahead record for an in-flight commit, persisted via `SagaStore` before
 /// the saga mutates anything and removed once it reaches a terminal state. On
-/// startup [`Ledger::recover`] re-drives any that survive a crash.
+/// startup [`Ledger::recover`] completes any that survive a crash.
 #[derive(serde::Serialize, serde::Deserialize)]
 struct PendingSaga {
     envelope: Envelope,
     reservation: kuatia_core::ReservationId,
+    phase: SagaPhase,
 }
 
 /// Async ledger resource composing the commit pipeline.
@@ -250,7 +262,7 @@ impl Ledger {
     }
 
     // -----------------------------------------------------------------------
-    // Commit: every commit is the envelope saga (reserve -> validate -> finalize)
+    // Commit: every commit is the envelope saga (reserve -> finalize; finalize re-validates)
     // -----------------------------------------------------------------------
 
     /// Commit a [`Transfer`] intent. Resolves it into a concrete envelope, then
@@ -288,25 +300,32 @@ impl Ledger {
             return Ok(record.receipt);
         }
 
-        // Write-ahead: persist {envelope, reservation} so recovery can re-drive.
+        // Write-ahead: persist {envelope, reservation, phase=Reserving} before any
+        // mutation. The finalize step bumps the phase to Finalizing.
         let reservation = kuatia_core::ReservationId::default();
         let saga_id = reservation.0;
-        let blob = serde_json::to_vec(&PendingSaga {
-            envelope: envelope.clone(),
-            reservation,
-        })
-        .map_err(|e| LedgerError::Store(StoreError::Internal(e.to_string())))?;
-        self.store.save_saga(&saga_id, blob).await?;
+        self.save_pending(&envelope, reservation, SagaPhase::Reserving)
+            .await?;
 
         let result = self.drive_envelope_saga(envelope, reservation).await;
 
-        // Terminal: drop the pending record whether we committed or compensated.
-        self.store.delete_saga(&saga_id).await?;
+        // Delete the pending record only when it is safe: on success, or on a
+        // failure that never reached finalize (phase still Reserving → the saga's
+        // compensation released our reservation, nothing of ours was applied). If
+        // finalize started (Finalizing) and failed, keep it so `recover()` rolls
+        // the half-applied commit forward.
+        let safe_to_delete = match &result {
+            Ok(_) => true,
+            Err(_) => self.read_pending_phase(saga_id).await? != Some(SagaPhase::Finalizing),
+        };
+        if safe_to_delete {
+            self.store.delete_saga(&saga_id).await?;
+        }
         result
     }
 
-    /// Build and run the envelope saga to a terminal outcome, returning the
-    /// resulting receipt.
+    /// Build and run the envelope saga (reserve → finalize) to a terminal
+    /// outcome, returning the resulting receipt.
     async fn drive_envelope_saga(
         self: &Arc<Self>,
         envelope: Envelope,
@@ -314,7 +333,6 @@ impl Ledger {
     ) -> Result<Receipt, LedgerError> {
         let saga = EnvelopeSaga::new(EnvelopeSagaInputs {
             reserve: ReserveInput,
-            validate: ValidateInput,
             finalize: FinalizeInput,
         });
         let ctx = LedgerCtx::for_envelope(Arc::clone(self), envelope, reservation);
@@ -350,14 +368,16 @@ impl Ledger {
         }
     }
 
-    /// Re-drive every pending saga to completion. Call on startup to recover
-    /// commits interrupted by a crash, returning how many were processed.
+    /// Complete every pending saga left by a crash. Call on startup; returns how
+    /// many were processed.
     ///
-    /// Recovery does **not** re-run reserve/validate — those reject already-
-    /// consumed postings, and the envelope was already validated when first
-    /// committed. Instead it force-completes the envelope through the idempotent
-    /// primitives with the original reservation, so a crash at any point
-    /// (pre-reserve, reserved, or mid-finalize) converges to the committed state.
+    /// Recovery branches on the persisted phase. A `Reserving` saga had not
+    /// necessarily validated, so it is re-run through the real saga (which
+    /// re-reserves and **re-validates** — aborting cleanly if the postings were
+    /// taken or an account was frozen meanwhile). A `Finalizing` saga had already
+    /// validated and owns its postings, so it is rolled forward through the
+    /// verified `finalize_envelope`. Either way the record is removed only once
+    /// the work is committed or safely abandoned.
     #[instrument(skip(self), name = "ledger.recover")]
     pub async fn recover(self: &Arc<Self>) -> Result<usize, LedgerError> {
         let pending = self.store.list_pending_sagas().await?;
@@ -366,35 +386,101 @@ impl Ledger {
             let PendingSaga {
                 envelope,
                 reservation,
+                phase,
             } = serde_json::from_slice(&blob)
                 .map_err(|e| LedgerError::Store(StoreError::Internal(e.to_string())))?;
-            self.complete_envelope(&envelope, reservation).await?;
-            self.store.delete_saga(&saga_id).await?;
+
+            // Already committed (crashed after store_transfer) → nothing to do.
+            if self.store.get_transfer(&envelope_id(&envelope)).await?.is_some() {
+                self.store.delete_saga(&saga_id).await?;
+                continue;
+            }
+
+            match phase {
+                SagaPhase::Finalizing => {
+                    // Validation passed and the postings are ours; roll forward.
+                    // Keep the record if completion fails so a later run retries.
+                    if self.finalize_envelope(&envelope, reservation).await.is_ok() {
+                        self.store.delete_saga(&saga_id).await?;
+                    }
+                }
+                SagaPhase::Reserving => {
+                    // Re-run the validating saga. On failure, delete only if it did
+                    // not reach finalize (clean abort); otherwise keep for next run.
+                    let result = self.drive_envelope_saga(envelope, reservation).await;
+                    let safe_to_delete = result.is_ok()
+                        || self.read_pending_phase(saga_id).await?
+                            != Some(SagaPhase::Finalizing);
+                    if safe_to_delete {
+                        self.store.delete_saga(&saga_id).await?;
+                    }
+                }
+            }
         }
         Ok(count)
     }
 
-    /// Idempotently push `envelope`'s postings and record to their committed
-    /// state. Safe to call from any partial point: each primitive no-ops what is
-    /// already done. Used by [`recover`].
-    async fn complete_envelope(
+    /// Idempotently finalize `envelope` to its committed state, **verifying every
+    /// step's end-state**. Used by the saga's finalize step and by recovery.
+    ///
+    /// When the consumed postings are still pre-deactivation it re-validates
+    /// against current state (the last-step floor / freeze-close guard) and then
+    /// marks the saga `Finalizing` (the point of no return). Once any consumed
+    /// posting is already `Inactive` — a prior attempt or recovery passed that
+    /// point — it rolls forward without re-validating (validation rejects
+    /// `Inactive`). It never creates or stores anything unless **all** consumed
+    /// postings are confirmed `Inactive`, which is the double-spend guard.
+    pub(crate) async fn finalize_envelope(
         &self,
         envelope: &Envelope,
         reservation: kuatia_core::ReservationId,
-    ) -> Result<(), LedgerError> {
+    ) -> Result<Receipt, LedgerError> {
         let tid = envelope_id(envelope);
-        if self.store.get_transfer(&tid).await?.is_some() {
-            return Ok(()); // already committed
+        if let Some(record) = self.store.get_transfer(&tid).await? {
+            return Ok(record.receipt); // already committed
         }
-
         let consumes = envelope.consumes();
-        // Reserve then deactivate: this drives Active → PendingInactive → Inactive,
-        // and each call no-ops anything already past that state.
-        self.store.reserve_postings(consumes, reservation).await?;
+
+        // Read consumed postings (also captures their owners for indexing).
+        let consumed = if consumes.is_empty() {
+            Vec::new()
+        } else {
+            self.store.get_postings(consumes).await?
+        };
+        let past_no_return = consumed
+            .iter()
+            .any(|p| p.status == PostingStatus::Inactive);
+
+        // Last-step boundary re-check: re-validate floor + freeze/close + snapshots
+        // against current state, but only while it is still safe (validation
+        // rejects already-`Inactive` consumed postings).
+        if !past_no_return {
+            let loaded = self.load(envelope).await?;
+            self.plan(envelope, &loaded)?;
+        }
+
+        // Point of no return: record Finalizing before any posting turns Inactive.
+        self.save_pending(envelope, reservation, SagaPhase::Finalizing)
+            .await?;
+
+        // Deactivate consumed postings (PendingInactive owned by us → Inactive),
+        // then assert ALL consumed postings are Inactive. This is the double-spend
+        // guard: do not create/store unless the inputs were really consumed by us.
         self.store
             .deactivate_postings(consumes, Some(reservation))
             .await?;
+        if !consumes.is_empty() {
+            let after = self.store.get_postings(consumes).await?;
+            if after.len() != consumes.len()
+                || after.iter().any(|p| p.status != PostingStatus::Inactive)
+            {
+                return Err(LedgerError::Store(StoreError::Internal(
+                    "finalize: consumed postings not all inactive (contended or not reserved by this saga)".into(),
+                )));
+            }
+        }
 
+        // Created postings, derived deterministically from the envelope.
         let created: Vec<Posting> = envelope
             .creates()
             .iter()
@@ -412,25 +498,38 @@ impl Ledger {
             })
             .collect();
         self.store.insert_postings(&created).await?;
+        if !created.is_empty() {
+            let ids: Vec<PostingId> = created.iter().map(|p| p.id).collect();
+            if self.store.get_postings(&ids).await?.len() != created.len() {
+                return Err(LedgerError::Store(StoreError::Internal(
+                    "finalize: created postings missing after insert".into(),
+                )));
+            }
+        }
 
+        // Index both created and consumed owners.
         let mut involved: Vec<AccountId> = created.iter().map(|p| p.owner).collect();
-        if !consumes.is_empty() {
-            let consumed = self.store.get_postings(consumes).await?;
-            involved.extend(consumed.iter().map(|p| p.owner));
-        }
+        involved.extend(consumed.iter().map(|p| p.owner));
         involved.sort();
         involved.dedup();
 
+        let receipt = Receipt { transfer_id: tid };
         self.store
             .store_transfer(
                 EnvelopeRecord {
                     envelope: envelope.clone(),
-                    receipt: Receipt { transfer_id: tid },
+                    receipt: receipt.clone(),
                     created_at: now_millis()?,
                 },
                 &involved,
             )
             .await?;
+        if self.store.get_transfer(&tid).await?.is_none() {
+            return Err(LedgerError::Store(StoreError::Internal(
+                "finalize: transfer record missing after store".into(),
+            )));
+        }
+
         self.store
             .append_event(&LedgerEvent {
                 seq: 0,
@@ -438,9 +537,38 @@ impl Ledger {
                 kind: LedgerEventKind::TransferCommitted { transfer_id: tid },
             })
             .await?;
+        Ok(receipt)
+    }
+
+    /// Persist the write-ahead pending-saga record (upsert on the reservation id).
+    async fn save_pending(
+        &self,
+        envelope: &Envelope,
+        reservation: kuatia_core::ReservationId,
+        phase: SagaPhase,
+    ) -> Result<(), LedgerError> {
+        let blob = serde_json::to_vec(&PendingSaga {
+            envelope: envelope.clone(),
+            reservation,
+            phase,
+        })
+        .map_err(|e| LedgerError::Store(StoreError::Internal(e.to_string())))?;
+        self.store.save_saga(&reservation.0, blob).await?;
         Ok(())
     }
 
+    /// Read the persisted phase of a pending saga, if it still exists.
+    async fn read_pending_phase(&self, saga_id: i64) -> Result<Option<SagaPhase>, LedgerError> {
+        for (id, blob) in self.store.list_pending_sagas().await? {
+            if id == saga_id {
+                let pending: PendingSaga = serde_json::from_slice(&blob)
+                    .map_err(|e| LedgerError::Store(StoreError::Internal(e.to_string())))?;
+                return Ok(Some(pending.phase));
+            }
+        }
+        Ok(None)
+    }
+
     // -----------------------------------------------------------------------
     // Reverse
     // -----------------------------------------------------------------------
@@ -740,7 +868,7 @@ pub struct LoadedState {
 #[cfg(test)]
 mod recovery_tests {
     use super::*;
-    use kuatia_core::{Account, AccountFlags, TransferBuilder, UserData};
+    use kuatia_core::{Account, AccountFlags, ReservationId, TransferBuilder, UserData};
     use kuatia_storage::mem_store::InMemoryStore;
     use std::collections::BTreeMap;
 
@@ -756,106 +884,125 @@ mod recovery_tests {
         }
     }
 
-    /// A commit interrupted right after its write-ahead record (before any step)
-    /// is completed by `recover()`: the postings move and the record is cleared.
-    #[tokio::test]
-    async fn recover_redrives_pending_saga() {
+    async fn funded_ledger() -> Arc<Ledger> {
         let ledger = Arc::new(Ledger::new(InMemoryStore::new()));
         for (id, p) in [
             (1, AccountPolicy::NoOverdraft),
             (2, AccountPolicy::NoOverdraft),
+            (3, AccountPolicy::NoOverdraft),
             (99, AccountPolicy::ExternalAccount),
         ] {
             ledger.store().create_account(acct(id, p)).await.unwrap();
         }
-        // Fund account 1.
         let deposit = TransferBuilder::new()
             .deposit(AccountId::new(1), AssetId::new(1), Cent::from(100), AccountId::new(99))
             .unwrap()
             .build();
         ledger.commit(deposit).await.unwrap();
+        ledger
+    }
 
-        // Resolve a pay envelope but persist it as a pending saga WITHOUT running
-        // it — simulating a crash right after the write-ahead record.
-        let pay = TransferBuilder::new()
+    fn pay_transfer() -> Transfer {
+        TransferBuilder::new()
             .pay(AccountId::new(1), AccountId::new(2), AssetId::new(1), Cent::from(40))
-            .build();
-        let envelope = ledger.resolve(&pay).await.unwrap();
-        let reservation = kuatia_core::ReservationId::default();
+            .build()
+    }
+
+    async fn save_pending(ledger: &Arc<Ledger>, envelope: &Envelope, rid: ReservationId, phase: SagaPhase) {
         let blob = serde_json::to_vec(&PendingSaga {
-            envelope,
-            reservation,
+            envelope: envelope.clone(),
+            reservation: rid,
+            phase,
         })
         .unwrap();
-        ledger.store().save_saga(&reservation.0, blob).await.unwrap();
+        ledger.store().save_saga(&rid.0, blob).await.unwrap();
+    }
+
+    /// A commit interrupted right after its write-ahead record (phase Reserving,
+    /// before any step) is re-run and completed by `recover()`.
+    #[tokio::test]
+    async fn recover_redrives_reserving_saga() {
+        let ledger = funded_ledger().await;
+        let envelope = ledger.resolve(&pay_transfer()).await.unwrap();
+        let rid = ReservationId::default();
+        save_pending(&ledger, &envelope, rid, SagaPhase::Reserving).await;
 
-        // Recover re-drives it to completion.
         assert_eq!(ledger.recover().await.unwrap(), 1);
-        assert_eq!(
-            ledger.balance(&AccountId::new(2), &AssetId::new(1)).await.unwrap(),
-            Cent::from(40)
-        );
-        assert_eq!(
-            ledger.balance(&AccountId::new(1), &AssetId::new(1)).await.unwrap(),
-            Cent::from(60)
-        );
+        assert_eq!(ledger.balance(&AccountId::new(2), &AssetId::new(1)).await.unwrap(), Cent::from(40));
+        assert_eq!(ledger.balance(&AccountId::new(1), &AssetId::new(1)).await.unwrap(), Cent::from(60));
         assert!(ledger.store().list_pending_sagas().await.unwrap().is_empty());
     }
 
-    /// A commit that crashed **mid-finalize** — consumed posting already flipped
-    /// to Inactive but the transfer record not yet written — is still completed by
-    /// `recover()` (reserve/validate are skipped; the primitives no-op the done work).
+    /// A commit that crashed mid-finalize (phase Finalizing; the consumed posting
+    /// is already Inactive) is rolled forward by `recover()`.
     #[tokio::test]
     async fn recover_completes_partial_finalize() {
-        let ledger = Arc::new(Ledger::new(InMemoryStore::new()));
-        for (id, p) in [
-            (1, AccountPolicy::NoOverdraft),
-            (2, AccountPolicy::NoOverdraft),
-            (99, AccountPolicy::ExternalAccount),
-        ] {
-            ledger.store().create_account(acct(id, p)).await.unwrap();
-        }
-        let deposit = TransferBuilder::new()
-            .deposit(AccountId::new(1), AssetId::new(1), Cent::from(100), AccountId::new(99))
-            .unwrap()
-            .build();
-        ledger.commit(deposit).await.unwrap();
-
-        // Resolve a pay envelope and manually run the commit halfway: reserve the
-        // consumed posting and deactivate it (now Inactive), then "crash" — the
-        // transfer record and created postings were never written.
-        let pay = TransferBuilder::new()
-            .pay(AccountId::new(1), AccountId::new(2), AssetId::new(1), Cent::from(40))
-            .build();
-        let envelope = ledger.resolve(&pay).await.unwrap();
-        let reservation = kuatia_core::ReservationId::default();
+        let ledger = funded_ledger().await;
+        let envelope = ledger.resolve(&pay_transfer()).await.unwrap();
+        let rid = ReservationId::default();
+        // Run the commit halfway: reserve + deactivate the consumed posting.
         let consumes = envelope.consumes().to_vec();
-        ledger.store().reserve_postings(&consumes, reservation).await.unwrap();
-        let n = ledger
+        ledger.store().reserve_postings(&consumes, rid).await.unwrap();
+        assert_eq!(ledger.store().deactivate_postings(&consumes, Some(rid)).await.unwrap(), 1);
+        save_pending(&ledger, &envelope, rid, SagaPhase::Finalizing).await;
+
+        assert_eq!(ledger.recover().await.unwrap(), 1);
+        assert_eq!(ledger.balance(&AccountId::new(2), &AssetId::new(1)).await.unwrap(), Cent::from(40));
+        assert_eq!(ledger.balance(&AccountId::new(1), &AssetId::new(1)).await.unwrap(), Cent::from(60));
+        assert!(ledger.store().list_pending_sagas().await.unwrap().is_empty());
+    }
+
+    /// Recovery of a `Reserving` saga re-validates against current state: if an
+    /// account was frozen after the write-ahead record, the commit is abandoned —
+    /// no postings move, the reservation is released, and the record is cleared.
+    #[tokio::test]
+    async fn recover_revalidates_and_aborts_when_account_frozen() {
+        let ledger = funded_ledger().await;
+        let envelope = ledger.resolve(&pay_transfer()).await.unwrap();
+        let tid = envelope_id(&envelope);
+        let rid = ReservationId::default();
+        save_pending(&ledger, &envelope, rid, SagaPhase::Reserving).await;
+
+        // A freeze lands before recovery runs.
+        ledger.freeze(&AccountId::new(1)).await.unwrap();
+
+        assert_eq!(ledger.recover().await.unwrap(), 1);
+        // Nothing committed; balances unchanged; reservation released.
+        assert!(ledger.store().get_transfer(&tid).await.unwrap().is_none());
+        assert_eq!(ledger.balance(&AccountId::new(1), &AssetId::new(1)).await.unwrap(), Cent::from(100));
+        assert_eq!(ledger.balance(&AccountId::new(2), &AssetId::new(1)).await.unwrap(), Cent::ZERO);
+        let active = ledger
             .store()
-            .deactivate_postings(&consumes, Some(reservation))
+            .get_postings_by_account(&AccountId::new(1), Some(&AssetId::new(1)), Some(PostingStatus::Active))
             .await
             .unwrap();
-        assert_eq!(n, 1); // consumed posting is now Inactive
+        assert_eq!(active.len(), 1); // back to Active
+        assert!(ledger.store().list_pending_sagas().await.unwrap().is_empty());
+    }
 
-        let blob = serde_json::to_vec(&PendingSaga {
-            envelope,
-            reservation,
-        })
-        .unwrap();
-        ledger.store().save_saga(&reservation.0, blob).await.unwrap();
+    /// Recovery cannot double-spend: if the consumed posting was taken by another
+    /// transfer while the saga was pending, recovery aborts without creating or
+    /// storing anything.
+    #[tokio::test]
+    async fn recover_does_not_double_spend_a_taken_posting() {
+        let ledger = funded_ledger().await;
+        let envelope = ledger.resolve(&pay_transfer()).await.unwrap();
+        let tid = envelope_id(&envelope);
+        let rid = ReservationId::default();
+        save_pending(&ledger, &envelope, rid, SagaPhase::Reserving).await;
+
+        // Another transfer consumes account 1's posting and commits.
+        let steal = TransferBuilder::new()
+            .pay(AccountId::new(1), AccountId::new(3), AssetId::new(1), Cent::from(50))
+            .build();
+        ledger.commit(steal).await.unwrap();
 
-        // Recovery finishes the commit despite reserve/validate being unable to
-        // re-run over the already-consumed posting.
         assert_eq!(ledger.recover().await.unwrap(), 1);
-        assert_eq!(
-            ledger.balance(&AccountId::new(2), &AssetId::new(1)).await.unwrap(),
-            Cent::from(40)
-        );
-        assert_eq!(
-            ledger.balance(&AccountId::new(1), &AssetId::new(1)).await.unwrap(),
-            Cent::from(60)
-        );
+        // Our envelope never committed; only the stealing transfer applied.
+        assert!(ledger.store().get_transfer(&tid).await.unwrap().is_none());
+        assert_eq!(
+            ledger.balance(&AccountId::new(1), &AssetId::new(1)).await.unwrap(),
+            Cent::from(50)
+        );
+        assert_eq!(
+            ledger.balance(&AccountId::new(3), &AssetId::new(1)).await.unwrap(),
+            Cent::from(50)
+        );
+        assert_eq!(
+            ledger.balance(&AccountId::new(2), &AssetId::new(1)).await.unwrap(),
+            Cent::ZERO
+        );
         assert!(ledger.store().list_pending_sagas().await.unwrap().is_empty());
     }
 }
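
The double-spend test above relies on `reserve_postings` being a conditional update: a posting that is no longer `Active` cannot be reserved again, so a re-run `Reserving` saga sees a short count and aborts. A minimal in-memory sketch of that idea follows. The `Status` enum, the `reserve` and `can_proceed` functions, and the `HashMap` store are hypothetical stand-ins for illustration, not the crate's actual types.

```rust
use std::collections::HashMap;

/// Simplified posting lifecycle (hypothetical model, not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Active,
    /// Reserved by the saga that holds this reservation id.
    PendingInactive(u64),
    Inactive,
}

/// Conditional reserve: flips a posting to PendingInactive only if it is still
/// Active, and returns how many postings were actually flipped.
fn reserve(store: &mut HashMap<u32, Status>, ids: &[u32], rid: u64) -> usize {
    let mut count = 0;
    for id in ids {
        if let Some(status) = store.get_mut(id) {
            if *status == Status::Active {
                *status = Status::PendingInactive(rid);
                count += 1;
            }
        }
    }
    count
}

/// The saga may continue only if the reserve was complete, or if every posting
/// already carries this saga's own reservation id (an idempotent re-run).
fn can_proceed(store: &HashMap<u32, Status>, ids: &[u32], rid: u64, count: usize) -> bool {
    count == ids.len()
        || ids
            .iter()
            .all(|id| store.get(id) == Some(&Status::PendingInactive(rid)))
}
```

In this model a posting already turned `Inactive` by another transfer yields a count of zero and fails the ownership check, which corresponds to the abort path the test asserts.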

+ 29 - 197
crates/kuatia/src/saga.rs

@@ -6,19 +6,17 @@
 //!
 //! # Envelope pipeline saga
 //!
-//! A commit is three saga steps over a pre-resolved [`Envelope`] (resolution
-//! runs before the saga, in `Ledger::commit`):
+//! A commit is two saga steps over a pre-resolved [`Envelope`] (resolution runs
+//! before the saga, in `Ledger::commit`):
 //!
-//! 1. **ReservePostingsStep** -- `reserve_postings`: Active → PendingInactive, stamped with the saga's `ReservationId`
-//! 2. **ValidateTransferStep** -- load accounts/balances, run `validate_and_plan()`
-//! 3. **FinalizeTransferStep** -- the dumb primitives in sequence: `deactivate_postings` → `insert_postings` → `store_transfer` → `append_event`
+//! 1. **ReservePostingsStep** -- `reserve_postings`: Active → PendingInactive, stamped with the saga's `ReservationId`; interprets the count via [`verify_postings`].
+//! 2. **FinalizeTransferStep** -- delegates to `Ledger::finalize_envelope`, which re-validates against current state (the last-step floor / freeze-close guard), marks the saga `Finalizing`, then runs the dumb primitives (`deactivate_postings` → `insert_postings` → `store_transfer` → `append_event`) verifying every end-state.
 //!
-//! Each step issues dumb storage instructions and **interprets the affected-row
-//! count** itself (full = continue; partial = error → compensate; zero = read
-//! state and continue only if this same envelope/reservation already applied it).
-//! See [`verify_postings`]. The `EnvelopeSaga` is defined via `legend!` in
-//! `ledger.rs` and driven by `commit_envelope()`; crash recovery re-completes a
-//! persisted saga via `Ledger::recover`.
+//! The `EnvelopeSaga` is defined via `legend!` in `ledger.rs` and driven by
+//! `commit_envelope()`. Crash recovery (`Ledger::recover`) re-completes a
+//! persisted saga using its persisted phase: a `Reserving` saga is re-run
+//! (re-validating); a `Finalizing` saga is rolled forward through the same
+//! verified `finalize_envelope`.
 //!
 //! # High-level composition
 //!
@@ -33,14 +31,13 @@ use serde::{Deserialize, Serialize};
 use tracing::Instrument;
 
 use kuatia_core::{
-    AccountId, AssetId, Cent, Envelope, Plan, PlanInput, Posting, PostingId, PostingStatus, Receipt,
-    ReservationId, Transfer, TransferBuilder, validate_and_plan,
+    AccountId, AssetId, Cent, Envelope, Posting, PostingId, PostingStatus, Receipt, ReservationId,
+    TransferBuilder,
 };
 
 use crate::error::LedgerError;
-use crate::ledger::{Ledger, now_millis};
-use kuatia_storage::events::{LedgerEvent, LedgerEventKind};
-use kuatia_storage::store::{EnvelopeRecord, Store};
+use crate::ledger::Ledger;
+use kuatia_storage::store::Store;
 
 /// Interpret a dumb primitive's affected-row `count` against the `ids` it
 /// targeted. `count == ids.len()` is success. A short count is acceptable only if
@@ -115,8 +112,6 @@ pub struct LedgerCtx {
     pub receipts: Vec<Receipt>,
     /// Posting ids reserved so far (for compensation).
     pub reserved_postings: Vec<PostingId>,
-    /// Validated plan produced by the validate step.
-    pub plan: Option<Plan>,
     /// Resolved envelope produced by the resolve step.
     pub envelope: Option<Envelope>,
     /// Reservation owner token for this saga's reserved postings. Serialized so
@@ -131,7 +126,6 @@ impl std::fmt::Debug for LedgerCtx {
         f.debug_struct("LedgerCtx")
             .field("receipts", &self.receipts)
             .field("reserved_postings", &self.reserved_postings.len())
-            .field("has_plan", &self.plan.is_some())
             .field("has_envelope", &self.envelope.is_some())
             .field("ledger_present", &self.ledger.is_some())
             .finish()
@@ -144,14 +138,13 @@ impl LedgerCtx {
         Self {
             receipts: Vec::new(),
             reserved_postings: Vec::new(),
-            plan: None,
             envelope: None,
             reservation: ReservationId::default(),
             ledger: Some(ledger),
         }
     }
 
-    /// Create a context for the envelope pipeline (reserve → validate → finalize)
+    /// Create a context for the envelope pipeline (reserve → finalize; finalize re-validates)
     /// with a pre-resolved envelope and an explicit reservation.
     pub fn for_envelope(
         ledger: Arc<Ledger>,
@@ -161,7 +154,6 @@ impl LedgerCtx {
         Self {
             receipts: Vec::new(),
             reserved_postings: Vec::new(),
-            plan: None,
             envelope: Some(envelope),
             reservation,
             ledger: Some(ledger),
@@ -189,54 +181,11 @@ impl LedgerCtx {
 }
 
 // ===========================================================================
-// Transfer pipeline steps (resolve -> reserve -> validate -> finalize)
+// Envelope pipeline steps (reserve -> finalize; resolve runs before the saga, validate inside finalize)
 // ===========================================================================
 
 // ---------------------------------------------------------------------------
-// Step 1: ResolveStep
-// ---------------------------------------------------------------------------
-
-/// Input for the resolve step: the transfer intent to resolve.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct ResolveInput {
-    /// The transfer intent to resolve into a concrete envelope.
-    pub transfer: Transfer,
-}
-
-/// Resolves a [`Transfer`] intent into a concrete [`Envelope`] by selecting
-/// postings for each movement.
-///
-/// Compensation is a no-op (no side effects).
-pub struct ResolveStep;
-
-#[async_trait]
-impl Step<LedgerCtx, SagaError> for ResolveStep {
-    type Input = ResolveInput;
-
-    async fn execute(ctx: &mut LedgerCtx, input: &ResolveInput) -> Result<StepOutcome, SagaError> {
-        async {
-            let ledger = ctx.ledger()?;
-            let envelope = ledger
-                .resolve(&input.transfer)
-                .await
-                .map_err(SagaError::from)?;
-            ctx.envelope = Some(envelope);
-            Ok(StepOutcome::Continue)
-        }
-        .instrument(tracing::info_span!("saga_step", step = "resolve"))
-        .await
-    }
-
-    async fn compensate(
-        _ctx: &mut LedgerCtx,
-        _input: &ResolveInput,
-    ) -> Result<CompensationOutcome, SagaError> {
-        Ok(CompensationOutcome::Completed)
-    }
-}
-
-// ---------------------------------------------------------------------------
-// Step 2: ReservePostingsStep
+// Step 1: ReservePostingsStep
 // ---------------------------------------------------------------------------
 
 /// Input for the reserve step (posting ids come from ctx.envelope).
@@ -307,72 +256,17 @@ impl Step<LedgerCtx, SagaError> for ReservePostingsStep {
 }
 
 // ---------------------------------------------------------------------------
-// Step 3: ValidateTransferStep
+// Step 2: FinalizeTransferStep
 // ---------------------------------------------------------------------------
 
-/// Input for the validate step (envelope comes from ctx).
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct ValidateInput;
-
-/// Loads accounts and balances, then runs `validate_and_plan()`.
-///
-/// Stores the resulting [`Plan`] in the context for the finalize step.
-/// Compensation is a no-op (reads only).
-pub struct ValidateTransferStep;
-
-#[async_trait]
-impl Step<LedgerCtx, SagaError> for ValidateTransferStep {
-    type Input = ValidateInput;
-
-    async fn execute(
-        ctx: &mut LedgerCtx,
-        _input: &ValidateInput,
-    ) -> Result<StepOutcome, SagaError> {
-        async {
-            let envelope = ctx.envelope.as_ref().ok_or(SagaError {
-                message: "no envelope in context -- resolve step must run first".into(),
-            })?;
-
-            let ledger = ctx.ledger()?;
-            let loaded = ledger.load(envelope).await.map_err(SagaError::from)?;
-
-            let plan_input = PlanInput {
-                envelope,
-                consumed_postings: &loaded.consumed_postings,
-                accounts: &loaded.accounts,
-                balances: &loaded.balances,
-                book: loaded.book.as_ref(),
-            };
-
-            let plan =
-                validate_and_plan(plan_input).map_err(|e| SagaError::from(LedgerError::from(e)))?;
-            ctx.plan = Some(plan);
-            Ok(StepOutcome::Continue)
-        }
-        .instrument(tracing::info_span!("saga_step", step = "validate"))
-        .await
-    }
-
-    async fn compensate(
-        _ctx: &mut LedgerCtx,
-        _input: &ValidateInput,
-    ) -> Result<CompensationOutcome, SagaError> {
-        Ok(CompensationOutcome::Completed)
-    }
-}
-
-// ---------------------------------------------------------------------------
-// Step 4: FinalizeTransferStep
-// ---------------------------------------------------------------------------
-
-/// Input for the finalize step (envelope and plan come from ctx).
+/// Input for the finalize step (envelope comes from ctx).
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct FinalizeInput;
 
-/// Finalizes the envelope: PendingInactive to Inactive, creates new postings,
-/// stores the envelope record.
+/// Re-validates against current state (the last-step floor / freeze-close guard),
+/// then drives the verified, idempotent commit via [`Ledger::finalize_envelope`].
 ///
-/// Compensation reverses the finalized envelope.
+/// Compensation reverses the finalized envelope (only relevant once committed).
 pub struct FinalizeTransferStep;
 
 #[async_trait]
@@ -384,81 +278,19 @@ impl Step<LedgerCtx, SagaError> for FinalizeTransferStep {
         _input: &FinalizeInput,
     ) -> Result<StepOutcome, SagaError> {
         async {
-            let plan = ctx.plan.take().ok_or(SagaError {
-                message: "no plan in context -- validate step must run first".into(),
+            let envelope = ctx.envelope.clone().ok_or(SagaError {
+                message: "no envelope in context -- resolution must run before the saga".into(),
             })?;
             let rid = ctx.reservation;
             let ledger = ctx.ledger_arc()?;
-            let store = ledger.store();
-            let receipt = Receipt {
-                transfer_id: plan.transfer_id,
-            };
 
-            // Commit is a sequence of dumb, idempotent primitives. Each is its own
-            // atomic update; the saga sequences them and a crash mid-sequence is
-            // completed by idempotent roll-forward in recovery.
-
-            // 1. Deactivate the reserved consumed postings (saga path).
-            let deactivated = store
-                .deactivate_postings(&plan.postings_to_deactivate, Some(rid))
-                .await
-                .map_err(|e| SagaError::from(LedgerError::Store(e)))?;
-            verify_postings(
-                store,
-                &plan.postings_to_deactivate,
-                deactivated,
-                |p| p.status == PostingStatus::Inactive,
-                "deactivate",
-            )
-            .await?;
-
-            // Involved accounts to index: created owners + consumed owners. The
-            // saga supplies the set so storage computes nothing.
-            let mut involved: Vec<AccountId> =
-                plan.postings_to_create.iter().map(|p| p.owner).collect();
-            if !plan.postings_to_deactivate.is_empty() {
-                let consumed = store
-                    .get_postings(&plan.postings_to_deactivate)
-                    .await
-                    .map_err(|e| SagaError::from(LedgerError::Store(e)))?;
-                involved.extend(consumed.iter().map(|p| p.owner));
-            }
-            involved.sort();
-            involved.dedup();
-
-            // 2. Insert created postings (idempotent).
-            store
-                .insert_postings(&plan.postings_to_create)
-                .await
-                .map_err(|e| SagaError::from(LedgerError::Store(e)))?;
-
-            // 3. Persist the transfer record + account index (idempotent).
-            let envelope = ctx.envelope.as_ref().ok_or(SagaError {
-                message: "no envelope in context -- resolve step must run first".into(),
-            })?;
-            store
-                .store_transfer(
-                    EnvelopeRecord {
-                        envelope: envelope.clone(),
-                        receipt: receipt.clone(),
-                        created_at: now_millis().map_err(SagaError::from)?,
-                    },
-                    &involved,
-                )
+            // All commit work (re-validate, mark Finalizing, deactivate/insert/
+            // store/event with end-state verification) lives in `finalize_envelope`
+            // so recovery uses exactly the same path.
+            let receipt = ledger
+                .finalize_envelope(&envelope, rid)
                 .await
-                .map_err(|e| SagaError::from(LedgerError::Store(e)))?;
-
-            // 4. Append the committed event (idempotent on the transfer id).
-            store
-                .append_event(&LedgerEvent {
-                    seq: 0,
-                    timestamp: now_millis().map_err(SagaError::from)?,
-                    kind: LedgerEventKind::TransferCommitted {
-                        transfer_id: receipt.transfer_id,
-                    },
-                })
-                .await
-                .map_err(|e| SagaError::from(LedgerError::Store(e)))?;
+                .map_err(SagaError::from)?;
 
             ctx.receipts.push(receipt);
             ctx.reserved_postings.clear();
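
The finalize step now delegates to one shared, retry-safe commit path. The sketch below illustrates the two properties the commit message claims for it: nothing is created or stored unless all consumed postings are confirmed `Inactive`, and running it twice leaves the same state. `MemStore`, `Status`, and this `finalize` function are hypothetical simplifications; the real `Ledger::finalize_envelope` also re-validates and persists the phase, which is omitted here. The sketch assumes the caller already owns the reservation.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Active,
    PendingInactive(u64),
    Inactive,
}

/// Hypothetical in-memory stand-in for the store.
#[derive(Debug, Default, Clone, PartialEq)]
struct MemStore {
    postings: BTreeMap<u32, Status>,
    transfers: BTreeSet<u32>,
}

/// Deactivate this saga's reserved postings, confirm the end state, and only
/// then create the output postings and record the transfer.
fn finalize(
    store: &mut MemStore,
    transfer_id: u32,
    consumed: &[u32],
    created: &[u32],
    rid: u64,
) -> Result<(), String> {
    // Deactivate only postings reserved under this reservation id.
    for id in consumed {
        if store.postings.get(id) == Some(&Status::PendingInactive(rid)) {
            store.postings.insert(*id, Status::Inactive);
        }
    }
    // Guard: nothing is created unless every consumed posting is Inactive.
    let all_inactive = consumed
        .iter()
        .all(|id| store.postings.get(id) == Some(&Status::Inactive));
    if !all_inactive {
        return Err(format!("transfer {}: consumed postings not all Inactive", transfer_id));
    }
    // Insert-if-absent keeps a retry from overwriting existing postings.
    for id in created {
        store.postings.entry(*id).or_insert(Status::Active);
    }
    store.transfers.insert(transfer_id);
    Ok(())
}
```

Because every write is either conditional or insert-if-absent, a second call after a partial run converges on the same end state instead of applying the transfer twice.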

+ 10 - 2
doc/accounts.md

@@ -30,7 +30,15 @@ Each account has a policy that controls what balance constraints apply:
 
 An overdraft is represented as a **negative posting** (an offset position) assigned to the account to cover a shortfall. When an account's positive postings are insufficient for a debit, the resolve step consumes them all and creates a negative posting for the remainder. `NoOverdraft` accounts forbid this; validation rejects any transfer that would create a negative posting on a `NoOverdraft` account. `CappedOverdraft`'s floor bounds how negative the balance may go; `UncappedOverdraft`, `SystemAccount`, and `ExternalAccount` are unbounded.
 
-`CappedOverdraft`'s floor is checked during validation. Under the dumb-storage model there is no atomic re-check at commit, so the floor is **best-effort under concurrency** — two concurrent transfers could each pass validation independently but together push the balance below the floor (write-skew). Double-spend safety is unaffected: the reservation protocol (an atomic conditional `reserve_postings`) still guarantees a posting cannot be consumed twice. See [accounting-mapping.md](accounting-mapping.md) and the ADR at [adr/0001-dumb-storage-saga-recovery.md](adr/0001-dumb-storage-saga-recovery.md).
+`CappedOverdraft`'s floor is re-validated as the **last step before the finalize
+writes** (the finalize step re-loads balances and account versions and re-runs
+validation just before deactivating). This is the *tightest* best-effort —
+the check-to-write window is one step, not the whole saga — but it is **not
+strictly atomic**: a concurrent commit in that last gap can still breach the
+floor (write-skew). Double-spend safety is unaffected: the reservation protocol
+(an atomic conditional `reserve_postings`) guarantees a posting cannot be
+consumed twice. See [accounting-mapping.md](accounting-mapping.md) and the ADR at
+[adr/0001-dumb-storage-saga-recovery.md](adr/0001-dumb-storage-saga-recovery.md).
 
 ## Lifecycle
 
@@ -101,4 +109,4 @@ Boundary accounts representing the outside world (banks, payment processors). Th
 
 ### Credit accounts (`CappedOverdraft`)
 
-Accounts with a negative floor (e.g. credit lines). The floor is the maximum allowed overdraft. When the account's positive postings are insufficient for a debit, a negative posting is created to cover the shortfall, down to the floor. The floor is enforced at validation time and is best-effort under concurrency (see above).
+Accounts with a negative floor (e.g. credit lines). The floor is the maximum allowed overdraft. When the account's positive postings are insufficient for a debit, a negative posting is created to cover the shortfall, down to the floor. The floor is re-validated inside the finalize step, immediately before its writes, and is best-effort under concurrency (see above).
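
To make the write-skew on the floor concrete, here is a small arithmetic sketch. The function names and the cent amounts are illustrative assumptions, not part of the crate: two debits are each checked against the same balance snapshot, both pass, and together they end below the floor.

```rust
/// Returns true when applying `debit` to `balance` stays at or above `floor`.
/// All amounts are in cents; `floor` is negative for a credit line.
fn passes_floor(balance: i64, debit: i64, floor: i64) -> bool {
    balance - debit >= floor
}

/// Applies two debits that were each validated against the same stale snapshot
/// and returns (both_passed_validation, final_balance).
fn write_skew(snapshot: i64, floor: i64, debit_a: i64, debit_b: i64) -> (bool, i64) {
    let both_passed =
        passes_floor(snapshot, debit_a, floor) && passes_floor(snapshot, debit_b, floor);
    (both_passed, snapshot - debit_a - debit_b)
}
```

With a snapshot of -50, a floor of -100, and two debits of 40, each check sees -90 and passes, yet the combined balance is -130. Re-validating just before the writes shrinks the window in which this can happen but does not close it.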

+ 22 - 14
doc/adr/0001-dumb-storage-saga-recovery.md

@@ -50,15 +50,20 @@ Invert the design.
   → finalize). `commit_envelope(envelope)` serves pre-built/FX envelopes;
   `reverse()` uses it. `commit_atomic` is gone.
 
-- **Durable recovery via write-ahead + roll-forward.** `commit_envelope`
-  persists a `PendingSaga {envelope, reservation}` via `SagaStore` before
-  mutating anything, and deletes it on terminal. `Ledger::recover()` (startup)
-  force-completes any surviving pending saga through the idempotent primitives,
-  using the original reservation. It does **not** re-run reserve/validate (those
-  reject already-consumed postings); it converges from a crash at any point
-  (pre-reserve / reserved / mid-finalize). Because recovery is roll-forward, the
-  reservation protocol never leaves orphaned `PendingInactive` postings, so no
-  separate reconciliation pass is needed.
+- **Durable recovery via phase-tracked write-ahead + roll-forward.**
+  `commit_envelope` persists a `PendingSaga {envelope, reservation, phase}` via
+  `SagaStore` before mutating anything (`phase = Reserving`); the finalize step
+  bumps it to `Finalizing` after validation passes and just before the consumed
+  postings start turning `Inactive`. `Ledger::recover()` (startup) branches on
+  that phase: a `Reserving` saga is **re-run through the real saga** (it
+  re-reserves and **re-validates** against current state, aborting cleanly if the
+  postings were taken or an account was frozen); a `Finalizing` saga had already
+  validated and owns its postings, so it is rolled forward through the verified
+  `finalize_envelope`. `finalize_envelope` checks every end-state and never
+  creates/stores unless **all** consumed postings are confirmed `Inactive` — the
+  double-spend guard. The pending record is deleted only on commit or a clean
+  pre-finalize abort. Recovery either rolls forward or aborts with a release, so
+  the reservation protocol never leaves orphaned `PendingInactive` postings; no
+  reconciliation pass is needed.
 
 `legend`'s pause/resume is for external waits, not crash checkpoints, so durable
 recovery is this write-ahead layer around legend, not serialization of the
@@ -72,11 +77,14 @@ in-flight execution.
 - **Crash-safety: preserved, differently.** Not one transaction, but write-ahead
   + idempotent roll-forward. Nothing is silently lost; a crash mid-finalize is
   completed by `recover()`.
-- **Overdraft floor + freeze/close guards: now best-effort under concurrency.**
-  They are checked at validation time, not re-checked atomically at commit (the
-  `cas_guards`/`account_guards` and their commit-time re-check are removed). A
-  concurrent, unrelated balance change or a freeze/close between validation and
-  finalize has a small TOCTOU window. Accepted tradeoff for a dumb storage layer.
+- **Overdraft floor + freeze/close guards: tightest best-effort, not strictly
+  atomic.** The finalize step re-validates (re-loads balances + account versions,
+  re-runs `validate_and_plan`) as its last action before writing, so the
+  check-to-write window is one step rather than the whole saga — and this re-check
+  also runs on the recovery path. It is not strictly atomic: without folding the
+  check into the write (a CAS) or per-account serialization, a concurrent commit
+  in that last sub-step gap can still slip through. Accepted tradeoff for a dumb
+  storage layer; double-spend safety is unaffected (reservation protocol).
 - **Simpler, more testable surface.** Storage has no domain logic; all commit
   correctness lives in one place (the saga) with per-primitive count-conformance
   tests and crash-injection recovery tests.
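
The rule for when the write-ahead record may be deleted can be sketched as a small decision function. `SagaPhase` mirrors the phase names in the ADR; `Outcome` and `may_delete_record` are hypothetical names, and the sketch assumes a failure while still `Reserving` has already been compensated cleanly.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SagaPhase {
    Reserving,
    Finalizing,
}

/// Result of one commit attempt (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Committed,
    Failed,
}

/// Decide whether the pending record may be deleted after a commit attempt,
/// given the phase the saga had reached when the attempt ended.
fn may_delete_record(phase: SagaPhase, outcome: Outcome) -> bool {
    match (phase, outcome) {
        // The transfer is durable, so the record has done its job.
        (_, Outcome::Committed) => true,
        // Assumed to be a clean pre-finalize abort: nothing was written.
        (SagaPhase::Reserving, Outcome::Failed) => true,
        // Mid-finalize failure: keep the record so recovery can roll forward.
        (SagaPhase::Finalizing, Outcome::Failed) => false,
    }
}
```

Keeping the record in the last case is what preserves the roll-forward information that the first recovery cut lost.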

+ 61 - 69
doc/architecture.md

@@ -97,88 +97,77 @@ The store only persists and reads — all domain logic (balance computation, val
 
 Every commit is the envelope saga. `commit(transfer)` resolves the intent into a
 concrete envelope (read-only), then runs `commit_envelope`, which persists a
-write-ahead `PendingSaga` record and drives three steps. The finalize step calls
-the dumb primitives one by one and interprets each affected-row count.
+write-ahead `PendingSaga` record (phase `Reserving`) and drives **two** steps.
+Validation lives inside the finalize step so it runs as late as possible —
+immediately before the writes.
 
 ```mermaid
 sequenceDiagram
     participant C as Caller
     participant L as Ledger
     participant R as ReserveStep
-    participant V as ValidateStep
     participant F as FinalizeStep
     participant S as Store
 
     C->>L: commit(transfer) → resolve → commit_envelope(envelope)
-    L->>S: save_saga(PendingSaga{envelope, reservation})
+    L->>S: save_saga(PendingSaga{envelope, reservation, Reserving})
     L->>R: execute
     R->>S: reserve_postings(ids, rid) → count
     Note over R: interpret count (full / partial / zero+read)
 
-    L->>V: execute
-    V->>S: get_postings, get_accounts, get_postings_by_account
-    V->>V: validate_and_plan() [pure]
-    V-->>L: Plan stored in LedgerCtx
-
-    L->>F: execute
-    F->>S: deactivate_postings(consumed, rid) → count
-    F->>S: insert_postings(created) → count
-    F->>S: store_transfer(record, involved) → count
-    F->>S: append_event(committed) → count
+    L->>F: execute (finalize_envelope)
+    F->>S: load + validate_and_plan() [last-step floor / freeze-close re-check]
+    F->>S: save_saga(... Finalizing)  [point of no return]
+    F->>S: deactivate_postings(consumed, rid) → verify all Inactive
+    F->>S: insert_postings(created) → verify exist
+    F->>S: store_transfer(record, involved) → verify transfer exists
+    F->>S: append_event(committed)
     F-->>L: Receipt
     L->>S: delete_saga(...)
     L-->>C: Receipt
 ```
 
-On in-process failure, legend compensates completed steps in LIFO order; a crash
-is handled instead by recovery (below).
-
-```mermaid
-sequenceDiagram
-    participant L as Legend
-    participant F as FinalizeStep
-    participant V as ValidateStep
-    participant R as ReserveStep
-    participant S as Store
-
-    Note over L: Step 3 fails...
-    L->>V: compensate
-    Note over V: No-op (reads only)
-    L->>R: compensate
-    R->>S: release_postings(reserved)
-    Note over S: PendingInactive → Active
-```
-
-Each step is a small, shard-local operation with automatic compensation on failure. This design avoids cross-shard transactions: no single step touches multiple shards atomically.
+On in-process failure before the point of no return, legend compensates in LIFO
+order (finalize's compensation is a no-op if nothing committed; reserve's runs
+`release_postings`). Once the finalize step has marked the saga `Finalizing` and
+begun deactivating, the half-applied commit is **rolled forward** by recovery
+rather than compensated — see below.
 
 ## Durable Crash Recovery
 
-There is no single atomic transaction, so crash-safety comes from a write-ahead
-record plus idempotent roll-forward. `commit_envelope` persists a `PendingSaga
-{envelope, reservation}` via `SagaStore` **before** the saga mutates anything,
-and deletes it once the saga reaches a terminal state.
+There is no single atomic transaction, so crash-safety comes from a phase-tracked
+write-ahead record plus idempotent roll-forward. `commit_envelope` persists a
+`PendingSaga {envelope, reservation, phase}` via `SagaStore` **before** the saga
+mutates anything (`phase = Reserving`); the finalize step bumps it to
+`Finalizing` after validation passes and just before the consumed postings start
+turning `Inactive`. The record is deleted only when the transfer is committed or
+the commit was cleanly abandoned before finalize.
 
-`Ledger::recover()` (call on startup) re-completes any surviving pending saga. It
-does **not** re-run reserve/validate (those reject already-consumed postings);
-instead it force-completes the envelope through the idempotent primitives with
-the original reservation:
+`Ledger::recover()` (call on startup) re-completes any surviving pending saga,
+**branching on the persisted phase** so it never commits something that did not
+validate or consume postings it does not own:
 
 ```mermaid
-graph LR
-    A[get_transfer?] -->|exists| Z[done]
-    A -->|missing| B[reserve_postings]
-    B --> C[deactivate_postings]
-    C --> D[insert_postings]
-    D --> E[store_transfer]
-    E --> F[append_event]
-    F --> Z
+graph TD
+    A[get_transfer?] -->|exists| Z[delete record, done]
+    A -->|missing| P{phase}
+    P -->|Reserving| RR[re-run saga: reserve + finalize]
+    RR -->|re-validates; aborts cleanly if taken/frozen| Z
+    P -->|Finalizing| FF[finalize_envelope: roll forward, verified]
+    FF --> Z
 ```
 
-Because each primitive no-ops what is already done, recovery converges from a
-crash at any point — pre-reserve (postings still Active), reserved
-(PendingInactive), or mid-finalize (already Inactive). It is roll-forward, not
-rollback, so the reservation protocol never leaves orphaned `PendingInactive`
-postings for a separate reconciliation pass to clean up.
+- A **`Reserving`** saga had not necessarily validated, so recovery re-runs the
+  real saga — which re-reserves and **re-validates** against current state. If
+  the postings were taken by another transfer, or an account was frozen, it
+  aborts cleanly (nothing commits) and the record is deleted.
+- A **`Finalizing`** saga had already validated and owns its postings (it reached
+  the point of no return), so recovery rolls it forward through the verified
+  `finalize_envelope`, which checks every end-state and only creates/stores once
+  **all** consumed postings are confirmed `Inactive` — the double-spend guard.
+
+Recovery rolls a `Finalizing` saga forward and releases a `Reserving` saga that
+aborts, so the reservation protocol never leaves orphaned `PendingInactive`
+postings for a separate reconciliation pass.
 
 `reverse()` builds a reversal envelope and runs the same `commit_envelope` path —
 there is no separate raw/atomic entry point.
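
The recovery flowchart in this section reduces to a small decision table. The sketch below encodes it as a pure function; `RecoveryAction` and `recovery_action` are hypothetical names used only for illustration, while the `SagaPhase` variants follow the phase names in the text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SagaPhase {
    Reserving,
    Finalizing,
}

/// What recovery does with one surviving pending record (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecoveryAction {
    /// The transfer is already stored: just delete the record.
    DeleteRecord,
    /// Re-run reserve + finalize, which re-validates against current state.
    RerunSaga,
    /// Roll forward through the verified finalize path only.
    RollForward,
}

/// Pick the recovery action from whether the transfer already exists and the
/// phase persisted in the pending record.
fn recovery_action(transfer_exists: bool, phase: SagaPhase) -> RecoveryAction {
    if transfer_exists {
        return RecoveryAction::DeleteRecord;
    }
    match phase {
        SagaPhase::Reserving => RecoveryAction::RerunSaga,
        SagaPhase::Finalizing => RecoveryAction::RollForward,
    }
}
```

Checking for the stored transfer first means a crash after `store_transfer` but before `delete_saga` needs no further writes, whatever phase the record carries.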
@@ -232,15 +221,19 @@ An overdraft is a **negative posting** assigned to the account to cover a shortf
 model alone — two concurrent transfers could each pass validation but together
 push the balance below the floor (write-skew).
 
-Under the dumb-storage model the floor is checked at **validation time** and is
-**best-effort under concurrency**: there is no atomic re-check at commit (the
-earlier `cas_guards`-inside-`commit_transfer` mechanism was removed with the
-atomic boundary). Double-spend safety still holds unconditionally — the
-reservation protocol (`reserve_postings` is a single atomic conditional update,
-so two sagas cannot both claim the same posting) prevents consuming a posting
-twice. What is best-effort is specifically the *floor* on a `CappedOverdraft`
-account when unrelated concurrent activity moves its balance between validation
-and finalize. This tradeoff is recorded in
+Under the dumb-storage model the floor (and the freeze/close snapshot check) is
+re-validated **as the last thing the finalize step does before it writes** — the
+finalize step re-loads balances and account versions and re-runs
+`validate_and_plan` immediately before `deactivate_postings`. This is the
+*tightest* best-effort: the check-to-write window is one step, not the whole
+saga's lifetime, and it also runs on the recovery path. It is **not strictly
+atomic**, though — without folding the check into the write itself (a CAS) or
+serializing per account, a concurrent commit landing in that last sub-step gap
+can still slip through. Double-spend safety is unaffected and holds
+unconditionally: the reservation protocol (`reserve_postings` is a single atomic
+conditional update, so two sagas cannot both claim the same posting) prevents
+consuming a posting twice. Only the *floor* on a `CappedOverdraft` account and
+the freeze/close check are best-effort. This tradeoff is recorded in
 [doc/adr/0001-dumb-storage-saga-recovery.md](adr/0001-dumb-storage-saga-recovery.md).
 
 `NoOverdraft` is fully UTXO-backed (you can only spend postings you own), and the
@@ -299,14 +292,13 @@ This enables shard-local writes: each posting reservation is an independent oper
 
 ### Internal pipeline steps
 
-The saga pipeline is built from four `legend::Step` implementations that operate on `LedgerCtx`:
+The envelope saga is two `legend::Step` implementations operating on `LedgerCtx`
+(resolution runs before the saga, in `Ledger::commit`):
 
 | Step | Execute | Compensate | Retry |
 |------|---------|------------|-------|
-| `ResolveStep` | Convert Transfer intent into concrete Envelope | No-op | No retry |
-| `ReservePostingsStep` | Batch reserve postings `Active → PendingInactive` | Release all back to `Active` | 3 retries |
-| `ValidateTransferStep` | Load accounts/balances, run `validate_and_plan()` | No-op (reads only) | No retry |
-| `FinalizeTransferStep` | `PendingInactive → Inactive`, create postings, store transfer, emit event | `reverse(transfer_id)` | 3 retries |
+| `ReservePostingsStep` | Reserve postings `Active → PendingInactive`, interpret the count | Release back to `Active` | 3 retries |
+| `FinalizeTransferStep` | `Ledger::finalize_envelope`: re-validate (last-step floor/freeze guard) → mark `Finalizing` → `deactivate` → `insert` → `store_transfer` → `append_event`, verifying every end-state | `reverse(transfer_id)` | 3 retries |
 
 ### High-level composition steps
 

+ 14 - 9
doc/crates.md

@@ -92,16 +92,18 @@ Async resource layer. Depends on `kuatia-core`, `tokio`, `async-trait`, `serde`,
 #### Commit (the envelope saga)
 
 `commit(transfer)` resolves the intent into an envelope (read-only) then runs the
-`EnvelopeSaga` (defined via `legend!`) — three steps with automatic retry and
-LIFO compensation. Finalize calls the dumb primitives one by one and interprets
-each affected-row count:
+`EnvelopeSaga` (defined via `legend!`) — **two steps** with automatic retry and
+LIFO compensation. The finalize step re-validates as its last action before the
+writes, then calls the dumb primitives, interpreting/verifying each count:
 
 ```mermaid
 graph LR
-    A[resolve] -->|Envelope| W[save PendingSaga]
+    A[resolve] -->|Envelope| W[save PendingSaga: Reserving]
     W --> B[reserve_postings]
-    B -->|Active→PendingInactive| C[validate_and_plan]
-    C -->|Plan| D[deactivate → insert → store_transfer → append_event]
+    B -->|Active→PendingInactive| F[finalize]
+    F --> V[validate_and_plan re-check]
+    V --> M[mark Finalizing]
+    M --> D[deactivate → insert → store_transfer → append_event]
     D --> E[Receipt + delete PendingSaga]
     style E fill:#e8f5e9
 ```
@@ -118,7 +120,7 @@ through the idempotent primitives (roll-forward). Call it on startup.
 | Method | Description |
 |--------|-------------|
 | `commit(transfer)` | Resolve intent → `commit_envelope` (requires `Arc<Ledger>`) |
-| `commit_envelope(envelope)` | The one commit path: write-ahead → reserve → validate → finalize (for pre-built/FX envelopes) |
+| `commit_envelope(envelope)` | The one commit path: write-ahead → reserve → finalize (finalize re-validates, then writes); for pre-built/FX envelopes |
 | `reverse(transfer_id)` | Builds a compensating envelope and runs `commit_envelope` |
 | `recover()` | Force-completes pending sagas after a crash (call on startup) |
 
@@ -248,8 +250,11 @@ the saga derives meaning from them.
 | Step | Execute | Compensate | Retry |
 |------|---------|------------|-------|
 | `ReservePostingsStep` | `reserve_postings` `Active → PendingInactive`, interpret count | Release back to `Active` | 3 |
-| `ValidateTransferStep` | Load state, `validate_and_plan()` | No-op | None |
-| `FinalizeTransferStep` | `deactivate_postings` → `insert_postings` → `store_transfer` → `append_event` | `reverse(transfer_id)` | 3 |
+| `FinalizeTransferStep` | `Ledger::finalize_envelope`: re-validate (last-step floor/freeze guard) → mark `Finalizing` → `deactivate` → `insert` → `store_transfer` → `append_event`, verifying every end-state | `reverse(transfer_id)` | 3 |
+
+Validation lives inside the finalize step so it runs immediately before the
+writes. Recovery (`recover()`) re-uses `finalize_envelope` for `Finalizing`
+sagas and re-runs the whole saga for `Reserving` ones.
 
 #### High-level steps (for custom saga composition with `legend!`)
 

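The phase-based branching described in the crates.md hunk above can be sketched as follows. This is a minimal illustration, not the crate's code: `Recovered`, `recover_one`, `recover_all`, and the `still_valid` flag (standing in for "re-reserve and re-validate against current state") are hypothetical names, and the store is a plain `HashMap` rather than the real `SagaStore`.

```rust
use std::collections::HashMap;

/// Persisted phase of the write-ahead record (variant names follow the docs).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Reserving,
    Finalizing,
}

/// Outcome of recovering one pending saga (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Recovered {
    Committed,
    Aborted,
}

/// Simplified pending record. `still_valid` is an assumption that stands in
/// for re-reserving and re-validating against the current ledger state.
struct PendingSaga {
    phase: Phase,
    still_valid: bool,
}

/// Branch on the persisted phase.
fn recover_one(saga: &PendingSaga) -> Recovered {
    match saga.phase {
        // Before the point of no return: re-run the saga, which may abort
        // cleanly if a posting was taken or an account was frozen.
        Phase::Reserving => {
            if saga.still_valid {
                Recovered::Committed
            } else {
                Recovered::Aborted
            }
        }
        // Already validated and owns its postings: roll forward.
        // (In this sketch the roll-forward always succeeds.)
        Phase::Finalizing => Recovered::Committed,
    }
}

/// Recover every pending saga in id order, deleting each record once it has
/// either committed or aborted cleanly.
fn recover_all(store: &mut HashMap<u64, PendingSaga>) -> Vec<(u64, Recovered)> {
    let mut ids: Vec<u64> = store.keys().copied().collect();
    ids.sort();
    let mut outcomes = Vec::new();
    for id in ids {
        let outcome = recover_one(&store[&id]);
        store.remove(&id);
        outcomes.push((id, outcome));
    }
    outcomes
}
```

A real implementation would presumably keep the record when the roll-forward itself fails mid-finalize, as the commit message describes; the sketch omits that path.
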
+ 1 - 1
doc/glossary.md

@@ -45,7 +45,7 @@ The concurrency-control mechanism for consumed postings: `reserve_postings` atom
 
 ### PendingSaga / recovery
 
-A write-ahead record `{envelope, reservation}` persisted via `SagaStore` before a commit mutates anything. `Ledger::recover()` (startup) force-completes any pending saga through the idempotent primitives — roll-forward, converging from a crash at any point.
+A write-ahead record `{envelope, reservation, phase}` persisted via `SagaStore` before a commit mutates anything. The `phase` (`Reserving` → `Finalizing`) tells `Ledger::recover()` (run at startup) how to complete a crashed saga: a `Reserving` saga is re-run and **re-validated**, aborting cleanly if validation now fails; a `Finalizing` saga (already validated, owns its postings) is rolled forward through the verified `finalize_envelope`. Once a saga is marked `Finalizing`, recovery rolls it forward rather than back.
 
 ### Book
 

+ 14 - 10
doc/transfers.md

@@ -157,16 +157,17 @@ A single transfer can contain multiple movements of different types. All movemen
 ### Saga commit (default)
 
 ```
-Transfer → resolve → Envelope → reserve → validate → finalize → Receipt
+Transfer → resolve → Envelope → reserve → finalize(validate → write) → Receipt
 ```
 
-Resolution is read-only; `commit(transfer)` resolves then runs the envelope saga
-(reserve → validate → finalize) with automatic retry and LIFO compensation.
+Resolution is read-only; `commit(transfer)` resolves then runs the two-step
+envelope saga (reserve → finalize) with automatic retry and LIFO compensation.
+Validation runs inside the finalize step, immediately before the writes.
 
 ### Committing a pre-built envelope
 
 ```
-Envelope → reserve → validate → finalize → Receipt
+Envelope → reserve → finalize(validate → write) → Receipt
 ```
 
 `ledger.commit_envelope(envelope)` runs the same saga for an envelope you already
@@ -196,11 +197,14 @@ Every envelope passes through `validate_and_plan()` before being applied. The va
 9. Negative postings forbidden only on `NoOverdraft` accounts (allowed on overdraft/system/external)
 10. Policy enforcement: projected balance satisfies account floor
 
-After validation, the finalize step applies the effects through a sequence of
-dumb, idempotent store primitives (`deactivate_postings` → `insert_postings` →
-`store_transfer` → `append_event`). There is no single transaction; crash-safety
-comes from a write-ahead `PendingSaga` record plus `recover()` roll-forward. The
-`CappedOverdraft` floor is checked in validation (step 10) and is best-effort
-under concurrency — see [architecture.md](architecture.md).
+Validation runs inside the finalize step, immediately before it writes (the
+last-step floor / freeze-close re-check). The finalize step then applies the
+effects through a sequence of dumb, idempotent store primitives
+(`deactivate_postings` → `insert_postings` → `store_transfer` → `append_event`),
+verifying every end-state. There is no single transaction; crash-safety comes
+from a phase-tracked write-ahead `PendingSaga` record plus `recover()`
+roll-forward. The `CappedOverdraft` floor is re-checked in that last-step
+validation and is best-effort (not strictly atomic) under concurrency; see
+[architecture.md](architecture.md).
 
 See [architecture.md](architecture.md) for details on each check.
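
The "verify every end-state" rule in the transfers.md hunk above can be sketched as follows: finalize creates nothing unless every consumed posting is confirmed `Inactive` for this saga, and repeating the call is harmless. This is a minimal illustration under stated assumptions: `Posting`, its `owner` field, `FinalizeError`, and the in-memory `created` list are hypothetical, not the crate's actual types or store API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Active,
    PendingInactive,
    Inactive,
}

/// Hypothetical posting row: its status plus the saga that reserved it.
#[derive(Debug, Clone, Copy)]
struct Posting {
    status: Status,
    owner: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
enum FinalizeError {
    NotAllInactive,
}

/// Idempotent deactivate: flips this saga's `PendingInactive` postings to
/// `Inactive` and returns how many postings are now `Inactive` for this saga.
fn deactivate(saga: u64, consumed: &mut [Posting]) -> usize {
    for p in consumed.iter_mut() {
        if p.status == Status::PendingInactive && p.owner == Some(saga) {
            p.status = Status::Inactive;
        }
    }
    consumed
        .iter()
        .filter(|p| p.status == Status::Inactive && p.owner == Some(saga))
        .count()
}

/// Creates the new posting only after confirming that ALL consumed postings
/// ended up `Inactive` under this saga (the double-spend guard).
fn finalize(
    saga: u64,
    consumed: &mut [Posting],
    created: &mut Vec<String>,
    new_id: &str,
) -> Result<(), FinalizeError> {
    if deactivate(saga, consumed) != consumed.len() {
        return Err(FinalizeError::NotAllInactive);
    }
    // Idempotent insert: a retry or a recovery pass must not duplicate it.
    if !created.iter().any(|c| c.as_str() == new_id) {
        created.push(new_id.to_string());
    }
    Ok(())
}
```

Calling `finalize` twice with the same arguments leaves a single created posting, which is what makes a retry or a `recover()` roll-forward safe in this sketch. A posting that another saga has already consumed makes the count fall short, so nothing is created.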