Jelajahi Sumber

fix: race condition in connection pool causing "no such table" errors

This fixes a race condition in the connection pool that caused
  intermittent "no such table: mint_quote_payments" errors, particularly
  on slower CI runners.

  ## Root Cause

  The `get_timeout()` function in `pool.rs` had a TOCTOU (time-of-check-
  time-of-use) race condition. When acquiring a connection:

  1. Thread A pops a connection from the queue
  2. Thread A releases the mutex BEFORE incrementing `in_use` counter
  3. Thread B acquires the mutex, sees empty queue, releases mutex
  4. Thread B checks `in_use < max_size` (0 < 1) - passes because
     Thread A hasn't incremented yet
  5. Thread B creates a NEW connection

  For in-memory SQLite, each `Connection::open_in_memory()` creates a
  SEPARATE, INDEPENDENT database. Migrations only run on the first
  connection created during `SQLMintDatabase::new()`. When the race
  condition triggers, additional connections point to fresh databases
  without any tables - hence "no such table: mint_quote_payments".

  ## The Fix

  Move `increment_connection_counter()` to BEFORE releasing the mutex
  in both code paths:

  1. When popping an existing resource from the queue
  2. When creating a new resource

  This ensures other threads immediately see the updated count and wait
  instead of creating duplicate connections. Added error handling to
  decrement the counter if resource creation fails, preserving the
  behavior from PR #1367.

  ## Why CI-only failures

  The race window is very small (between mutex release and counter
  increment). On faster local machines, threads rarely hit this window.
  On slower CI runners with more context switching, the probability of
  two threads interleaving in the problematic order increases
  significantly.
thesimplekid 1 bulan lalu
induk
melakukan
3ddfd30343
1 mengubah file dengan 24 tambahan dan 10 penghapusan
  1. 24 10
      crates/cdk-sql-common/src/pool.rs

+ 24 - 10
crates/cdk-sql-common/src/pool.rs

@@ -194,8 +194,12 @@ where
         loop {
             if let Some((stale, resource)) = resources.pop() {
                 if !stale.load(Ordering::SeqCst) {
-                    drop(resources);
+                    // Increment counter BEFORE releasing the mutex to prevent race condition
+                    // where another thread sees in_use < max_size and creates a duplicate connection.
+                    // For in-memory SQLite, each connection is a separate database, so this race
+                    // would cause some connections to miss migrations.
                     self.increment_connection_counter();
+                    drop(resources);
 
                     return Ok(PooledResource {
                         resource: Some((stale, resource)),
@@ -207,17 +211,27 @@ where
             }
 
             if self.in_use.load(Ordering::Relaxed) < self.max_size {
+                // Increment counter BEFORE releasing the mutex to prevent race condition.
+                // This ensures other threads see the updated count and wait instead of
+                // creating additional connections beyond max_size.
+                self.increment_connection_counter();
                 drop(resources);
                 let stale: Arc<AtomicBool> = Arc::new(false.into());
-                let new_resource = RM::new_resource(&self.config, stale.clone(), timeout)?;
-                self.increment_connection_counter();
-
-                return Ok(PooledResource {
-                    resource: Some((stale, new_resource)),
-                    pool: self.clone(),
-                    #[cfg(feature = "prometheus")]
-                    start_time: Instant::now(),
-                });
+                match RM::new_resource(&self.config, stale.clone(), timeout) {
+                    Ok(new_resource) => {
+                        return Ok(PooledResource {
+                            resource: Some((stale, new_resource)),
+                            pool: self.clone(),
+                            #[cfg(feature = "prometheus")]
+                            start_time: Instant::now(),
+                        });
+                    }
+                    Err(e) => {
+                        // If resource creation fails, decrement the counter we just incremented
+                        self.in_use.fetch_sub(1, Ordering::AcqRel);
+                        return Err(e);
+                    }
+                }
             }
 
             resources = self