
fix: resolve SQLite connection pool timeout after idle period#3148

Open
hedypamungkas wants to merge 12 commits into tailcallhq:main from hedypamungkas:fix/connection-pool-timeout-idle

Conversation

@hedypamungkas

Summary

Fixes the `Failed to get connection from pool: timed out waiting for connection` error that occurs when a user resumes a Forge conversation after several hours of idle time. The fix adds connection health validation, removes unnecessary warm connections, and implements pool self-healing.

Context

When a user leaves a Forge terminal idle for hours and then resumes, the first database operation fails with a connection pool timeout. Opening a new terminal works fine because it creates a fresh DatabasePool with no stale connections.

Root Cause

The issue is caused by the interaction between idle_timeout, min_idle, and the lack of connection health validation:

  1. `idle_timeout: 600s` — after 10 minutes idle, r2d2 evicts connections from the pool
  2. `min_idle: Some(1)` — the pool tries to maintain at least 1 idle connection, creating new ones after eviction; these replacements can themselves become stale
  3. No health check on acquire — `on_acquire` only runs PRAGMAs and never validates that existing connections are still alive
  4. `connection_timeout: 5s` — too aggressive; if a recreated connection is stale, checkout fails quickly

After hours of idle, the pool has cycled through many create/evict cycles. The SQLite WAL file may have been modified by other Forge processes, or OS-level resource cleanup may have invalidated the connection. When the user resumes, the first checkout hits this stale connection and times out.
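The interaction of these settings can be made concrete with a small sketch. `PoolConfig`, `old_config`, and `fixed_config` below are hypothetical names for illustration only; the real project configures these values through r2d2's builder in `pool.rs`.

```rust
use std::time::Duration;

// Hypothetical config struct mirroring the pool settings discussed above.
#[derive(Debug, PartialEq)]
struct PoolConfig {
    idle_timeout: Duration,
    min_idle: Option<u32>,
    connection_timeout: Duration,
}

// The problematic defaults described in the root-cause analysis.
fn old_config() -> PoolConfig {
    PoolConfig {
        idle_timeout: Duration::from_secs(600),     // evicts after 10 min idle
        min_idle: Some(1),                          // recreates connections that can go stale
        connection_timeout: Duration::from_secs(5), // too aggressive to recover from a stale checkout
    }
}

// The defaults after this PR's Phase 1 changes.
fn fixed_config() -> PoolConfig {
    PoolConfig {
        idle_timeout: Duration::from_secs(600),
        min_idle: None,                              // no warm connections left to go stale
        connection_timeout: Duration::from_secs(15), // room to create a fresh connection
    }
}

fn main() {
    assert_eq!(fixed_config().min_idle, None);
    assert!(fixed_config().connection_timeout > old_config().connection_timeout);
    println!("config sketch ok");
}
```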

Distinction from PR #3033

PR #3033 (already merged) fixes concurrent write contention by moving SQLite ops to spawn_blocking. This PR addresses a different scenario: a single user resuming after idle — no concurrent writes involved.

Changes

Phase 1: Connection Health Check on Acquire

| Change | File | Rationale |
| --- | --- | --- |
| Add `SELECT 1` health check in `on_acquire` | `pool.rs:164-170` | Catches stale connections at checkout time, before they propagate as errors |
| Change `min_idle: Some(1)` → `min_idle: None` | `pool.rs:33` | For a CLI tool that sits idle for hours, maintaining warm connections is counterproductive |
| Increase `connection_timeout`: 5s → 15s | `pool.rs:34` | Gives adequate time for fresh connection creation without noticeable user delay |
| Add `PRAGMA wal_checkpoint(TRUNCATE)` | `pool.rs:184-189` | Ensures a clean WAL state after long idle periods |
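The core of the health check is validating a connection at checkout time before handing it out. The sketch below is a stdlib-only mock of that idea: `Connection`, `execute_select_1`, and the free function `on_acquire` are stand-ins, not the real rusqlite/r2d2 types (the actual implementation lives in a `CustomizeConnection` hook in `pool.rs`).

```rust
// Stdlib-only sketch of health validation at checkout time.
struct Connection {
    alive: bool,
}

impl Connection {
    /// Stand-in for running `SELECT 1`: succeeds only if the connection is live.
    fn execute_select_1(&self) -> Result<(), String> {
        if self.alive {
            Ok(())
        } else {
            Err("stale connection".to_string())
        }
    }
}

/// Mirrors the `on_acquire` hook: validate before the pool hands the
/// connection to the caller, so staleness surfaces here, not later.
fn on_acquire(conn: &Connection) -> Result<(), String> {
    conn.execute_select_1()
}

fn main() {
    let fresh = Connection { alive: true };
    let stale = Connection { alive: false };
    assert!(on_acquire(&fresh).is_ok());
    // A stale connection is rejected at checkout instead of failing mid-query.
    assert!(on_acquire(&stale).is_err());
    println!("health check sketch ok");
}
```

When `on_acquire` returns an error, r2d2 discards that connection and tries another, which is what turns a would-be query failure into a transparent retry.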

Phase 2: Pool Self-Healing (Safety Net)

| Change | File | Rationale |
| --- | --- | --- |
| Add `recreate_pool()` method | `pool.rs:112-132` | Rebuilds the pool from scratch as a last-resort recovery mechanism |
| Self-healing `get_connection()` | `pool.rs:78-105` | After all retries fail, attempts pool recreation before one final checkout |
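The retry-then-recreate flow can be sketched with stdlib types only. `Pool`, `checkout`, and the retry count below are mock stand-ins chosen for illustration; the real `get_connection()` in `pool.rs` wraps an r2d2 pool and returns real connections.

```rust
use std::sync::Mutex;

// Mock pool: either healthy (checkout succeeds) or stale (checkout times out).
struct Pool {
    healthy: bool,
}

impl Pool {
    fn checkout(&self) -> Result<&'static str, &'static str> {
        if self.healthy {
            Ok("conn")
        } else {
            Err("timed out waiting for connection")
        }
    }
}

struct DatabasePool {
    // Wrapping the pool in a Mutex is what makes swapping it out safe.
    pool: Mutex<Pool>,
}

impl DatabasePool {
    /// Last-resort recovery: replace the pool with a fresh one.
    fn recreate_pool(&self) {
        *self.pool.lock().unwrap() = Pool { healthy: true };
    }

    fn get_connection(&self) -> Result<&'static str, &'static str> {
        const RETRIES: usize = 3; // illustrative retry count
        for _ in 0..RETRIES {
            if let Ok(conn) = self.pool.lock().unwrap().checkout() {
                return Ok(conn);
            }
        }
        // All retries failed: rebuild the pool, then make one final attempt.
        self.recreate_pool();
        self.pool.lock().unwrap().checkout()
    }
}

fn main() {
    // Simulate a pool gone fully stale after hours of idle.
    let db = DatabasePool { pool: Mutex::new(Pool { healthy: false }) };
    assert_eq!(db.get_connection(), Ok("conn"));
    println!("self-healing sketch ok");
}
```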

Key Implementation Details

  • DatabasePool now wraps the pool in Mutex<DbPool> and stores PoolConfig to enable safe pool recreation
  • Pool recreation reuses build_pool() which re-runs migrations, ensuring a fully initialized fresh pool
  • SELECT 1 on SQLite is sub-millisecond — the latency trade-off is negligible for guaranteed connection validity
  • The Mutex is only held during connection checkout, not during the entire operation, so concurrent access is not blocked
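The last point — the lock covers checkout only, not the whole database operation — comes down to guard scope. A minimal stdlib sketch (the `Vec<u32>` pool and integer "connections" are mock stand-ins):

```rust
use std::sync::Mutex;

struct DatabasePool {
    pool: Mutex<Vec<u32>>,
}

impl DatabasePool {
    fn get_connection(&self) -> Option<u32> {
        // The MutexGuard is a temporary dropped at the end of this statement,
        // so the lock is held only for the checkout itself.
        self.pool.lock().unwrap().pop()
    }
}

fn main() {
    let db = DatabasePool { pool: Mutex::new(vec![1, 2]) };
    let first = db.get_connection().unwrap();
    // The pool lock is already released here: a concurrent caller can
    // check out while `first` is still in use.
    let second = db.get_connection().unwrap();
    assert_eq!((first, second), (2, 1));
}
```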

Testing

All 295 tests pass (including 5 new tests):

```shell
# Run all tests
cargo test -p forge_repo

# Run just the new pool tests
cargo test -p forge_repo -- pool::tests
```

New Tests Added

| Test | What It Verifies |
| --- | --- |
| `test_idle_eviction_recovery` | A pool with a 100ms idle timeout recovers after eviction by creating fresh connections |
| `test_health_check_on_acquire` | 5 consecutive `get_connection()` calls all succeed with `SELECT 1` validation |
| `test_pool_config_defaults` | Asserts the new defaults: `min_idle: None`, `connection_timeout: 15s` |
| `test_pool_recreation_after_simulated_failure` | `recreate_pool()` works correctly on a file-based database |
| `test_wal_checkpoint_on_acquire` | `PRAGMA wal_checkpoint(TRUNCATE)` doesn't error on in-memory databases |

Verification

  • cargo check -p forge_repo — passed
  • cargo clippy -p forge_repo -- -D warnings — 0 warnings
  • cargo test -p forge_repo — 295 passed, 0 failed
  • cargo insta test -p forge_repo --accept — 295 passed, no snapshot changes

Risks and Mitigations

| Risk | Mitigation |
| --- | --- |
| Health check adds latency on every checkout | `SELECT 1` on SQLite is sub-millisecond |
| Removing `min_idle` makes the first query after idle slightly slower | SQLite connection creation is fast (< 50ms), far better than a 5s timeout failure |
| Pool recreation could lose in-flight operations | Recreation only happens after all retries fail, meaning no operations are in-flight |
| WAL checkpoint could block if another process holds the lock | `busy_timeout = 30000` ensures SQLite waits up to 30s for locks |

- Add SELECT 1 health check in SqliteCustomizer::on_acquire to catch stale connections
- Remove min_idle: Some(1) to avoid keeping stale connections alive during idle
- Increase connection_timeout from 5s to 15s for adequate fresh connection creation
- Add PRAGMA wal_checkpoint(TRUNCATE) to ensure clean WAL state after idle
- Add recreate_pool() method for last-resort pool recovery
- Modify get_connection() to attempt pool recreation after all retries fail
- Add 5 new tests covering idle eviction, health check, pool recreation, and WAL checkpoint

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
@CLAassistant

CLAassistant commented Apr 24, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added the type: fix Iterations on existing features or infrastructure. label Apr 24, 2026
@amitksingh1490
Contributor

@hedypamungkas This is causing significant performance degradation in bootup, can you please check?

@github-actions

Action required: PR inactive for 5 days.
Status update or closure in 10 days.

@github-actions github-actions Bot added the state: inactive No current action needed/possible; issue fixed, out of scope, or superseded. label May 10, 2026
@github-actions github-actions Bot removed the state: inactive No current action needed/possible; issue fixed, out of scope, or superseded. label May 10, 2026
@laststylebender14 laststylebender14 self-assigned this May 11, 2026