Fail-open cache when persist backend is unreachable (#50) by poxet · Pull Request #51 · Tharga/Cache

poxet · 2026-06-15T14:04:42Z

Closes #50.

Problem

When the persist backend (Redis) times out on commands rather than failing cleanly to connect, ICache.GetAsync<T>(key, fetch) threw instead of falling back to the fetch loader. A production Redis outage therefore took the whole service down, even though the source of truth behind the fetch loader was healthy: every read retried for ~15 s and then threw, and the blocked threads starved the thread pool, with ~60k work items queued.

Changes

1. Fail-open in CacheBase (provider-agnostic). A backend exception is caught and:

- Reads: logged and treated as a miss, so control falls through to the fetch loader (GetCoreAsync, PeekAsync, BuyMoreTime).
- Writes: logged and swallowed, so they never fault the caller (FetchCallback, SetCoreAsync).

The catch is gated by the exception filter when (_options.FailOpenOnBackendError), so setting the new CacheOptions.FailOpenOnBackendError = false restores the previous throwing behavior exactly. Because the handling lives in the base class, it protects all IPersist backends (Redis/MongoDB/File). CacheBase gained a nullable ILogger, which is threaded through the 5 cache subclasses and the DI factory lambdas.

2. Circuit breaker in the Redis provider. A new internal RedisResiliencePolicy factory builds a Polly circuit breaker (outer) that wraps the existing retry policy (inner). Once the circuit is open, calls fail fast with BrokenCircuitException, which the core fail-open catches, instead of paying the retry latency on every call; this is the fix for the thread-pool starvation. After the break duration the breaker goes half-open and closes again on its own once a trial call succeeds. CanConnectAsync returns (false, "circuit open") instead of throwing.

3. Options. New CacheOptions.FailOpenOnBackendError (default true), plus new RedisCacheOptions properties RetryCount, CircuitBreakerFailureThreshold, CircuitBreakerDuration and CommandTimeout.
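The fail-open rule from change 1 can be illustrated with a minimal sketch. It is written in Python as a language-neutral illustration, not the library's C# code: FailOpenCache, FailingPersist and DictPersist are hypothetical names, a None result stands for a cache miss, and because Python has no exception filters the when (_options.FailOpenOnBackendError) gate is approximated by re-raising inside the except block.

```python
import logging
from typing import Callable, Dict, Optional

log = logging.getLogger("cache-sketch")


class FailingPersist:
    """Hypothetical stand-in for an unreachable IPersist backend: every call raises."""

    def get(self, key: str) -> Optional[object]:
        raise TimeoutError("backend command timed out")

    def set(self, key: str, value: object) -> None:
        raise TimeoutError("backend command timed out")


class DictPersist:
    """Hypothetical healthy backend; get returns None for a missing key."""

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self.data.get(key)

    def set(self, key: str, value: object) -> None:
        self.data[key] = value


class FailOpenCache:
    """Hypothetical cache showing the fail-open rule; None means 'miss' here."""

    def __init__(self, persist, fail_open_on_backend_error: bool = True) -> None:
        self._persist = persist
        self._fail_open = fail_open_on_backend_error

    def _read(self, key: str) -> Optional[object]:
        try:
            return self._persist.get(key)
        except Exception:
            if not self._fail_open:
                raise  # flag off: propagate the backend error unchanged
            log.warning("cache read failed for %r; treating as a miss", key, exc_info=True)
            return None

    def _write(self, key: str, value: object) -> None:
        try:
            self._persist.set(key, value)
        except Exception:
            if not self._fail_open:
                raise  # flag off: the write failure faults the caller, as before
            log.warning("cache write failed for %r; ignoring", key, exc_info=True)

    def get(self, key: str, fetch: Callable[[], object]) -> object:
        cached = self._read(key)
        if cached is not None:
            return cached
        value = fetch()          # miss or failed read: go to the source of truth
        self._write(key, value)  # best effort; never faults the caller when fail-open
        return value
```

With the flag off, the backend's TimeoutError reaches the caller unchanged, which corresponds to setting FailOpenOnBackendError = false. One difference from C#: a false when filter means the exception is never caught at all, whereas Python catches and re-raises it.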

Acceptance criteria

- With the backend down/timing out, GetAsync(key, fetch) returns fetch()'s result and does not throw.
- A cache write failure does not fault the caller.
- Under a sustained outage, calls fail fast once the breaker is open and recover automatically.
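The last criterion depends on the ordering from change 2: the breaker sits outside the retry. The following is a minimal Python sketch of that ordering under stated assumptions. It is not Polly's API and not the RedisResiliencePolicy code. CircuitBreaker, with_retry, call_backend and BrokenCircuitError are hypothetical names, retries have no backoff, and the injectable clock exists only so the open to half-open transition can be tested. failure_threshold, break_duration and retry_count loosely correspond to CircuitBreakerFailureThreshold, CircuitBreakerDuration and RetryCount.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class BrokenCircuitError(Exception):
    """Raised while the circuit is open, without calling the backend."""


def with_retry(action: Callable[[], T], retry_count: int) -> T:
    """Inner policy: one attempt plus up to retry_count retries (no backoff)."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            if attempt >= retry_count:
                raise  # retries exhausted: surface the last error
            attempt += 1


class CircuitBreaker:
    """Outer policy: closed -> open after N consecutive failures -> half-open after break_duration.

    Single-threaded sketch; a production breaker would typically also limit
    half-open to a single trial call.
    """

    def __init__(self, failure_threshold: int, break_duration: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._duration = break_duration
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None while closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._duration:
            return "half-open"
        return "open"

    def execute(self, action: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise BrokenCircuitError("circuit open")  # fail fast, backend untouched
        try:
            result = action()
        except Exception:
            if state == "half-open":
                self._opened_at = self._clock()  # trial call failed: open again
            else:
                self._failures += 1
                if self._failures >= self._threshold:
                    self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None  # a success closes the circuit
        return result


def call_backend(breaker: CircuitBreaker, action: Callable[[], T], retry_count: int) -> T:
    # Breaker outside, retry inside: the breaker records one outcome per retry sequence.
    return breaker.execute(lambda: with_retry(action, retry_count))
```

Because the breaker is outermost, a call made while the circuit is open costs one state check instead of 1 + retry_count slow attempts, which is the per-call latency the PR removes. In the library, the resulting BrokenCircuitException is then absorbed by the core fail-open and turned into a loader call.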

Tests

- FailOpenTests (4), using an IPersist that always throws: Get returns the loader result; read and write failures don't fault the caller; FailOpenOnBackendError = false re-throws.
- RedisResiliencePolicyTests (3): the breaker opens after the threshold and fails fast without invoking the backend; successful calls pass through; transient failures are still retried.
- Full solution builds cleanly. Core: 478 pass (4 new). Redis: 5 pass.

Docs: README updated (core feature bullet + Redis "Resilience (fail-open)" section).

A persist-backend exception is no longer propagated to the caller: reads are treated as a miss and fall through to the source loader, and writes are logged and swallowed. A cache outage therefore never faults the application as long as the source of truth is healthy. Gated by the new CacheOptions.FailOpenOnBackendError (default true); set it to false to restore the previous throwing behavior. It lives in CacheBase, so it covers every IPersist backend (Redis/MongoDB/File). Adds a Polly circuit breaker (outer) around the Redis retry (inner) so a sustained outage short-circuits immediately instead of paying retry latency on every call, which is what caused the thread-pool starvation in the reported incident. New RedisCacheOptions: RetryCount, CircuitBreakerFailureThreshold, CircuitBreakerDuration, CommandTimeout. Tests: FailOpenTests (4) + RedisResiliencePolicyTests (3). Closes #50.

codecov · 2026-06-15T14:08:08Z

Codecov Report

❌ Patch coverage is 35.33835% with 86 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
Tharga.Cache.Tests/FailOpenTests.cs	0.00%	48 Missing ⚠️
Tharga.Cache.Redis/Redis.cs	0.00%	22 Missing ⚠️
Tharga.Cache/Core/CacheBase.cs	80.76%	4 Missing and 1 partial ⚠️
Tharga.Cache.Redis/RedisResiliencePolicy.cs	80.95%	2 Missing and 2 partials ⚠️
Tharga.Cache/CacheRegistrationExtensions.cs	20.00%	3 Missing and 1 partial ⚠️
Tharga.Cache.Redis/RedisCacheOptions.cs	75.00%	1 Missing ⚠️
Tharga.Cache/Core/GenericCache.cs	0.00%	1 Missing ⚠️
Tharga.Cache/Core/GenericTimeCache.cs	0.00%	1 Missing ⚠️


github-actions · 2026-06-15T14:15:41Z

Released as v0.4.8 — https://github.com/Tharga/Cache/releases/tag/0.4.8

poxet requested a deployment to prerelease on June 15, 2026 14:08 with GitHub Actions (status: Waiting)

poxet merged commit d20c9bb into master Jun 15, 2026
7 of 8 checks passed

poxet merged 1 commit into master from feature/issue-50-fail-open-cache