Add replay-aware logger to Amazon.Lambda.DurableExecution#2371

Draft
GarrettBeatty wants to merge 12 commits into GarrettBeatty/stack/3 from GarrettBeatty/stack/4

@GarrettBeatty GarrettBeatty commented May 14, 2026

#2216


What

Implements context.Logger, the replay-aware ILogger described in Docs/durable-execution-design.md and shipped by the Python / Java / JavaScript reference SDKs.

Public API surface introduced:

| Type | Purpose |
| --- | --- |
| IDurableContext.Logger | Replay-safe ILogger (was NullLogger.Instance). |
| IDurableContext.ConfigureLogger(LoggerConfig) | Swap the inner logger and/or disable replay-aware filtering. |
| LoggerConfig | CustomLogger + ModeAware configuration record. |

Why

Without replay-aware logging, every Console.WriteLine (or any non-suppressing logger) repeats on every replay invocation. In a 30-step workflow that is re-invoked 30 times, a log line near the top of the workflow body is emitted up to 30 times, which is noisy at best and misleading at worst. The reference SDKs all solve this by reading replay state on each log call and suppressing emission while the workflow is re-deriving prior operations from checkpoint state. This PR ports that behavior to .NET on top of the per-operation replay tracker introduced in #2360.

How

ReplayAwareLogger. An ILogger decorator that consults ExecutionState.IsReplaying on every call. Short-circuits both Log<TState> and IsEnabled during replay so LoggerExtensions.LogXxx doesn't even format the message string. BeginScope always passes through so the scope stack stays balanced — suppression only applies at log emission.
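
For reviewers who want the decorator shape in one place, here is a minimal sketch of the idea, not the code in Internal/ReplayAwareLogger.cs. It assumes a reference to Microsoft.Extensions.Logging.Abstractions 7.0 or later. ReplayStateSketch, ReplayAwareLoggerSketch, and CollectingLogger are hypothetical names invented for illustration; ReplayStateSketch stands in for ExecutionState.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical stand-in for ExecutionState; only the IsReplaying flag matters here.
public sealed class ReplayStateSketch
{
    public bool IsReplaying { get; set; }
}

// Illustrative decorator: drops log calls while replaying, forwards them otherwise.
public sealed class ReplayAwareLoggerSketch : ILogger
{
    private readonly ILogger _inner;
    private readonly ReplayStateSketch _state;
    private readonly bool _modeAware;

    public ReplayAwareLoggerSketch(ILogger inner, ReplayStateSketch state, bool modeAware = true)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _state = state ?? throw new ArgumentNullException(nameof(state));
        _modeAware = modeAware;
    }

    // Re-evaluated on every call so a mid-workflow replay-to-new transition is observed.
    private bool Suppressed => _modeAware && _state.IsReplaying;

    // Scopes always pass through so the inner logger's scope stack stays balanced.
    public IDisposable? BeginScope<TState>(TState state) where TState : notnull
        => _inner.BeginScope(state);

    public bool IsEnabled(LogLevel logLevel)
        => !Suppressed && _inner.IsEnabled(logLevel);

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
    {
        // Returning before the formatter runs means the message is never rendered.
        if (Suppressed) return;
        _inner.Log(logLevel, eventId, state, exception, formatter);
    }
}

// Test double for the sketch: records rendered messages in memory.
public sealed class CollectingLogger : ILogger
{
    public List<string> Messages { get; } = new();

    public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

    public bool IsEnabled(LogLevel logLevel) => logLevel != LogLevel.None;

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
        => Messages.Add(formatter(state, exception));
}
```

With IsReplaying set to true, a LogInformation call through the decorator leaves CollectingLogger.Messages empty; once the flag flips to false, the next call is forwarded and rendered. Passing modeAware: false corresponds to the ModeAware = false escape hatch described above.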

Default inner logger. LambdaCoreLogger — a minimal in-package adapter that delegates to Amazon.Lambda.Core.LambdaLogger.Log, so logs flow into the standard Lambda runtime pipeline (JSON when AWS_LAMBDA_LOG_FORMAT=JSON, level-filtered by AWS_LAMBDA_LOG_LEVEL). Two structured-logging behaviors:

  • When state is the FormattedLogValues produced by LoggerExtensions.LogXxx, the original template and named-argument values are forwarded so the runtime's JSON formatter surfaces {OrderId}-style placeholders as top-level structured attributes.
  • BeginScope maintains an AsyncLocal chain of scope state. KVP-shaped scope state is appended to the outgoing template as named placeholders (inner→outer order, inner wins on key collision; explicit message args win over scope keys), so durableExecutionArn / operationId / etc. show up as top-level JSON fields without callers having to swap in a third-party logger.

Mirrors the structured-logging pattern in Amazon.Lambda.Logging.AspNetCore.LambdaILogger. Avoids forcing a dependency on Amazon.Lambda.Logging.AspNetCore. Users who want Serilog/Powertools/etc. swap their own logger via ConfigureLogger.
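
The scope-flattening rules above (inner→outer order, inner wins on key collision, explicit message args win over scope keys) can be sketched with an AsyncLocal linked list. This is an illustration under assumptions, not the code in Internal/LambdaCoreLogger.cs: ScopeChainSketch, Push, and Flatten are hypothetical names, and the step that appends the keys to the outgoing template is left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;

public static class ScopeChainSketch
{
    // One link in the scope chain; disposing it restores the parent as current.
    private sealed class Node : IDisposable
    {
        public Node(object state, Node? parent) { State = state; Parent = parent; }
        public object State { get; }
        public Node? Parent { get; }
        public void Dispose() => Current.Value = Parent;
    }

    // AsyncLocal flows with the async call context, so concurrent tasks see separate chains.
    private static readonly AsyncLocal<Node?> Current = new();

    public static IDisposable Push(object state)
    {
        var node = new Node(state, Current.Value);
        Current.Value = node;
        return node;
    }

    // Returns message args first, then scope keys from innermost to outermost.
    // A key that has already been seen is skipped, so args beat scopes and inner beats outer.
    public static List<KeyValuePair<string, object?>> Flatten(
        IEnumerable<KeyValuePair<string, object?>> messageArgs)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<KeyValuePair<string, object?>>();

        foreach (var arg in messageArgs)
        {
            if (seen.Add(arg.Key)) result.Add(arg);
        }

        for (var node = Current.Value; node != null; node = node.Parent)
        {
            // Non-KVP scope state (for example a plain string) is ignored.
            if (node.State is not IEnumerable<KeyValuePair<string, object>> pairs) continue;
            foreach (var pair in pairs)
            {
                if (seen.Add(pair.Key))
                    result.Add(new KeyValuePair<string, object?>(pair.Key, pair.Value));
            }
        }

        return result;
    }
}
```

The sketch assumes scopes are disposed in reverse order of creation, which is what nested using blocks give you; it does not try to handle out-of-order disposal.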

Metadata scopes. DurableFunction.WrapAsyncCore opens a BeginScope around the workflow body carrying durableExecutionArn + awsRequestId. StepOperation opens a per-step scope (operationId, operationName, attempt) around the user-func invocation only. Combined with the scope-aware default logger above, every log line emitted by user code is automatically tagged with execution / step metadata.

Key files:

  • LoggerConfig.cs — public configuration type
  • Internal/ReplayAwareLogger.cs — the replay-aware decorator
  • Internal/LambdaCoreLogger.cs — default inner logger; preserves structured args + flattens scope chain
  • DurableContext.cs — replaces NullLogger default; implements ConfigureLogger
  • DurableFunction.cs — execution-level scope
  • Internal/StepOperation.cs — step-level scope around user func

Testing

Unit tests (21 new in Amazon.Lambda.DurableExecution.Tests):

  • ReplayAwareLoggerTests (7) — replay suppression, execution passthrough, ModeAware=false, IsEnabled short-circuit, BeginScope passthrough, mid-workflow REPLAY → NEW transition (mirrors Python's test_logger_replay_then_new_logging).
  • DurableContextTests (3) — Logger_Default_IsReplayAwareLogger, ConfigureLogger_WithCustomLogger_ReachesUserLogger, ConfigureLogger_ModeAwareFalse_LogsDuringReplay.
  • LambdaCoreLoggerTests (11) — installs capture delegates into LambdaLogger._loggingWithLevelAction (the same hook RuntimeSupport uses) and asserts named placeholders + arg values are forwarded intact, exception variant works, plain messages pass through as literals, non-FormattedLogValues state falls back to formatter(state, exception), KVP scopes are appended, nested scopes flatten inner→outer with inner winning on collision, explicit message args win over scope keys, scope is popped on dispose, AsyncLocal isolates concurrent tasks, and non-KVP scopes are ignored.

Integration test (ReplayAwareLoggerTest in Amazon.Lambda.DurableExecution.IntegrationTests):

End-to-end proof on real AWS infra. Deploys a step → wait(3s) → step workflow that pairs each context.Logger.LogInformation line with a Console.WriteLine "control" line; the test function runs with AWS_LAMBDA_LOG_FORMAT=JSON. After the durable execution completes (across two invocations driven by the wait), queries CloudWatch Logs and asserts:

  • Each replay-aware line appears exactly once across both invocations.
  • Each control line appears once per invocation that reached it (proving the function genuinely replayed).
  • Parsed JSON log records carry the expected scope-derived top-level fields: durableExecutionArn + awsRequestId on workflow-level lines; additionally operationId + operationName + attempt on lines emitted inside a step delegate.

This pins both the replay-suppression contract and the structured-scope contract end-to-end against the actual durable-execution service.

Out of scope (follow-up PRs)

  • MapAsync / ParallelAsync / RunInChildContextAsync / WaitForConditionAsync
  • CallbackAsync, InvokeAsync
  • DefaultJsonCheckpointSerializer
  • Annotations source-generator integration / [DurableExecution] attribute
  • DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
  • dotnet new lambda.DurableFunction blueprint


COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]
GarrettBeatty and others added 9 commits May 15, 2026 18:14
Implements the minimum viable slice of the Amazon.Lambda.DurableExecution
SDK: a workflow can run StepAsync and WaitAsync against a real Lambda,
with replay-aware checkpointing wired through to the AWS service.

Public API surface introduced:
- DurableFunction.WrapAsync — entry point that handles the durable
  execution envelope (input hydration, output construction, status mapping)
- IDurableContext.StepAsync / WaitAsync (4 Step overloads, 1 Wait)
- StepConfig with serializer hook (retry deferred to follow-up PR)
- ICheckpointSerializer interface
- [DurableExecution] attribute (recognized by future source generator)
- DurableExecutionException base + StepException

Internals:
- DurableExecutionHandler — Task.WhenAny race between user code and
  the suspension signal, returning Succeeded/Failed/Pending
- ExecutionState — replay-aware operation lookup and pending checkpoint
  buffer
- OperationIdGenerator — deterministic, replay-stable IDs
- TerminationManager — TaskCompletionSource-based suspension trigger
- LambdaDurableServiceClient — wraps AWSSDK.Lambda's checkpoint and
  state APIs
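
As a rough illustration of the Task.WhenAny race described for DurableExecutionHandler, the sketch below races user code against a suspension signal and maps the result to Succeeded/Failed/Pending. It is a simplified assumption of the shape, not the handler itself: SuspensionRaceSketch, RaceOutcome, and RunAsync are hypothetical names, and the real handler also has to carry results and errors back to the caller.

```csharp
using System;
using System.Threading.Tasks;

public enum RaceOutcome { Succeeded, Failed, Pending }

public static class SuspensionRaceSketch
{
    // suspensionSignal would typically be the Task of a TaskCompletionSource
    // that a termination manager completes when the workflow must suspend.
    public static async Task<RaceOutcome> RunAsync(Func<Task> userCode, Task suspensionSignal)
    {
        Task userTask;
        try
        {
            userTask = userCode();
        }
        catch (Exception ex)
        {
            // A non-async delegate can throw before returning a task; treat that as a failed task.
            userTask = Task.FromException(ex);
        }

        // WhenAny never throws for a faulted input; it just returns whichever task finished first.
        var winner = await Task.WhenAny(userTask, suspensionSignal).ConfigureAwait(false);

        if (winner != userTask)
        {
            // The suspension signal fired first: the invocation ends while the workflow stays pending.
            return RaceOutcome.Pending;
        }

        return userTask.IsCompletedSuccessfully ? RaceOutcome.Succeeded : RaceOutcome.Failed;
    }
}
```

Listing the user task first in WhenAny means that if both tasks are already complete, the user task is treated as the winner.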

Tests:
- 86 unit tests covering enums, exceptions, models, configs,
  ID generation, termination, execution state, the handler race,
  the context (Step + Wait paths), and the WrapAsync entry point
- 8 end-to-end integration tests deploying real Lambdas via Docker on
  the provided.al2023 runtime: StepWaitStep, MultipleSteps, WaitOnly,
  LongerWait, ReplayDeterminism, RetrySucceeds, RetryExhausts, StepFails

Out of scope (follow-up PRs):
- IRetryStrategy, ExponentialRetryStrategy, retry decision factories
- DefaultJsonCheckpointSerializer
- DurableLogger replay-suppression (currently returns NullLogger)
- Callbacks, InvokeAsync, ParallelAsync, MapAsync, RunInChildContextAsync,
  WaitForConditionAsync — interface intentionally does not declare them
- Annotations source-generator integration
- DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
- dotnet new lambda.DurableFunction blueprint

stack-info: PR: #2360, branch: GarrettBeatty/stack/2

remove

update

update

update

update
Match the Python / Java / JavaScript reference SDKs' replay-mode model:
the workflow is "replaying" iff it has not yet revisited every
checkpointed completed user-replayable operation. A single global flag
flipped on the first fresh op (the prior model) misclassified workflow-
body code that runs before the first step and would not generalize to
Map/Parallel/Callback later.

ExecutionState changes:
- Replace `Mode`/`ExecutionMode`/`EnterExecutionMode()` with `IsReplaying`
  + `TrackReplay(operationId)`.
- Initial replay decision: any non-EXECUTION op present means we're
  replaying. The service always sends an EXECUTION-type op carrying the
  input payload — that's bookkeeping, not user history, so it does not
  count toward replay (matches Python execution.py:258, Java
  ExecutionManager:81, JS execution-context.ts:62).
- TrackReplay flips IsReplaying false once every checkpointed terminal-
  status non-EXECUTION op has been visited. Terminal set matches
  Python's: SUCCEEDED, FAILED, CANCELLED, STOPPED.
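
The visit-accounting rule above can be sketched as follows. This is a minimal illustration under assumptions, not the ExecutionState implementation: CheckpointedOp and ReplayTrackerSketch are hypothetical types, and operation type and status are modeled as plain strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of a checkpointed operation for this sketch.
public sealed record CheckpointedOp(string Id, string Type, string Status);

public sealed class ReplayTrackerSketch
{
    private static readonly HashSet<string> TerminalStatuses =
        new(StringComparer.Ordinal) { "SUCCEEDED", "FAILED", "CANCELLED", "STOPPED" };

    // Terminal non-EXECUTION operations that the workflow has not revisited yet.
    private readonly HashSet<string> _unvisitedTerminalOps = new(StringComparer.Ordinal);

    public bool IsReplaying { get; private set; }

    public ReplayTrackerSketch(IEnumerable<CheckpointedOp> checkpointedOps)
    {
        foreach (var op in checkpointedOps)
        {
            // The EXECUTION op only carries the input payload; it is not user history.
            if (op.Type == "EXECUTION") continue;

            // Any other checkpointed op means this invocation starts in replay.
            IsReplaying = true;

            if (TerminalStatuses.Contains(op.Status))
            {
                _unvisitedTerminalOps.Add(op.Id);
            }
        }
    }

    // Called at the top of every operation; safe to call repeatedly with the same ID.
    public void TrackReplay(string operationId)
    {
        if (!IsReplaying) return;

        _unvisitedTerminalOps.Remove(operationId);

        // Non-terminal ops (for example PENDING) were never added, so they cannot block the flip.
        if (_unvisitedTerminalOps.Count == 0)
        {
            IsReplaying = false;
        }
    }
}
```

Given an EXECUTION op, a SUCCEEDED step, and a PENDING wait, the tracker starts in replay and leaves it as soon as the step's ID is tracked. Given only the EXECUTION op, it never enters replay.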

Operation changes:
- DurableOperation.ExecuteAsync calls TrackReplay(OperationId) at the
  top, so every operation participates in visit accounting without each
  subclass needing to remember.
- StepOperation/WaitOperation drop their manual EnterExecutionMode calls.

Tests:
- ExecutionStateTests rewritten around IsReplaying/TrackReplay, including
  pinning regressions: only-EXECUTION-op ⇒ NotReplaying, all-visited ⇒
  flips out of replay, PENDING ops do not block transition, idempotency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Serializer

DurableExecution now reads the registered ILambdaSerializer from the per-invocation
ILambdaContext (added in the prior PR) for both step-result checkpointing and
workflow input/output. AOT-safety is now determined entirely by which serializer
the user registers with LambdaBootstrapBuilder.Create — there is no longer a
forked path between reflection-based and AOT-safe APIs.

Removed:
- ICheckpointSerializer<T> + SerializationContext record
- ReflectionJsonCheckpointSerializer<T>
- The four JsonSerializerContext-taking overloads of DurableFunction.WrapAsync
- The IDurableContext.StepAsync overload that took ICheckpointSerializer<T>
- All [RequiresUnreferencedCode]/[RequiresDynamicCode] attributes and their
  related [UnconditionalSuppressMessage] shims

Net result: 8 WrapAsync overloads → 4, 3 StepAsync overloads → 2, zero trim
attributes in the public API. The AOT smoke test continues to publish with zero
IL2026/IL3050 warnings.
- Wrap LambdaDurableServiceClient SDK calls in DurableExecutionException with
  durable-execution context (which call, which ARN). User logs no longer show
  bare AWSSDK stack traces. Update IsTerminalCheckpointError to unwrap the
  inner AmazonServiceException for classification.
- Move public-API files out of Models/, Config/, Exceptions/ into the project
  root so folder layout matches the Amazon.Lambda.DurableExecution namespace.
- Replace string action literals ("SUCCEED", "FAIL", "START") with the
  Amazon.Lambda.OperationAction enum constants.
- Replace hand-rolled ToHex with Amazon.Util.AWSSDKUtils.ToHex. Drop the
  netstandard2.0 SHA-256 fallback now that DurableExecution targets net8+.
- Spell "iff" as "if and only if" in ExecutionState replay-mode docs.

Tests updated for the new wrapping shape: terminal classification asserts on
DurableExecutionException with the inner SDK exception preserved; transient
and hydration paths assert ThrowsAsync<DurableExecutionException> with
InnerException set to the original AmazonServiceException.
stack-info: PR: #2363, branch: GarrettBeatty/stack/3
@GarrettBeatty GarrettBeatty force-pushed the GarrettBeatty/stack/3 branch from 7ca2099 to 5a29b3e Compare May 17, 2026 20:16
@GarrettBeatty GarrettBeatty force-pushed the GarrettBeatty/stack/4 branch from f74f35a to 0ad914a Compare May 18, 2026 01:22
Implement context.Logger, the replay-aware ILogger described in
Docs/durable-execution-design.md and shipped by the Python / Java / JS
reference SDKs. Messages emitted while the workflow is replaying prior
operations are suppressed, so a 30-step workflow re-invoked 30 times
emits each LogInformation line once instead of 30 times.

Public API:
- IDurableContext.Logger — was NullLogger.Instance, now a replay-safe
  ILogger backed by Amazon.Lambda.Core.LambdaLogger so logs flow into
  the standard runtime pipeline (JSON when AWS_LAMBDA_LOG_FORMAT=JSON,
  level-filtered by AWS_LAMBDA_LOG_LEVEL).
- IDurableContext.ConfigureLogger(LoggerConfig) — swap the inner
  ILogger (Serilog, Powertools, etc.) and/or disable replay-aware
  filtering (ModeAware = false) for debugging. Matches the API shape
  documented in the design doc.

Internals:
- ReplayAwareLogger — ILogger decorator that consults
  ExecutionState.IsReplaying on every Log call. Short-circuits both
  Log<TState> and IsEnabled during replay so LoggerExtensions.LogXxx
  doesn't even format the string. BeginScope always passes through so
  the scope stack stays balanced.
- LambdaCoreLogger — minimal in-package adapter that delegates to
  Amazon.Lambda.Core.LambdaLogger.Log. Avoids forcing a dependency on
  Amazon.Lambda.Logging.AspNetCore.
- DurableFunction.WrapAsyncCore opens a BeginScope around the workflow
  body carrying durableExecutionArn + awsRequestId. StepOperation
  opens a per-step scope (operationId, operationName, attempt) around
  the user-func invocation only. Structured log providers (the
  runtime's JSON formatter, Serilog, etc.) tag every log line emitted
  by user code with that metadata automatically.

Tests:
- ReplayAwareLoggerTests — 7 unit tests: replay suppression, execution
  passthrough, ModeAware=false, IsEnabled short-circuit, scope
  passthrough, mid-workflow REPLAY→NEW transition (mirrors Python's
  test_logger_replay_then_new_logging).
- DurableContextTests — coverage for the default logger, ConfigureLogger
  with a custom logger, and ConfigureLogger { ModeAware = false }
  enabling logs during replay.
- ReplayAwareLoggerTest (integration) — deploys a Step → Wait → Step
  workflow that pairs each context.Logger.LogInformation line with a
  Console.WriteLine "control" line. After the durable execution
  completes, queries CloudWatch Logs and asserts each replay-aware
  line appears exactly once across both invocations while each control
  line appears once per invocation, proving the suppression works
  end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Adds a replay-aware ILogger implementation to Amazon.Lambda.DurableExecution so workflow logs don’t duplicate during replay, and exposes a small configuration surface for swapping the underlying logger and toggling replay filtering.

Changes:

  • Introduces ReplayAwareLogger + LambdaCoreLogger, wires IDurableContext.Logger to be replay-aware by default, and adds IDurableContext.ConfigureLogger(LoggerConfig).
  • Adds execution- and step-level BeginScope metadata for structured loggers.
  • Adds unit tests and a CloudWatch-based integration test to validate replay suppression end-to-end.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| Libraries/src/Amazon.Lambda.DurableExecution/IDurableContext.cs | Updates public context API with replay-safe Logger docs and new ConfigureLogger(LoggerConfig) method. |
| Libraries/src/Amazon.Lambda.DurableExecution/LoggerConfig.cs | Adds public configuration type for swapping inner logger and toggling replay-aware suppression. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/ReplayAwareLogger.cs | Adds replay-suppressing ILogger decorator driven by ExecutionState.IsReplaying. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/LambdaCoreLogger.cs | Adds default in-package logger adapter that routes to Amazon.Lambda.Core.LambdaLogger. |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs | Defaults Logger to replay-aware logger and implements ConfigureLogger. |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableFunction.cs | Adds execution-level logging scope for structured metadata. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/StepOperation.cs | Adds step-level logging scope for operation metadata around user step invocation. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/Internal/ReplayAwareLoggerTests.cs | Adds unit tests for replay suppression, scope passthrough, and mode transitions. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableContextTests.cs | Adds unit tests for default logger type and ConfigureLogger behavior. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/Amazon.Lambda.DurableExecution.IntegrationTests.csproj | Adds CloudWatch Logs SDK dependency for log verification. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ReplayAwareLoggerTest.cs | Adds CloudWatch-based integration test validating replay suppression vs Console control lines. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayAwareLoggerFunction/ReplayAwareLoggerFunction.csproj | Adds new integration-test Lambda function project. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayAwareLoggerFunction/Function.cs | Implements Step→Wait→Step workflow emitting replay-aware and control log markers. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayAwareLoggerFunction/Dockerfile | Adds container packaging for the new integration-test function. |


// expected count of every marker so the test never short-circuits with
// a still-arriving "after_step1" record (which is emitted at a
// different timestamp than workflow_start and indexes independently).
using var logs = new AmazonCloudWatchLogsClient(RegionEndpoint.USEast1);
Comment on lines +32 to +33
// Level filtering is performed by the runtime layer (AWS_LAMBDA_LOG_LEVEL).
public bool IsEnabled(LogLevel logLevel) => logLevel != LogLevel.None;
Comment on lines +44 to +54
var message = formatter(state, exception);
var levelName = logLevel.ToString();

if (exception != null)
{
    CoreLambdaLogger.Log(levelName, exception, message);
}
else
{
    CoreLambdaLogger.Log(levelName, message);
}
Comment on lines +117 to +121
using (context.Logger.BeginScope(new Dictionary<string, object>
{
["durableExecutionArn"] = invocationInput.DurableExecutionArn,
["awsRequestId"] = lambdaContext.AwsRequestId ?? string.Empty,
}))
Comment on lines +203 to +207
using (_logger.BeginScope(new Dictionary<string, object>
{
["operationId"] = OperationId,
["operationName"] = Name ?? string.Empty,
["attempt"] = attemptNumber,
When state is FormattedLogValues, extract {OriginalFormat} and pass the
original template + named-argument values through to LambdaLogger.Log
instead of pre-rendering. Mirrors the pattern in
Amazon.Lambda.Logging.AspNetCore.LambdaILogger so the runtime's JSON
formatter can surface {OrderId}-style placeholders as top-level
structured attributes.
BeginScope now maintains an AsyncLocal chain of scope state. On Log,
KVP-shaped scope state is appended to the template as named placeholders
(inner→outer order, inner wins on key collision; explicit message args
win over scope keys). The runtime's JSON formatter promotes the keys
to top-level fields, so durableExecutionArn / operationId / etc. show
up as structured attributes without callers having to swap in a
third-party logger.

Unit tests cover ordering, nested scopes, message-arg precedence,
AsyncLocal isolation, and non-KVP fallback. The integration test now
sets AWS_LAMBDA_LOG_FORMAT=JSON, adds a step-internal log line, and
asserts the scope-derived fields land on the parsed JSON record.
@GarrettBeatty GarrettBeatty force-pushed the GarrettBeatty/stack/3 branch from 5a29b3e to 209076a Compare May 19, 2026 23:01
4 participants