Add replay-aware logger to Amazon.Lambda.DurableExecution #2371
Draft
GarrettBeatty wants to merge 12 commits into
    COPY bin/publish/ ${LAMBDA_TASK_ROOT}
    ENTRYPOINT ["/var/task/bootstrap"]
Implements the minimum viable slice of the Amazon.Lambda.DurableExecution SDK: a workflow can run StepAsync and WaitAsync against a real Lambda, with replay-aware checkpointing wired through to the AWS service.
Public API surface introduced:
- DurableFunction.WrapAsync: entry point that handles the durable execution envelope (input hydration, output construction, status mapping)
- IDurableContext.StepAsync / WaitAsync (4 Step overloads, 1 Wait)
- StepConfig with serializer hook (retry deferred to follow-up PR)
- ICheckpointSerializer interface
- [DurableExecution] attribute (recognized by future source generator)
- DurableExecutionException base + StepException
Internals:
- DurableExecutionHandler: Task.WhenAny race between user code and the suspension signal, returning Succeeded/Failed/Pending
- ExecutionState: replay-aware operation lookup and pending checkpoint buffer
- OperationIdGenerator: deterministic, replay-stable IDs
- TerminationManager: TaskCompletionSource-based suspension trigger
- LambdaDurableServiceClient: wraps AWSSDK.Lambda's checkpoint and state APIs
Tests:
- 86 unit tests covering enums, exceptions, models, configs, ID generation, termination, execution state, the handler race, the context (Step + Wait paths), and the WrapAsync entry point
- 8 end-to-end integration tests deploying real Lambdas via Docker on the provided.al2023 runtime: StepWaitStep, MultipleSteps, WaitOnly, LongerWait, ReplayDeterminism, RetrySucceeds, RetryExhausts, StepFails
Out of scope (follow-up PRs):
- IRetryStrategy, ExponentialRetryStrategy, retry decision factories
- DefaultJsonCheckpointSerializer
- DurableLogger replay-suppression (currently returns NullLogger)
- Callbacks, InvokeAsync, ParallelAsync, MapAsync, RunInChildContextAsync, WaitForConditionAsync (the interface intentionally does not declare them)
- Annotations source-generator integration
- DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
- dotnet new lambda.DurableFunction blueprint
stack-info: PR: #2360, branch: GarrettBeatty/stack/2
Match the Python / Java / JavaScript reference SDKs' replay-mode model: the workflow is "replaying" iff it has not yet revisited every checkpointed completed user-replayable operation. A single global flag flipped on the first fresh op (the prior model) misclassified workflow-body code that runs before the first step and would not generalize to Map/Parallel/Callback later.
ExecutionState changes:
- Replace `Mode`/`ExecutionMode`/`EnterExecutionMode()` with `IsReplaying` + `TrackReplay(operationId)`.
- Initial replay decision: any non-EXECUTION op present means we're replaying. The service always sends an EXECUTION-type op carrying the input payload; that is bookkeeping, not user history, so it does not count toward replay (matches Python execution.py:258, Java ExecutionManager:81, JS execution-context.ts:62).
- TrackReplay flips IsReplaying false once every checkpointed terminal-status non-EXECUTION op has been visited. Terminal set matches Python's: SUCCEEDED, FAILED, CANCELLED, STOPPED.
Operation changes:
- DurableOperation.ExecuteAsync calls TrackReplay(OperationId) at the top, so every operation participates in visit accounting without each subclass needing to remember.
- StepOperation/WaitOperation drop their manual EnterExecutionMode calls.
Tests:
- ExecutionStateTests rewritten around IsReplaying/TrackReplay, including pinning regressions: only-EXECUTION-op ⇒ NotReplaying, all-visited ⇒ flips out of replay, PENDING ops do not block transition, idempotency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
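The visit-accounting rule in this commit message can be sketched in a few lines. This is a minimal illustration, not the SDK's ExecutionState: `CheckpointedOp`, `ReplayTracker`, and the "STEP" / "WAIT" / "STARTED" labels are hypothetical names chosen for the example; only the EXECUTION type and the four terminal statuses come from the text above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a checkpointed operation; not the SDK's model type.
public sealed record CheckpointedOp(string Id, string Type, string Status);

// Illustrative tracker: replaying until every terminal-status,
// non-EXECUTION checkpointed op has been revisited.
public sealed class ReplayTracker
{
    private static readonly HashSet<string> TerminalStatuses =
        new(StringComparer.Ordinal) { "SUCCEEDED", "FAILED", "CANCELLED", "STOPPED" };

    private readonly HashSet<string> _unvisitedTerminalOps = new(StringComparer.Ordinal);

    public bool IsReplaying { get; private set; }

    public ReplayTracker(IEnumerable<CheckpointedOp> checkpointed)
    {
        foreach (var op in checkpointed)
        {
            // The EXECUTION op only carries the input payload; it is bookkeeping.
            if (op.Type == "EXECUTION") continue;

            // Any other op means there is user history to replay.
            IsReplaying = true;

            // Only terminal ops must be revisited; PENDING ops never block the transition.
            if (TerminalStatuses.Contains(op.Status))
                _unvisitedTerminalOps.Add(op.Id);
        }
    }

    // Called at the top of every operation; safe to call repeatedly for the same id.
    public void TrackReplay(string operationId)
    {
        if (!IsReplaying) return;
        _unvisitedTerminalOps.Remove(operationId);
        if (_unvisitedTerminalOps.Count == 0)
            IsReplaying = false;
    }
}

public static class Program
{
    public static void Main()
    {
        var tracker = new ReplayTracker(new[]
        {
            new CheckpointedOp("exec", "EXECUTION", "STARTED"),
            new CheckpointedOp("step-1", "STEP", "SUCCEEDED"),
            new CheckpointedOp("wait-1", "WAIT", "PENDING"),
        });

        Console.WriteLine(tracker.IsReplaying); // True: step-1 has not been revisited yet
        tracker.TrackReplay("step-1");
        Console.WriteLine(tracker.IsReplaying); // False: every terminal op has been visited
    }
}
```

Keeping a set of unvisited ids rather than a single flag is what lets code that runs before the first step still see `IsReplaying == true`, which is the misclassification the commit describes in the prior model.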
…Serializer
DurableExecution now reads the registered ILambdaSerializer from the per-invocation ILambdaContext (added in the prior PR) for both step-result checkpointing and workflow input/output. AOT-safety is now determined entirely by which serializer the user registers with LambdaBootstrapBuilder.Create; there is no longer a forked path between reflection-based and AOT-safe APIs.
Removed:
- ICheckpointSerializer<T> + SerializationContext record
- ReflectionJsonCheckpointSerializer<T>
- The four JsonSerializerContext-taking overloads of DurableFunction.WrapAsync
- The IDurableContext.StepAsync overload that took ICheckpointSerializer<T>
- All [RequiresUnreferencedCode]/[RequiresDynamicCode] attributes and their related [UnconditionalSuppressMessage] shims
Net result: 8 WrapAsync overloads → 4, 3 StepAsync overloads → 2, zero trim attributes in the public API. The AOT smoke test continues to publish with zero IL2026/IL3050 warnings.
- Wrap LambdaDurableServiceClient SDK calls in DurableExecutionException with
durable-execution context (which call, which ARN). User logs no longer show
bare AWSSDK stack traces. Update IsTerminalCheckpointError to unwrap the
inner AmazonServiceException for classification.
- Move public-API files out of Models/, Config/, Exceptions/ into the project
root so folder layout matches the Amazon.Lambda.DurableExecution namespace.
- Replace string action literals ("SUCCEED", "FAIL", "START") with the
Amazon.Lambda.OperationAction enum constants.
- Replace hand-rolled ToHex with Amazon.Util.AWSSDKUtils.ToHex. Drop the
netstandard2.0 SHA-256 fallback now that DurableExecution targets net8+.
- Spell "iff" as "if and only if" in ExecutionState replay-mode docs.
Tests updated for the new wrapping shape: terminal classification asserts on
DurableExecutionException with the inner SDK exception preserved; transient
and hydration paths assert ThrowsAsync<DurableExecutionException> with
InnerException set to the original AmazonServiceException.
Implement context.Logger, the replay-aware ILogger described in
Docs/durable-execution-design.md and shipped by the Python / Java / JS
reference SDKs. Messages emitted while the workflow is replaying prior
operations are suppressed, so a 30-step workflow re-invoked 30 times
emits each LogInformation line once instead of 30 times.
Public API:
- IDurableContext.Logger — was NullLogger.Instance, now a replay-safe
ILogger backed by Amazon.Lambda.Core.LambdaLogger so logs flow into
the standard runtime pipeline (JSON when AWS_LAMBDA_LOG_FORMAT=JSON,
level-filtered by AWS_LAMBDA_LOG_LEVEL).
- IDurableContext.ConfigureLogger(LoggerConfig) — swap the inner
ILogger (Serilog, Powertools, etc.) and/or disable replay-aware
filtering (ModeAware = false) for debugging. Matches the API shape
documented in the design doc.
Internals:
- ReplayAwareLogger — ILogger decorator that consults
ExecutionState.IsReplaying on every Log call. Short-circuits both
Log<TState> and IsEnabled during replay so LoggerExtensions.LogXxx
doesn't even format the string. BeginScope always passes through so
the scope stack stays balanced.
- LambdaCoreLogger — minimal in-package adapter that delegates to
Amazon.Lambda.Core.LambdaLogger.Log. Avoids forcing a dependency on
Amazon.Lambda.Logging.AspNetCore.
- DurableFunction.WrapAsyncCore opens a BeginScope around the workflow
body carrying durableExecutionArn + awsRequestId. StepOperation
opens a per-step scope (operationId, operationName, attempt) around
the user-func invocation only. Structured log providers (the
runtime's JSON formatter, Serilog, etc.) tag every log line emitted
by user code with that metadata automatically.
Tests:
- ReplayAwareLoggerTests — 7 unit tests: replay suppression, execution
passthrough, ModeAware=false, IsEnabled short-circuit, scope
passthrough, mid-workflow REPLAY→NEW transition (mirrors Python's
test_logger_replay_then_new_logging).
- DurableContextTests — coverage for the default logger, ConfigureLogger
with a custom logger, and ConfigureLogger { ModeAware = false }
enabling logs during replay.
- ReplayAwareLoggerTest (integration) — deploys a Step → Wait → Step
workflow that pairs each context.Logger.LogInformation line with a
Console.WriteLine "control" line. After the durable execution
completes, queries CloudWatch Logs and asserts each replay-aware
line appears exactly once across both invocations while each control
line appears once per invocation, proving the suppression works
end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
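The decorator described under Internals can be approximated as follows. This is a sketch under stated assumptions, not the PR's ReplayAwareLogger: it assumes the Microsoft.Extensions.Logging.Abstractions package (version 7 or later, for the nullable `BeginScope` signature), and `ReplaySuppressingLogger`, `ListLogger`, and the `Func<bool>` replay probe are hypothetical names standing in for the real types and for ExecutionState.IsReplaying.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical decorator: drops log calls while the replay probe returns true.
public sealed class ReplaySuppressingLogger : ILogger
{
    private readonly ILogger _inner;
    private readonly Func<bool> _isReplaying; // stand-in for ExecutionState.IsReplaying
    private readonly bool _modeAware;

    public ReplaySuppressingLogger(ILogger inner, Func<bool> isReplaying, bool modeAware = true)
    {
        _inner = inner;
        _isReplaying = isReplaying;
        _modeAware = modeAware;
    }

    // Evaluated on every call so a mid-workflow transition is picked up immediately.
    private bool Suppressed => _modeAware && _isReplaying();

    public bool IsEnabled(LogLevel logLevel) => !Suppressed && _inner.IsEnabled(logLevel);

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
    {
        if (Suppressed) return; // the formatter is never invoked during replay
        _inner.Log(logLevel, eventId, state, exception, formatter);
    }

    // Scopes always pass through so the inner scope stack stays balanced.
    public IDisposable? BeginScope<TState>(TState state) where TState : notnull
        => _inner.BeginScope(state);
}

// Hypothetical in-memory sink used only to observe what gets through.
public sealed class ListLogger : ILogger
{
    public List<string> Lines { get; } = new();

    public bool IsEnabled(LogLevel logLevel) => logLevel != LogLevel.None;

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
        => Lines.Add(formatter(state, exception));

    public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;
}

public static class Program
{
    public static void Main()
    {
        var replaying = true;
        var sink = new ListLogger();
        ILogger logger = new ReplaySuppressingLogger(sink, () => replaying);

        logger.LogInformation("replayed line"); // dropped: still replaying
        replaying = false;
        logger.LogInformation("fresh line");    // forwarded to the sink

        Console.WriteLine(string.Join(", ", sink.Lines)); // fresh line
    }
}
```

Passing `modeAware: false` makes the decorator a plain pass-through, which corresponds to the ModeAware = false debugging switch described above.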
Pull request overview
Adds a replay-aware ILogger implementation to Amazon.Lambda.DurableExecution so workflow logs don’t duplicate during replay, and exposes a small configuration surface for swapping the underlying logger and toggling replay filtering.
Changes:
- Introduces ReplayAwareLogger + LambdaCoreLogger, wires IDurableContext.Logger to be replay-aware by default, and adds IDurableContext.ConfigureLogger(LoggerConfig).
- Adds execution- and step-level BeginScope metadata for structured loggers.
- Adds unit tests and a CloudWatch-based integration test to validate replay suppression end-to-end.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| Libraries/src/Amazon.Lambda.DurableExecution/IDurableContext.cs | Updates public context API with replay-safe Logger docs and new ConfigureLogger(LoggerConfig) method. |
| Libraries/src/Amazon.Lambda.DurableExecution/LoggerConfig.cs | Adds public configuration type for swapping inner logger and toggling replay-aware suppression. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/ReplayAwareLogger.cs | Adds replay-suppressing ILogger decorator driven by ExecutionState.IsReplaying. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/LambdaCoreLogger.cs | Adds default in-package logger adapter that routes to Amazon.Lambda.Core.LambdaLogger. |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs | Defaults Logger to replay-aware logger and implements ConfigureLogger. |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableFunction.cs | Adds execution-level logging scope for structured metadata. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/StepOperation.cs | Adds step-level logging scope for operation metadata around user step invocation. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/Internal/ReplayAwareLoggerTests.cs | Adds unit tests for replay suppression, scope passthrough, and mode transitions. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableContextTests.cs | Adds unit tests for default logger type and ConfigureLogger behavior. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/Amazon.Lambda.DurableExecution.IntegrationTests.csproj | Adds CloudWatch Logs SDK dependency for log verification. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ReplayAwareLoggerTest.cs | Adds CloudWatch-based integration test validating replay suppression vs Console control lines. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayAwareLoggerFunction/ReplayAwareLoggerFunction.csproj | Adds new integration-test Lambda function project. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayAwareLoggerFunction/Function.cs | Implements Step→Wait→Step workflow emitting replay-aware and control log markers. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ReplayAwareLoggerFunction/Dockerfile | Adds container packaging for the new integration-test function. |
    // expected count of every marker so the test never short-circuits with
    // a still-arriving "after_step1" record (which is emitted at a
    // different timestamp than workflow_start and indexes independently).
    using var logs = new AmazonCloudWatchLogsClient(RegionEndpoint.USEast1);
Comment on lines +32 to +33
    // Level filtering is performed by the runtime layer (AWS_LAMBDA_LOG_LEVEL).
    public bool IsEnabled(LogLevel logLevel) => logLevel != LogLevel.None;
Comment on lines +44 to +54
    var message = formatter(state, exception);
    var levelName = logLevel.ToString();

    if (exception != null)
    {
        CoreLambdaLogger.Log(levelName, exception, message);
    }
    else
    {
        CoreLambdaLogger.Log(levelName, message);
    }
Comment on lines +117 to +121
    using (context.Logger.BeginScope(new Dictionary<string, object>
    {
        ["durableExecutionArn"] = invocationInput.DurableExecutionArn,
        ["awsRequestId"] = lambdaContext.AwsRequestId ?? string.Empty,
    }))
Comment on lines +203 to +207
    using (_logger.BeginScope(new Dictionary<string, object>
    {
        ["operationId"] = OperationId,
        ["operationName"] = Name ?? string.Empty,
        ["attempt"] = attemptNumber,
When state is FormattedLogValues, extract {OriginalFormat} and pass the
original template + named-argument values through to LambdaLogger.Log
instead of pre-rendering. Mirrors the pattern in
Amazon.Lambda.Logging.AspNetCore.LambdaILogger so the runtime's JSON
formatter can surface {OrderId}-style placeholders as top-level
structured attributes.
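The template-preservation step described above might look roughly like the sketch below. It relies on one behavior of Microsoft.Extensions.Logging: the state passed by `LoggerExtensions.LogXxx` is a list of key/value pairs whose `{OriginalFormat}` entry holds the raw template. `TemplateCapturingLogger` is a hypothetical name, and it records the template and arguments in a list instead of forwarding them to LambdaLogger.Log as the commit does.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical logger that keeps the template and argument values separate
// instead of pre-rendering them into one string.
public sealed class TemplateCapturingLogger : ILogger
{
    public List<(string Template, object?[] Args)> Records { get; } = new();

    public bool IsEnabled(LogLevel logLevel) => logLevel != LogLevel.None;

    public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
    {
        // Structured state exposes its values as key/value pairs.
        if (state is IReadOnlyList<KeyValuePair<string, object?>> values)
        {
            string? template = null;
            var args = new List<object?>();
            foreach (var pair in values)
            {
                if (pair.Key == "{OriginalFormat}")
                    template = pair.Value as string; // the unrendered message template
                else
                    args.Add(pair.Value);            // named-argument values, in template order
            }

            if (template != null)
            {
                Records.Add((template, args.ToArray()));
                return;
            }
        }

        // Fallback for arbitrary state: render with the supplied formatter.
        Records.Add((formatter(state, exception), Array.Empty<object?>()));
    }
}

public static class Program
{
    public static void Main()
    {
        var logger = new TemplateCapturingLogger();
        logger.LogInformation("Processing {OrderId}", 42);

        var (template, args) = logger.Records[0];
        Console.WriteLine(template); // Processing {OrderId}
        Console.WriteLine(args[0]);  // 42
    }
}
```

A downstream JSON formatter that receives the template plus values can emit OrderId as its own field, which is not possible once the message has been rendered to "Processing 42".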
BeginScope now maintains an AsyncLocal chain of scope state. On Log, KVP-shaped scope state is appended to the template as named placeholders (inner→outer order, inner wins on key collision; explicit message args win over scope keys). The runtime's JSON formatter promotes the keys to top-level fields, so durableExecutionArn / operationId / etc. show up as structured attributes without callers having to swap in a third-party logger.
Unit tests cover ordering, nested scopes, message-arg precedence, AsyncLocal isolation, and non-KVP fallback.
The integration test now sets AWS_LAMBDA_LOG_FORMAT=JSON, adds a step-internal log line, and asserts the scope-derived fields land on the parsed JSON record.
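The precedence rules for scope flattening can be illustrated with a small standalone chain. This sketch only models the merge order (message args first, then scopes from inner to outer, first writer wins); it does not rewrite message templates as the commit does, and `ScopeChain`, `Push`, and `Flatten` are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical scope stack kept per async flow.
public static class ScopeChain
{
    private static readonly AsyncLocal<Node?> Current = new();

    private sealed class Node : IDisposable
    {
        public Node(object state, Node? parent) { State = state; Parent = parent; }
        public object State { get; }
        public Node? Parent { get; }

        // Disposing pops this scope by restoring its parent.
        public void Dispose() => Current.Value = Parent;
    }

    public static IDisposable Push(object state)
    {
        var node = new Node(state, Current.Value);
        Current.Value = node;
        return node;
    }

    // Merges message args with the active scopes. Earlier writers win, so
    // message args beat scope keys and inner scopes beat outer ones.
    public static Dictionary<string, object?> Flatten(
        IEnumerable<KeyValuePair<string, object?>> messageArgs)
    {
        var result = new Dictionary<string, object?>(StringComparer.Ordinal);
        foreach (var pair in messageArgs)
            result[pair.Key] = pair.Value;

        for (var node = Current.Value; node != null; node = node.Parent)
        {
            // Only key/value-shaped scope state contributes; anything else is ignored.
            if (node.State is IEnumerable<KeyValuePair<string, object>> pairs)
            {
                foreach (var pair in pairs)
                    result.TryAdd(pair.Key, pair.Value);
            }
        }

        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        using (ScopeChain.Push(new Dictionary<string, object>
               { ["durableExecutionArn"] = "arn:example", ["attempt"] = 0 }))
        using (ScopeChain.Push(new Dictionary<string, object>
               { ["operationId"] = "op-1", ["attempt"] = 1 }))
        {
            var fields = ScopeChain.Flatten(new[]
            {
                new KeyValuePair<string, object?>("OrderId", 42),
            });

            Console.WriteLine(fields["attempt"]); // 1: the inner scope wins the collision
            Console.WriteLine(fields["OrderId"]); // 42: taken from the message args
        }
    }
}
```

Because the chain lives in an `AsyncLocal`, scopes pushed inside one task are not visible to sibling tasks, which is the isolation property the unit tests mention.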
#2216
What
Implements `context.Logger`, the replay-aware `ILogger` described in `Docs/durable-execution-design.md` and shipped by the Python / Java / JavaScript reference SDKs.
Public API surface introduced:
- `IDurableContext.Logger`: `ILogger` (was `NullLogger.Instance`).
- `IDurableContext.ConfigureLogger(LoggerConfig)`
- `LoggerConfig`: `CustomLogger` + `ModeAware` configuration record.
Why
Without replay-aware logging, every `Console.WriteLine` (or any non-suppressing logger) repeats on every replay invocation. A 30-step workflow re-invoked 30 times produces 30 copies of every log line, which is noisy at best and misleading at worst. The reference SDKs all solve this by reading replay state on each log call and suppressing emission while the workflow is re-deriving prior operations from checkpoint state. This PR ports that behavior to .NET on top of the per-operation replay tracker introduced in #2360.
How
ReplayAwareLogger. An `ILogger` decorator that consults `ExecutionState.IsReplaying` on every call. Short-circuits both `Log<TState>` and `IsEnabled` during replay so `LoggerExtensions.LogXxx` doesn't even format the message string. `BeginScope` always passes through so the scope stack stays balanced; suppression only applies at log emission.
Default inner logger. `LambdaCoreLogger` is a minimal in-package adapter that delegates to `Amazon.Lambda.Core.LambdaLogger.Log`, so logs flow into the standard Lambda runtime pipeline (JSON when `AWS_LAMBDA_LOG_FORMAT=JSON`, level-filtered by `AWS_LAMBDA_LOG_LEVEL`). Two structured-logging behaviors:
- When `state` is the `FormattedLogValues` produced by `LoggerExtensions.LogXxx`, the original template and named-argument values are forwarded so the runtime's JSON formatter surfaces `{OrderId}`-style placeholders as top-level structured attributes.
- `BeginScope` maintains an `AsyncLocal` chain of scope state. KVP-shaped scope state is appended to the outgoing template as named placeholders (inner→outer order, inner wins on key collision; explicit message args win over scope keys), so `durableExecutionArn` / `operationId` / etc. show up as top-level JSON fields without callers having to swap in a third-party logger.
Mirrors the structured-logging pattern in `Amazon.Lambda.Logging.AspNetCore.LambdaILogger`. Avoids forcing a dependency on `Amazon.Lambda.Logging.AspNetCore`. Users who want Serilog/Powertools/etc. swap in their own logger via `ConfigureLogger`.
Metadata scopes. `DurableFunction.WrapAsyncCore` opens a `BeginScope` around the workflow body carrying `durableExecutionArn` + `awsRequestId`. `StepOperation` opens a per-step scope (`operationId`, `operationName`, `attempt`) around the user-func invocation only. Combined with the scope-aware default logger above, every log line emitted by user code is automatically tagged with execution / step metadata.
Key files:
- `LoggerConfig.cs`: public configuration type
- `Internal/ReplayAwareLogger.cs`: the replay-aware decorator
- `Internal/LambdaCoreLogger.cs`: default inner logger; preserves structured args + flattens scope chain
- `DurableContext.cs`: replaces `NullLogger` default; implements `ConfigureLogger`
- `DurableFunction.cs`: execution-level scope
- `Internal/StepOperation.cs`: step-level scope around user func
Testing
Unit tests (21 new in `Amazon.Lambda.DurableExecution.Tests`):
- `ReplayAwareLoggerTests` (7): replay suppression, execution passthrough, `ModeAware=false`, `IsEnabled` short-circuit, `BeginScope` passthrough, mid-workflow REPLAY → NEW transition (mirrors Python's `test_logger_replay_then_new_logging`).
- `DurableContextTests` (3): `Logger_Default_IsReplayAwareLogger`, `ConfigureLogger_WithCustomLogger_ReachesUserLogger`, `ConfigureLogger_ModeAwareFalse_LogsDuringReplay`.
- `LambdaCoreLoggerTests` (11): installs capture delegates into `LambdaLogger._loggingWithLevelAction` (the same hook RuntimeSupport uses) and asserts named placeholders + arg values are forwarded intact, exception variant works, plain messages pass through as literals, non-`FormattedLogValues` state falls back to `formatter(state, exception)`, KVP scopes are appended, nested scopes flatten inner→outer with inner winning on collision, explicit message args win over scope keys, scope is popped on dispose, `AsyncLocal` isolates concurrent tasks, and non-KVP scopes are ignored.
Integration test (`ReplayAwareLoggerTest` in `Amazon.Lambda.DurableExecution.IntegrationTests`):
End-to-end proof on real AWS infra. Deploys a `step → wait(3s) → step` workflow that pairs each `context.Logger.LogInformation` line with a `Console.WriteLine` "control" line; the test function runs with `AWS_LAMBDA_LOG_FORMAT=JSON`. After the durable execution completes (across two invocations driven by the wait), queries CloudWatch Logs and asserts:
- `durableExecutionArn` + `awsRequestId` on workflow-level lines; additionally `operationId` + `operationName` + `attempt` on lines emitted inside a step delegate.
This pins both the replay-suppression contract and the structured-scope contract end-to-end against the actual durable-execution service.
Out of scope (follow-up PRs)
- `MapAsync` / `ParallelAsync` / `RunInChildContextAsync` / `WaitForConditionAsync`
- `CallbackAsync`, `InvokeAsync`
- `DefaultJsonCheckpointSerializer`
- `[DurableExecution]` attribute
- `DurableTestRunner` / `Amazon.Lambda.DurableExecution.Testing` package
- `dotnet new lambda.DurableFunction` blueprint