
Add RunInChildContextAsync #2370

Draft
GarrettBeatty wants to merge 10 commits into GarrettBeatty/stack/3 from gcbeatty/durable-child-context

Contributor

GarrettBeatty commented May 14, 2026

Summary

  • Adds RunInChildContextAsync to IDurableContext with two overloads: returning T and void.
  • Adds ChildContextConfig (SubType for observability, ErrorMapping for exception remapping) and ChildContextException for failure surfacing.
  • Implements Internal/ChildContextOperation<T>, mirroring the Step/Wait pattern. Fresh execution: sync-flush CONTEXT START, run the user func, then emit SUCCEED with the serialized result, or emit FAIL with the error and throw ChildContextException. Replay: SUCCEEDED returns the cached value, FAILED throws (after ErrorMapping), and STARTED/PENDING re-runs the func and lets the child's own operations replay from their own checkpoints.
  • Result serialization uses the ILambdaSerializer registered on ILambdaContext.Serializer (consistent with StepOperation); no separate AOT overload is needed since the AOT story is owned by the registered serializer (e.g. SourceGeneratorLambdaJsonSerializer<TContext>).
  • Extends LambdaDurableServiceClient.MapFromSdkOperation to copy ContextDetails (Result + Error).
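The replay branches described in the summary can be sketched roughly as follows. This is an illustration only, not the code in this PR: `OpStatus`, `Checkpoint`, `ChildFailedException`, and `ChildContextSketch` are hypothetical stand-ins, a dictionary plays the role of the checkpoint store, and applying the error mapping on the fresh-failure path is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins; the real SDK types are not shown in this description.
public enum OpStatus { Started, Pending, Succeeded, Failed }

public sealed record Checkpoint(OpStatus Status, string? Result = null, string? ErrorMessage = null);

public sealed class ChildFailedException : Exception
{
    public ChildFailedException(string message) : base(message) { }
}

public static class ChildContextSketch
{
    // checkpoints: operation ID -> last recorded state (stands in for the execution state).
    public static async Task<string> RunAsync(
        IDictionary<string, Checkpoint> checkpoints,
        string operationId,
        Func<Task<string>> func,
        Func<Exception, Exception>? errorMapping = null)
    {
        if (checkpoints.TryGetValue(operationId, out var prior))
        {
            switch (prior.Status)
            {
                case OpStatus.Succeeded:
                    // Replay: return the cached value without re-running func.
                    return prior.Result!;
                case OpStatus.Failed:
                    // Replay: surface the recorded failure, remapped if requested.
                    throw Map(new ChildFailedException(prior.ErrorMessage ?? "child context failed"), errorMapping);
                // Started/Pending: no case matches, so execution continues below and
                // func is re-run; its inner operations replay from their own checkpoints.
            }
        }
        else
        {
            // Fresh execution: record the start before running user code.
            checkpoints[operationId] = new Checkpoint(OpStatus.Started);
        }

        try
        {
            var result = await func();
            checkpoints[operationId] = new Checkpoint(OpStatus.Succeeded, Result: result);
            return result;
        }
        catch (Exception ex)
        {
            checkpoints[operationId] = new Checkpoint(OpStatus.Failed, ErrorMessage: ex.Message);
            // Assumption: the mapping is applied here as well as on replay.
            throw Map(new ChildFailedException(ex.Message), errorMapping);
        }
    }

    private static Exception Map(Exception ex, Func<Exception, Exception>? mapping) =>
        mapping is null ? ex : mapping(ex);
}
```

Calling `RunAsync` twice with the same operation ID and the same dictionary runs `func` only once; the second call takes the Succeeded branch and returns the cached value.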

This is a building block for upcoming WaitForCallbackAsync, which (per the Java/JS reference SDKs) wraps CreateCallbackAsync + a submitter step inside a child context for observability and clean error mapping.

#2216

Test plan

  • 14 new tests in ChildContextOperationTests.cs covering: fresh execution + checkpoint emission, deterministic child operation IDs, replay SUCCEEDED, replay FAILED (with and without ErrorMapping), suspended child (Wait inside child propagates termination), replay STARTED with completed inner step, void overload, type-mismatch non-determinism detection, and SubType propagation.
  • 161/161 tests pass on net8.0 and net10.0 (existing Step/Wait tests untouched).
  • Production build clean: 0 warnings, 0 errors.

GarrettBeatty force-pushed the GarrettBeatty/stack/3 branch from d943d16 to 7ca2099 on May 14, 2026 18:07
GarrettBeatty force-pushed the gcbeatty/durable-child-context branch 2 times, most recently from e146869 to 369a029, on May 14, 2026 21:49
GarrettBeatty and others added 9 commits May 15, 2026 18:14
Implements the minimum viable slice of the Amazon.Lambda.DurableExecution
SDK: a workflow can run StepAsync and WaitAsync against a real Lambda,
with replay-aware checkpointing wired through to the AWS service.

Public API surface introduced:
- DurableFunction.WrapAsync — entry point that handles the durable
  execution envelope (input hydration, output construction, status mapping)
- IDurableContext.StepAsync / WaitAsync (4 Step overloads, 1 Wait)
- StepConfig with serializer hook (retry deferred to follow-up PR)
- ICheckpointSerializer interface
- [DurableExecution] attribute (recognized by future source generator)
- DurableExecutionException base + StepException

Internals:
- DurableExecutionHandler — Task.WhenAny race between user code and
  the suspension signal, returning Succeeded/Failed/Pending
- ExecutionState — replay-aware operation lookup and pending checkpoint
  buffer
- OperationIdGenerator — deterministic, replay-stable IDs
- TerminationManager — TaskCompletionSource-based suspension trigger
- LambdaDurableServiceClient — wraps AWSSDK.Lambda's checkpoint and
  state APIs
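
The race between user code and the suspension signal might look roughly like the sketch below. It uses only `Task.WhenAny` and `TaskCompletionSource` from the base class library; `Outcome` and `HandlerRaceSketch` are hypothetical names, and the real DurableExecutionHandler may differ in how it observes failures.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical result type mirroring the Succeeded/Failed/Pending outcomes.
public enum Outcome { Succeeded, Failed, Pending }

public static class HandlerRaceSketch
{
    // suspensionSignal would typically be the Task of a TaskCompletionSource
    // that a termination manager completes when the workflow must suspend.
    // Assumption: userCode is an async delegate, so failures surface on its Task.
    public static async Task<Outcome> RunAsync(Func<Task> userCode, Task suspensionSignal)
    {
        var userTask = userCode();

        // Whichever task finishes first decides the outcome.
        var winner = await Task.WhenAny(userTask, suspensionSignal);
        if (winner == suspensionSignal)
        {
            return Outcome.Pending;
        }

        // The user task won: inspect it without rethrowing.
        return userTask.IsCompletedSuccessfully ? Outcome.Succeeded : Outcome.Failed;
    }
}
```
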

Tests:
- 86 unit tests covering enums, exceptions, models, configs,
  ID generation, termination, execution state, the handler race,
  the context (Step + Wait paths), and the WrapAsync entry point
- 8 end-to-end integration tests deploying real Lambdas via Docker on
  the provided.al2023 runtime: StepWaitStep, MultipleSteps, WaitOnly,
  LongerWait, ReplayDeterminism, RetrySucceeds, RetryExhausts, StepFails

Out of scope (follow-up PRs):
- IRetryStrategy, ExponentialRetryStrategy, retry decision factories
- DefaultJsonCheckpointSerializer
- DurableLogger replay-suppression (currently returns NullLogger)
- Callbacks, InvokeAsync, ParallelAsync, MapAsync, RunInChildContextAsync,
  WaitForConditionAsync — interface intentionally does not declare them
- Annotations source-generator integration
- DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
- dotnet new lambda.DurableFunction blueprint

stack-info: PR: #2360, branch: GarrettBeatty/stack/2

remove

update

update

update

update
Match the Python / Java / JavaScript reference SDKs' replay-mode model:
the workflow is "replaying" iff it has not yet revisited every
checkpointed completed user-replayable operation. A single global flag
flipped on the first fresh op (the prior model) misclassified workflow-
body code that runs before the first step and would not generalize to
Map/Parallel/Callback later.

ExecutionState changes:
- Replace `Mode`/`ExecutionMode`/`EnterExecutionMode()` with `IsReplaying`
  + `TrackReplay(operationId)`.
- Initial replay decision: any non-EXECUTION op present means we're
  replaying. The service always sends an EXECUTION-type op carrying the
  input payload — that's bookkeeping, not user history, so it does not
  count toward replay (matches Python execution.py:258, Java
  ExecutionManager:81, JS execution-context.ts:62).
- TrackReplay flips IsReplaying false once every checkpointed terminal-
  status non-EXECUTION op has been visited. Terminal set matches
  Python's: SUCCEEDED, FAILED, CANCELLED, STOPPED.

Operation changes:
- DurableOperation.ExecuteAsync calls TrackReplay(OperationId) at the
  top, so every operation participates in visit accounting without each
  subclass needing to remember.
- StepOperation/WaitOperation drop their manual EnterExecutionMode calls.

Tests:
- ExecutionStateTests rewritten around IsReplaying/TrackReplay, including
  pinning regressions: only-EXECUTION-op ⇒ NotReplaying, all-visited ⇒
  flips out of replay, PENDING ops do not block transition, idempotency.
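
The visit-accounting model described in this commit can be sketched as below. It is a simplified illustration under stated assumptions: `OpRecord` and `ReplayTracker` are hypothetical types, and operation types and statuses are plain strings rather than the SDK's own representations.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of a checkpointed operation.
public sealed record OpRecord(string Id, string Type, string Status);

public sealed class ReplayTracker
{
    // Terminal set named in the commit message.
    private static readonly HashSet<string> Terminal =
        new() { "SUCCEEDED", "FAILED", "CANCELLED", "STOPPED" };

    private readonly HashSet<string> _unvisited;

    public bool IsReplaying { get; private set; }

    public ReplayTracker(IEnumerable<OpRecord> checkpointed)
    {
        // The EXECUTION op carries the input payload and is not user history.
        var userOps = checkpointed.Where(o => o.Type != "EXECUTION").ToList();
        IsReplaying = userOps.Count > 0;

        // Only terminal ops must be revisited; PENDING ops never block the transition.
        _unvisited = userOps
            .Where(o => Terminal.Contains(o.Status))
            .Select(o => o.Id)
            .ToHashSet();
    }

    public void TrackReplay(string operationId)
    {
        if (!IsReplaying) return; // idempotent once replay has ended
        _unvisited.Remove(operationId);
        if (_unvisited.Count == 0) IsReplaying = false;
    }
}
```
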

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Serializer

DurableExecution now reads the registered ILambdaSerializer from the per-invocation
ILambdaContext (added in the prior PR) for both step-result checkpointing and
workflow input/output. AOT-safety is now determined entirely by which serializer
the user registers with LambdaBootstrapBuilder.Create — there is no longer a
forked path between reflection-based and AOT-safe APIs.

Removed:
- ICheckpointSerializer<T> + SerializationContext record
- ReflectionJsonCheckpointSerializer<T>
- The four JsonSerializerContext-taking overloads of DurableFunction.WrapAsync
- The IDurableContext.StepAsync overload that took ICheckpointSerializer<T>
- All [RequiresUnreferencedCode]/[RequiresDynamicCode] attributes and their
  related [UnconditionalSuppressMessage] shims

Net result: 8 WrapAsync overloads → 4, 3 StepAsync overloads → 2, zero trim
attributes in the public API. The AOT smoke test continues to publish with zero
IL2026/IL3050 warnings.
- Wrap LambdaDurableServiceClient SDK calls in DurableExecutionException with
  durable-execution context (which call, which ARN). User logs no longer show
  bare AWSSDK stack traces. Update IsTerminalCheckpointError to unwrap the
  inner AmazonServiceException for classification.
- Move public-API files out of Models/, Config/, Exceptions/ into the project
  root so folder layout matches the Amazon.Lambda.DurableExecution namespace.
- Replace string action literals ("SUCCEED", "FAIL", "START") with the
  Amazon.Lambda.OperationAction enum constants.
- Replace hand-rolled ToHex with Amazon.Util.AWSSDKUtils.ToHex. Drop the
  netstandard2.0 SHA-256 fallback now that DurableExecution targets net8+.
- Spell "iff" as "if and only if" in ExecutionState replay-mode docs.

Tests updated for the new wrapping shape: terminal classification asserts on
DurableExecutionException with the inner SDK exception preserved; transient
and hydration paths assert ThrowsAsync<DurableExecutionException> with
InnerException set to the original AmazonServiceException.
stack-info: PR: #2363, branch: GarrettBeatty/stack/3
GarrettBeatty force-pushed the GarrettBeatty/stack/3 branch from 7ca2099 to 5a29b3e on May 17, 2026 20:16
GarrettBeatty force-pushed the gcbeatty/durable-child-context branch from 369a029 to 5a29b3e on May 17, 2026 20:27
Adds child-context support to the .NET Durable Execution SDK. A child
context is a logical sub-workflow with its own deterministic
operation-ID space, persisted as a CONTEXT operation so subsequent
invocations replay the cached value without re-executing the function.

Public surface:
- IDurableContext.RunInChildContextAsync<T> (reflection + AOT-safe
  ICheckpointSerializer<T> overloads, plus a void overload).
- ChildContextConfig with SubType (observability label) and
  ErrorMapping (transform exceptions before they surface to the caller).
- ChildContextException for failure surfacing.

Used as a building block for upcoming WaitForCallbackAsync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds a RunInChildContextAsync API to IDurableContext enabling user functions to be executed inside a logical sub-workflow whose result is checkpointed as a CONTEXT operation. The child context shares the parent's state, termination manager, batcher, and Lambda context, but uses a child OperationIdGenerator so its operation IDs are deterministically namespaced. Failures are surfaced via a new ChildContextException, optionally remapped via ChildContextConfig.ErrorMapping. This is positioned as a building block for a future WaitForCallbackAsync.
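
One way deterministic namespacing of child operation IDs could work is sketched below. The scheme shown (a per-context counter, prefixed by the parent operation's ID and hashed with SHA-256) is an assumption made for illustration; `IdGeneratorSketch` is a hypothetical name, and the actual OperationIdGenerator may derive IDs differently.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public sealed class IdGeneratorSketch
{
    private readonly string _prefix;
    private int _counter;

    public IdGeneratorSketch(string prefix = "") => _prefix = prefix;

    // Same call order => same IDs on every invocation, which is what replay needs.
    public string Next()
    {
        _counter++;
        var n = _counter.ToString(CultureInfo.InvariantCulture);
        var raw = _prefix.Length == 0 ? n : _prefix + "-" + n;
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(raw));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }

    // A child context gets its own counter, namespaced under the parent operation's ID.
    public IdGeneratorSketch CreateChild(string parentOperationId) => new(parentOperationId);
}
```

Because the child generator starts its own counter, operations added or removed inside the child do not shift the IDs of later operations in the parent.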

Changes:

  • New ChildContextOperation<T> mirroring the Step/Wait pattern with replay branches for SUCCEEDED, FAILED, STARTED/PENDING.
  • Public surface additions: IDurableContext.RunInChildContextAsync (typed and void overloads), ChildContextConfig, ChildContextException.
  • Service-client mapping extended to copy ContextDetails (Result, partial Error) from SDK responses, plus 14 new unit tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| Libraries/src/Amazon.Lambda.DurableExecution/IDurableContext.cs | Adds typed/void RunInChildContextAsync to the public interface. |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs | Implements RunInChildContextAsync and constructs ChildContextOperation with a child-context factory. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/ChildContextOperation.cs | New operation type implementing fresh execution, replay, and failure paths. |
| Libraries/src/Amazon.Lambda.DurableExecution/ChildContextConfig.cs | New config type with SubType and ErrorMapping. |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableExecutionException.cs | Adds ChildContextException with SubType, ErrorType, ErrorData, OriginalStackTrace. |
| Libraries/src/Amazon.Lambda.DurableExecution/Services/LambdaDurableServiceClient.cs | Maps ContextDetails from the SDK operation; drops ErrorData/StackTrace. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/ChildContextOperationTests.cs | 14 tests covering fresh, replay, suspension, error-mapping, and non-determinism paths. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/LambdaDurableServiceClientTests.cs | New test for ContextDetails Result/Error copy. |


Comment on lines +132 to 140
    ContextDetails = sdkOp.ContextDetails != null ? new Internal.ContextDetails
    {
        Result = sdkOp.ContextDetails.Result,
        Error = sdkOp.ContextDetails.Error != null ? new ErrorObject
        {
            ErrorType = sdkOp.ContextDetails.Error.ErrorType,
            ErrorMessage = sdkOp.ContextDetails.Error.ErrorMessage
        } : null
    } : null
GarrettBeatty force-pushed the GarrettBeatty/stack/3 branch from 5a29b3e to 209076a on May 19, 2026 23:01