Expand SDK E2E runtime coverage #1197

Merged
stephentoub merged 16 commits into main from
stephentoub/e2e-test-gap-analysis
May 4, 2026

Conversation

@stephentoub
Collaborator

This expands E2E coverage so SDK regressions caused by runtime changes are caught consistently across languages. It starts from the C# coverage expansion and ports the exposed scenarios to TypeScript, Go, and Python where the public SDK surfaces exist.

Summary

  • Add and extend E2E coverage for abort behavior, permissions, pending-work resume, multi-client event broadcast, session state, event fidelity, streaming, tool results, and tools.
  • Add shared replay snapshots and a test MCP elicitation server for deterministic cross-SDK coverage.
  • Harden tests by replacing opportunistic sleeps and blocking handlers with explicit event or idle synchronization, bounded waits, and stronger cleanup behavior.
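As a rough illustration of the third bullet, a bounded polling wait that replaces a fixed sleep might look like the sketch below. The helper name `wait_for_condition` and its parameters are assumptions made for this example; they are not the helpers added in this PR.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_for_condition(
    predicate: Callable[[], Awaitable[bool]],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical helper: the name and signature are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        if await predicate():
            return
        if time.monotonic() >= deadline:
            # Bounded wait: fail loudly instead of hanging the test run.
            raise TimeoutError(f"condition not met within {timeout}s")
        await asyncio.sleep(interval)


async def demo() -> int:
    state = {"count": 0}

    async def background() -> None:
        # Simulates work that finishes at an unpredictable time.
        for _ in range(3):
            await asyncio.sleep(0.01)
            state["count"] += 1

    async def ready() -> bool:
        return state["count"] >= 3

    task = asyncio.create_task(background())
    # Instead of `await asyncio.sleep(1)` and hoping the work is done,
    # wait for the observable condition with an upper bound.
    await wait_for_condition(ready, timeout=2.0, interval=0.005)
    await task
    return state["count"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The test returns as soon as the condition holds and fails with a clear timeout when it never does, so it neither flakes on a slow machine nor wastes time on a fast one.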

Notes

  • Exit-plan and auto-mode responder flows remain out of scope where the required SDK APIs are not publicly exposed.
  • Python full-suite runtime increases with 42 additional collected tests; the completed five-run matrix averaged 756.6s for Python.

Validation

  • Full five-run matrix passed across Go, Python, Node.js, and .NET after corrections.
  • git diff --check passed.

Copilot AI review requested due to automatic review settings May 4, 2026 18:45
@stephentoub stephentoub requested a review from a team as a code owner May 4, 2026 18:45
Comment thread python/e2e/test_permissions_e2e.py Fixed
Comment thread dotnet/test/Harness/E2ETestContext.cs Fixed
Comment thread dotnet/test/Harness/E2ETestContext.cs Fixed
Comment thread dotnet/test/Harness/E2ETestContext.cs Fixed
Comment thread dotnet/test/Harness/E2ETestContext.cs Fixed
Comment thread dotnet/test/Harness/E2ETestContext.cs Fixed
Comment thread dotnet/test/E2E/PermissionE2ETests.cs Fixed
Comment thread dotnet/test/E2E/RpcExtensionsLoadedE2ETests.cs Fixed
Comment thread dotnet/test/E2E/RpcExtensionsLoadedE2ETests.cs Fixed
Comment thread dotnet/test/E2E/RpcExtensionsLoadedE2ETests.cs Fixed
Comment thread dotnet/test/E2E/RpcExtensionsLoadedE2ETests.cs Fixed
Contributor

Copilot AI left a comment

Pull request overview

This PR expands the cross-SDK end-to-end (E2E) test matrix (Node/TS, Python, Go, .NET) to catch runtime regressions consistently across languages, primarily by adding new scenarios plus shared replay snapshots and hardening test synchronization/cleanup.

Changes:

  • Added many new cross-language E2E scenarios (abort, permissions, pending-work resume, event fidelity/ordering, streaming fidelity, lifecycle events, tool results, background tasks).
  • Added/updated shared replay snapshots to make these scenarios deterministic across SDKs.
  • Hardened .NET test harness reliability (condition polling helpers, client tracking/cleanup, process-tree cleanup).
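For readers unfamiliar with the pattern, a predicate-aware event wait of the kind described above could be sketched in Python as follows. `EventBus` and `wait_for_event` are hypothetical stand-ins written for this example, not SDK or harness APIs; only the event type names are taken from this PR.

```python
import asyncio
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class EventBus:
    """Hypothetical stand-in for a session's event stream (not an SDK type)."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[Event], None]] = []

    def subscribe(self, listener: Callable[[Event], None]) -> Callable[[], None]:
        self._listeners.append(listener)
        # Return an unsubscribe callback so waiters can clean up after themselves.
        return lambda: self._listeners.remove(listener)

    def emit(self, event: Event) -> None:
        for listener in list(self._listeners):
            listener(event)


async def wait_for_event(
    bus: EventBus,
    predicate: Callable[[Event], bool],
    timeout: float = 5.0,
) -> Event:
    """Return the first event matching `predicate`, or time out."""
    loop = asyncio.get_running_loop()
    found: "asyncio.Future[Event]" = loop.create_future()

    def on_event(event: Event) -> None:
        if not found.done() and predicate(event):
            found.set_result(event)

    unsubscribe = bus.subscribe(on_event)
    try:
        return await asyncio.wait_for(found, timeout)
    finally:
        # Always detach the listener, even when the wait times out.
        unsubscribe()


async def demo() -> Event:
    bus = EventBus()
    loop = asyncio.get_running_loop()
    # Two events arrive; only the second one matches the predicate.
    loop.call_later(0.01, bus.emit, {"type": "session.updated"})
    loop.call_later(0.02, bus.emit, {"type": "session.deleted"})
    return await wait_for_event(
        bus, lambda e: e["type"] == "session.deleted", timeout=1.0
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Subscribing before triggering the action under test, and unsubscribing in `finally`, keeps the wait free of missed-event races and leaked listeners.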
File Description
test/snapshots/tools/should_respect_availabletools_and_excludedtools_combined.yaml New replay snapshot for available/excluded tools precedence.
test/snapshots/tools/should_execute_multiple_custom_tools_in_parallel_single_turn.yaml New replay snapshot for parallel custom tool calls in one turn.
test/snapshots/tool_results/should_handle_tool_result_with_rejected_resulttype.yaml New replay snapshot for rejected tool resultType.
test/snapshots/tool_results/should_handle_tool_result_with_denied_resulttype.yaml New replay snapshot for denied tool resultType.
test/snapshots/streaming_fidelity/should_not_produce_deltas_after_session_resume_with_streaming_disabled.yaml New replay snapshot for resume with streaming disabled.
test/snapshots/streaming_fidelity/should_emit_streaming_deltas_with_reasoning_effort_configured.yaml New replay snapshot for streaming + reasoning_effort.
test/snapshots/session/should_send_with_github_reference_attachment.yaml Snapshot adjusted to avoid tool calls and rely on attachment metadata.
test/snapshots/session/should_log_messages_at_various_levels.yaml New replay snapshot for logging behavior.
test/snapshots/session_lifecycle/should_isolate_events_between_concurrent_sessions.yaml New replay snapshot for concurrent session isolation.
test/snapshots/rpc_tasks_and_handlers/should_start_background_agent_and_report_task_details.yaml New replay snapshot for background task agent flow.
test/snapshots/rpc_session_state/should_report_error_when_forking_session_to_unknown_event_id.yaml New replay snapshot for session fork error path.
test/snapshots/rpc_session_state/should_fork_session_to_event_id_excluding_boundary_event.yaml New replay snapshot for boundary event fork behavior.
test/snapshots/rpc_event_side_effects/should_emit_snapshot_rewind_event_and_remove_events_on_truncate.yaml New replay snapshot for truncate side effects.
test/snapshots/rpc_event_side_effects/should_allow_session_use_after_truncate.yaml New replay snapshot ensuring session still usable after truncate.
test/snapshots/permissions/should_wait_for_slow_permission_handler.yaml New replay snapshot for slow permission handler gating.
test/snapshots/permissions/should_short_circuit_permission_handler_when_set_approve_all_enabled.yaml New replay snapshot for server-side approve-all short-circuit.
test/snapshots/permissions/should_handle_concurrent_permission_requests_from_parallel_tools.yaml New replay snapshot for concurrent permission prompts.
test/snapshots/permissions/should_deny_permission_with_noresult_kind.yaml New replay snapshot for “no-result” permission behavior.
test/snapshots/pending_work_resume/should_report_continuependingwork_true_in_resume_event.yaml New replay snapshot for resume event continuePendingWork=true.
test/snapshots/pending_work_resume/should_keep_pending_external_tool_handleable_on_warm_resume_when_continuependingwork_is_false.yaml New replay snapshot for warm resume with continuePendingWork=false.
test/snapshots/multi_client/one_client_approves_permission_and_both_see_the_result.yaml Snapshot text tweak for multi-client permission approval case.
test/snapshots/mcp_and_agents/should_round_trip_mcp_server_elicitation_request.yaml New replay snapshot for MCP elicitation server tool round-trip.
test/snapshots/event_fidelity/should_preserve_message_order_in_getmessages_after_tool_use.yaml New replay snapshot for getMessages ordering after tool use.
test/snapshots/event_fidelity/should_emit_session_usage_info_event_after_model_call.yaml New replay snapshot for session.usage_info.
test/snapshots/event_fidelity/should_emit_pending_messages_modified_event_when_message_queue_changes.yaml New replay snapshot for pending_messages.modified.
test/snapshots/event_fidelity/should_emit_assistant_usage_event_after_model_call.yaml New replay snapshot for assistant.usage.
test/snapshots/client_lifecycle/should_receive_session_deleted_lifecycle_event_when_deleted.yaml New replay snapshot for session.deleted lifecycle.
test/snapshots/abort/should_abort_during_active_tool_execution.yaml New replay snapshot for abort during tool execution.
test/snapshots/abort/should_abort_during_active_streaming.yaml New replay snapshot for abort during streaming.
test/harness/test-mcp-elicitation-server.mjs New deterministic MCP elicitation stdio server for E2E.
python/e2e/test_tools_e2e.py Adds Python tool E2Es (parallel tools; available vs excluded tools).
python/e2e/test_tool_results_e2e.py Adds Python E2Es for denied/rejected tool resultType surfaces.
python/e2e/test_streaming_fidelity_e2e.py Adds Python resume-with-streaming-disabled + reasoning_effort streaming coverage.
python/e2e/test_rpc_tasks_and_handlers_e2e.py Adds Python RPC task-agent coverage and additional permission decision cases.
python/e2e/test_pending_work_resume_e2e.py Adds Python pending-work resume semantics tests (continuePendingWork variants).
python/e2e/test_multi_turn_e2e.py Adds Python per-turn event ordering assertions for tool turns.
python/e2e/test_multi_client_e2e.py Avoids hanging permission handlers by using explicit “no-result” behavior.
python/e2e/test_event_fidelity_e2e.py Adds Python usage + pending queue + get_messages ordering tests.
python/e2e/test_client_lifecycle_e2e.py Replaces sleep-based waits with polling for persistence/lifecycle events.
python/e2e/test_abort_e2e.py New Python abort E2Es (abort during streaming + tool execution).
nodejs/test/e2e/tools.e2e.test.ts Adds TS tool E2Es (parallel tools; excluded tools filtering).
nodejs/test/e2e/tool_results.e2e.test.ts Adds TS denied/rejected ToolResultObject behavior + timeout helper.
nodejs/test/e2e/streaming_fidelity.e2e.test.ts Adds TS resume-with-streaming-disabled + reasoningEffort streaming coverage.
nodejs/test/e2e/session.e2e.test.ts Updates prompt to ensure GitHub reference summarization uses attachment metadata only.
nodejs/test/e2e/session_lifecycle.e2e.test.ts Adds TS concurrent session event isolation test.
nodejs/test/e2e/permissions.e2e.test.ts Adds TS permission flow hardening (slow handler; concurrent permission requests; no-result; approve-all).
nodejs/test/e2e/pending_work_resume.e2e.test.ts Adds TS pending-work resume semantics tests (continuePendingWork variants).
nodejs/test/e2e/multi_turn.e2e.test.ts Adds TS tool-turn event ordering assertions with per-turn snapshots.
nodejs/test/e2e/event_fidelity.e2e.test.ts Adds TS usage + pending queue + getMessages ordering fidelity checks.
nodejs/test/e2e/client_lifecycle.e2e.test.ts Adds TS client lifecycle tests for session.updated and session.deleted.
nodejs/test/e2e/abort.e2e.test.ts New TS abort E2Es (abort during streaming + tool execution).
go/internal/e2e/tools_e2e_test.go Adds Go tool E2Es (parallel tools; available/excluded tools precedence).
go/internal/e2e/tool_results_e2e_test.go Adds Go denied/rejected tool resultType coverage.
go/internal/e2e/streaming_fidelity_e2e_test.go Adds Go resume-with-streaming-disabled + reasoning effort streaming coverage.
go/internal/e2e/session_e2e_test.go Updates prompt to summarize GitHub references using metadata only.
go/internal/e2e/pending_work_resume_e2e_test.go Adds Go pending-work resume semantics tests (continuePendingWork variants).
go/internal/e2e/multi_turn_e2e_test.go Adds Go tool-turn event ordering assertions and helpers.
go/internal/e2e/multi_client_e2e_test.go Avoids hanging permission handlers by returning explicit no-result.
go/internal/e2e/abort_e2e_test.go New Go abort E2Es (abort during streaming + tool execution).
dotnet/test/Harness/TestHelper.cs Adds predicate-aware event waits + reusable polling/condition helper.
dotnet/test/Harness/E2ETestFixture.cs Uses persistent shared client via E2ETestContext tracking.
dotnet/test/Harness/E2ETestContext.cs Tracks persistent vs transient clients; robust teardown (stop clients, stop proxy, delete dirs w/ retries).
dotnet/test/Harness/E2ETestBase.cs Ensures per-test cleanup runs (stops transient clients).
dotnet/test/Harness/CapiProxy.cs Ensures proxy process-tree cleanup and disposes process.
dotnet/test/E2E/ToolsE2ETests.cs Adds .NET tool E2Es (parallel tools; available/excluded tools precedence).
dotnet/test/E2E/ToolResultsE2ETests.cs Adds .NET denied/rejected tool resultType coverage.
dotnet/test/E2E/TelemetryExportE2ETests.cs Replaces sleep loops with robust file polling.
dotnet/test/E2E/StreamingFidelityE2ETests.cs Adds resume-with-streaming-disabled + reasoning effort streaming coverage.
dotnet/test/E2E/SessionMcpAndAgentConfigE2ETests.cs Adds MCP elicitation round-trip test + MCP server status polling helper.
dotnet/test/E2E/SessionLifecycleE2ETests.cs Adds concurrent session isolation test; replaces local polling with shared helper.
dotnet/test/E2E/SessionFsE2ETests.cs Uses shared wait helpers; improves teardown tracking.
dotnet/test/E2E/SessionE2ETests.cs Removes sleep-based synchronization; adds thread-safe event collection; updates GitHub reference prompt.
dotnet/test/E2E/SessionConfigE2ETests.cs Adds reasoningEffort propagation checks (create + resume).
dotnet/test/E2E/RpcTasksAndHandlersE2ETests.cs Adds task agent coverage + missing-permission decision variants; uses shared polling.
dotnet/test/E2E/RpcShellAndFleetE2ETests.cs Replaces bespoke polling with TestHelper.WaitForConditionAsync.
dotnet/test/E2E/RpcMcpAndSkillsE2ETests.cs Adds MCP OAuth error-path assertions.
dotnet/test/E2E/RpcEventSideEffectsE2ETests.cs New .NET RPC side-effects test suite (mode/plan/workspace/name/truncate events).
dotnet/test/E2E/RpcAgentE2ETests.cs Adds subagent selected/deselected event assertions; aligns reload expectations.
dotnet/test/E2E/PendingWorkResumeE2ETests.cs Adds pending-work resume semantics tests (continuePendingWork variants).
dotnet/test/E2E/MultiTurnE2ETests.cs Adds tool-turn event ordering assertions with thread-safe event snapshots.
dotnet/test/E2E/MultiClientE2ETests.cs Uses shared test-name helper; improves per-test cleanup for multi-client fixtures.
dotnet/test/E2E/MultiClientCommandsElicitationE2ETests.cs Similar fixture cleanup improvements; avoids reflection-based test-name lookup.
dotnet/test/E2E/HookLifecycleAndOutputE2ETests.cs Removes sleep-based hook wait; waits on explicit completion signal.
dotnet/test/E2E/EventFidelityE2ETests.cs Adds usage + pending queue + getMessages ordering fidelity tests.
dotnet/test/E2E/ElicitationE2ETests.cs Consolidates capability tests into a parameterized theory.
dotnet/test/E2E/CompactionE2ETests.cs Re-enables/stabilizes compaction tests using explicit event waits and stronger assertions.
dotnet/test/E2E/ClientOptionsE2ETests.cs Adds Activity tracecontext propagation tests; refactors fake CLI capture helpers.
dotnet/test/E2E/ClientLifecycleE2ETests.cs Adds session.updated + session.deleted lifecycle tests; improves resource management patterns.
dotnet/test/E2E/AbortE2ETests.cs New .NET abort E2Es (abort during streaming + tool execution).
dotnet/src/Client.cs Improves process cleanup (kill tree + wait) and surfaces stderr on unexpected CLI exit; makes stderr capture awaitable.
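The "delete dirs w/ retries" teardown noted for E2ETestContext.cs addresses a common problem: a process that has just exited may still hold file handles briefly. A minimal Python sketch of the idea, with a hypothetical helper name and arbitrary retry defaults, might be:

```python
import shutil
import time
from pathlib import Path
from typing import Union


def remove_tree_with_retries(
    path: Union[str, Path],
    attempts: int = 5,
    delay: float = 0.1,
) -> bool:
    """Delete a directory tree, retrying on transient OS errors.

    Hypothetical helper for illustration; returns True when the directory
    is gone and False when every attempt failed.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Already deleted: treat as success.
            return True
        except OSError:
            # E.g. a file is still locked by a process that is shutting down.
            if attempt == attempts - 1:
                return False
            time.sleep(delay)
    return False
```

Returning a flag instead of raising lets teardown continue with the remaining cleanup steps; whether the real harness does the same is not stated in this PR.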

Copilot's findings

  • Files reviewed: 103/103 changed files
  • Comments generated: 3

Comment thread nodejs/test/e2e/tools.e2e.test.ts
Comment thread python/e2e/test_streaming_fidelity_e2e.py Outdated
Comment thread dotnet/test/E2E/ClientLifecycleE2ETests.cs
@stephentoub stephentoub force-pushed the stephentoub/e2e-test-gap-analysis branch from 9765acc to 0b7b275 Compare May 4, 2026 18:53
Comment thread dotnet/test/E2E/RpcExtensionsLoadedE2ETests.cs Fixed
Comment thread dotnet/test/E2E/RpcExtensionsLoadedE2ETests.cs Fixed
Comment thread dotnet/test/E2E/SessionMcpAndAgentConfigE2ETests.cs Fixed
Contributor

@github-actions github-actions Bot left a comment

Generated by SDK Consistency Review Agent for issue #1197 · ● 992.9K

Comment thread dotnet/src/Client.cs
Contributor

@github-actions github-actions Bot left a comment

Generated by SDK Consistency Review Agent for issue #1197 · ● 1.5M

Comment thread dotnet/src/Client.cs
Comment thread dotnet/src/Client.cs
Contributor

@github-actions github-actions Bot left a comment

Generated by SDK Consistency Review Agent for issue #1197 · ● 930.8K

Comment thread dotnet/src/Client.cs
stephentoub and others added 11 commits May 4, 2026 17:06
Add runtime-driven C# E2E coverage for exposed SDK behavior, harden deterministic synchronization and cleanup, and strengthen assertions/snapshots across the suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

# Conflicts:
#	dotnet/test/E2E/ClientOptionsE2ETests.cs
#	dotnet/test/E2E/MultiClientCommandsElicitationE2ETests.cs
#	dotnet/test/E2E/MultiClientE2ETests.cs
#	dotnet/test/Harness/E2ETestContext.cs
Add TypeScript, Go, and Python E2E coverage for the runtime scenarios covered by the C# expansion, including abort, event fidelity, permissions, pending-work resume, session state, streaming, tool-result, and multi-client edge cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

# Conflicts:
#	nodejs/test/e2e/multi-client.e2e.test.ts
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
stephentoub and others added 3 commits May 4, 2026 17:06
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@stephentoub stephentoub force-pushed the stephentoub/e2e-test-gap-analysis branch from 1d28ec9 to 549be42 Compare May 4, 2026 21:06
Contributor

@github-actions github-actions Bot left a comment

Generated by SDK Consistency Review Agent for issue #1197 · ● 1.2M

Comment thread dotnet/src/Client.cs
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@stephentoub
Collaborator Author

Generated by Copilot:

1. rpc_event_side_effects tests are missing from Go and Node.js

2. New fork_session_to_event_id scenarios are missing from Go and Node.js

Addressed in e502b13. I added Node and Go rpc_event_side_effects E2E suites and added the two sessions.fork toEventId scenarios to the existing Node and Go session-state suites, using the existing shared snapshots where applicable.

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Cross-SDK Consistency Review ✅

Reviewed all four SDK source changes in this PR (dotnet/src/Client.cs, go/client.go, nodejs/src/client.ts, python/copilot/client.py) for cross-language consistency.

SDK Source Changes

connect method legacy fallback fix — all 4 SDKs ✅

Each SDK now additionally checks for the "Unhandled method connect" error message alongside the MethodNotFound error code. The fix is applied consistently in all four languages with idiomatic error-handling patterns:

Pattern by SDK:

  • .NET: IsUnsupportedConnectMethod(remoteEx) helper checking ErrorCode == MethodNotFoundErrorCode || string.Equals(ex.Message, "Unhandled method connect")
  • Go: rpcErr.Code == jsonrpc2.ErrMethodNotFound.Code || rpcErr.Message == "Unhandled method connect"
  • Node.js: err.code === ErrorCodes.MethodNotFound || err.message === "Unhandled method connect"
  • Python: err.code == -32601 or err.message == "Unhandled method connect"
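To show how such a check typically gates a fallback, here is a self-contained Python sketch. The `JsonRpcError` class, `connect_with_fallback`, and the two callables are assumptions made for this example and are not the SDK's actual types; only the `-32601` code (JSON-RPC 2.0 "Method not found") and the message string come from the review above.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

METHOD_NOT_FOUND = -32601  # JSON-RPC 2.0 "Method not found"
LEGACY_CONNECT_MESSAGE = "Unhandled method connect"


class JsonRpcError(Exception):
    """Hypothetical error type carrying a JSON-RPC code and message."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message


def is_unsupported_connect(err: Exception) -> bool:
    # Some servers report the standard code; others only send the message text.
    return isinstance(err, JsonRpcError) and (
        err.code == METHOD_NOT_FOUND or err.message == LEGACY_CONNECT_MESSAGE
    )


def connect_with_fallback(
    connect: Callable[[], T],
    legacy_handshake: Callable[[], T],
) -> T:
    """Try `connect`; fall back only when the server does not support it."""
    try:
        return connect()
    except JsonRpcError as err:
        if is_unsupported_connect(err):
            return legacy_handshake()
        # Any other failure is a real error and must not be swallowed.
        raise
```

Matching on the message as well as the code keeps the fallback working against servers that reject `connect` without returning the standard error code.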

Additional .NET-only process management improvements — implementation-internal ✅

Several .NET Client.cs improvements (killing the full process tree on cleanup, improved stderr reader loop, stderrReader task awaiting before building the error message) are internal to .NET's System.Diagnostics.Process API and don't have a direct cross-SDK equivalent. Go, Node.js, and Python each use different process management paradigms (event-based I/O in Node.js, subprocess with SIGTERM/SIGKILL fallback in Python, exec.Cmd in Go) that already handle their respective concerns appropriately.
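For reference, the "SIGTERM/SIGKILL fallback" pattern mentioned for Python generally looks like the sketch below. This is a generic illustration using the standard `subprocess` module, not the code in python/copilot/client.py, and the `stop_process` name and grace period are assumptions.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace: float = 2.0) -> int:
    """Ask a child process to exit, then force it if it does not comply.

    Hypothetical helper: on POSIX, terminate() sends SIGTERM and kill()
    sends SIGKILL. It stops only the direct child, not its descendants.
    """
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(child))
```

Because this stops only the direct child, it is not equivalent to the process-tree kill described for .NET; covering descendants would need process groups or a similar mechanism.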

E2E Test Coverage Expansion

The new tests follow a consistent cross-SDK pattern. The scenarios ported to Node.js, Python, and Go match the stated scope of the PR ("where the public SDK surfaces exist"). The .NET-only test files (RpcExtensionsLoadedE2ETests, RpcShellAndFleetE2ETests, ElicitationE2ETests, etc.) cover APIs not yet publicly exposed in other SDKs, which is explicitly called out in the PR description.

No cross-SDK consistency gaps found.

Generated by SDK Consistency Review Agent for issue #1197 · ● 823.9K

@stephentoub stephentoub merged commit c063458 into main May 4, 2026
35 checks passed
@stephentoub stephentoub deleted the stephentoub/e2e-test-gap-analysis branch May 4, 2026 22:10