diff --git a/README.md b/README.md index 7f143161..8306dad4 100644 --- a/README.md +++ b/README.md @@ -273,10 +273,13 @@ provider-backed ELF evidence was required. personalization, local `get_all` export-style readback, and deletion audit history. The separate OpenMemory export-helper setup probe in `live-baseline-20260611122416` records `blocked` with `DOCKER_UNAVAILABLE_IN_BASELINE_RUNNER`, so SDK `get_all` - is still not UI/export evidence. The comparison records ELF as a loss on preference - correction history, ties on scoped personalization and delete audit, `not_tested` - for local SDK export-style parity, `blocked` for OpenMemory UI/export, and - `non_goal` for hosted Platform export and optional graph memory in the local OSS + is still not UI/export evidence. OpenMemory UI/export product recheck after XY-987 + refreshed that blocker in `live-baseline-20260619065543`; product browser/dashboard + readback is still not reached because the export helper needs Docker access to a + running OpenMemory product container. The comparison records ELF as a loss on + preference correction history, ties on scoped personalization and delete audit, + `not_tested` for local SDK export-style parity, `blocked` for OpenMemory UI/export, + and `non_goal` for hosted Platform export and optional graph memory in the local OSS lane. 
- Capture/write-policy live follow-up after XY-933: ELF now passes 4/4 live `capture_integration` jobs with zero redaction leaks, source ids preserved in @@ -318,6 +321,7 @@ Detailed evidence and interpretation: - [qmd Debug-Ergonomics Dreaming Retest Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md) - [OpenViking Trajectory Materialization Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-openviking-trajectory-materialization-report.md) - [Service-Native Dreaming Readback Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-service-native-dreaming-readback-report.md) +- [OpenMemory UI/Export Product Readback Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md) - [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md) - [Single-User Production Runbook](docs/runbook/single_user_production.md) - Benchmark contract: @@ -403,6 +407,7 @@ Detailed comparison, mechanism-level analysis, and source map: - [Scheduled Memory Task Scoring Report - June 16, 2026](docs/evidence/benchmarking/2026-06-16-scheduled-memory-task-scoring-report.md) - [Dreaming Competitor-Strength Retest Report - June 17, 2026](docs/evidence/benchmarking/2026-06-17-dreaming-competitor-strength-retest-report.md) - [qmd Debug-Ergonomics Dreaming Retest Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md) +- [OpenMemory UI/Export Product Readback Report - June 19, 2026](docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md) - [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md) - [Real-World Agent Memory Benchmark](docs/runbook/benchmarking/real_world_agent_memory_benchmark.md) - [External Memory Improvement Plan](docs/evidence/external_memory/external_memory_improvement_plan.md) diff --git 
a/apps/elf-eval/fixtures/report_snapshots/2026-06-19-openmemory-ui-export-product-readback-report.json b/apps/elf-eval/fixtures/report_snapshots/2026-06-19-openmemory-ui-export-product-readback-report.json new file mode 100644 index 00000000..37bed045 --- /dev/null +++ b/apps/elf-eval/fixtures/report_snapshots/2026-06-19-openmemory-ui-export-product-readback-report.json @@ -0,0 +1,98 @@ +{ + "schema": "elf.openmemory_ui_export_product_recheck_report/v1", + "report_id": "xy-987-openmemory-ui-export-product-readback-2026-06-19", + "authority": "XY-987", + "created_at": "2026-06-19T06:56:58Z", + "goal": "Recheck OpenMemory UI/export readback with a product-level local runner or publish a fresh typed setup blocker with concrete evidence.", + "command": { + "command": "cargo make openmemory-ui-export-readback", + "status": "pass", + "runtime_seconds": 78.02, + "report_artifact": "tmp/live-baseline/live-baseline-report.json", + "probe_artifact": "tmp/live-baseline/mem0-openmemory-ui-export.json", + "attempt_log": "tmp/live-baseline/mem0-openmemory-export-attempt.log" + }, + "source_baseline": { + "previous_report": "docs/evidence/benchmarking/2026-06-11-mem0-openmemory-history-ui-export-report.md", + "previous_snapshot": "apps/elf-eval/fixtures/report_snapshots/2026-06-11-xy-931-openmemory-ui-export-readback.json", + "previous_status": "blocked", + "previous_reason_code": "DOCKER_UNAVAILABLE_IN_BASELINE_RUNNER" + }, + "run": { + "run_id": "live-baseline-20260619065543", + "project_filter": "mem0", + "sdk_baseline_status": "pass", + "sdk_check_summary": { + "total": 8, + "pass": 8, + "fail": 0, + "wrong_result": 0, + "blocked": 0 + }, + "ui_export_status": "blocked", + "ui_export_reason_code": "DOCKER_UNAVAILABLE_IN_BASELINE_RUNNER" + }, + "same_corpus_boundary": { + "sdk_result_artifact": "tmp/live-baseline/mem0-search.json", + "sdk_get_all_check_status": "pass", + "sdk_get_all_is_ui_export_evidence": false, + "openmemory_ui_export_is_separate_product_ux_scenario": true 
+ }, + "openmemory_product_surface": { + "tree_present": true, + "ui_package_present": true, + "compose_file_present": true, + "export_script_present": true, + "sunsetting_notice_present": true, + "requires_openai_api_key": true, + "requires_docker_compose": true, + "export_requires_running_container": true, + "default_export_container": "openmemory-openmemory-mcp-1" + }, + "openmemory_probe": { + "attempt": { + "command": "timeout 30 bash openmemory/backup-scripts/export_openmemory.sh --user-id elf-history-user --container openmemory-openmemory-mcp-1", + "exit_code": 1, + "log_artifact": "tmp/live-baseline/mem0-openmemory-export-attempt.log", + "output_excerpt": "openmemory/backup-scripts/export_openmemory.sh: line 52: docker: command not found\nERROR: Container 'openmemory-openmemory-mcp-1' not found/running. Pass --container if different." + }, + "export_validation": {} + }, + "classification": { + "status": "blocked", + "comparison_judgment": "unchanged", + "reason_code": "DOCKER_UNAVAILABLE_IN_BASELINE_RUNNER", + "reason": "The OpenMemory export helper requires Docker access to a running OpenMemory product container, but Docker is not available inside the baseline-runner container; browser/dashboard readback is not reached.", + "next_action": "Add a dedicated OpenMemory Docker Compose profile that imports the generated mem0 corpus into the OpenMemory app database, starts the API/UI with explicit local or provider configuration, then rerun the export helper and validate exported memories." + }, + "improvement_regression_readback": { + "judgment": "unchanged", + "improved": [ + "The OpenMemory UI/export blocker has a fresh June 19 command run, JSON artifact, and attempt log." + ], + "unchanged": [ + "mem0 local OSS SDK history and get_all readback remain pass-only SDK evidence.", + "OpenMemory product UI/export readback remains blocked before same-corpus product app database validation.", + "No ELF win, tie, or loss is allowed for OpenMemory UI/export." 
+ ], + "regressed": [] + }, + "claim_boundary": { + "elf_can_compare_against_openmemory_ui_export_after_this_run": false, + "hosted_platform_claim": false, + "optional_graph_memory_enabled": false, + "sdk_get_all_is_ui_export_evidence": false, + "product_browser_or_dashboard_readback_reached": false + }, + "next_optimization_direction": { + "required_fields": [ + "dedicated_openmemory_compose_profile", + "same_corpus_import_into_openmemory_app_database", + "openmemory_api_or_ui_readback_artifact", + "export_zip_validation_against_elf_history_user", + "explicit_provider_or_local_model_configuration", + "separate_sdk_get_all_and_product_export_scorers" + ], + "non_goal": "Do not use hosted mem0 Platform export or private operator data in this local OSS lane." + } +} diff --git a/apps/elf-eval/tests/real_world_job_benchmark.rs b/apps/elf-eval/tests/real_world_job_benchmark.rs index 02ebec13..b30b4cc9 100644 --- a/apps/elf-eval/tests/real_world_job_benchmark.rs +++ b/apps/elf-eval/tests/real_world_job_benchmark.rs @@ -246,6 +246,10 @@ fn service_native_dreaming_readback_materialization_json_path() -> Result<PathBu +fn openmemory_ui_export_product_readback_report_json_path() -> Result<PathBuf> { + report_snapshot_path("2026-06-19-openmemory-ui-export-product-readback-report.json") +} + fn openviking_trajectory_materialization_report_markdown_path() -> Result<PathBuf> { Ok(workspace_root()? .join("docs") @@ -270,6 +274,14 @@ fn service_native_dreaming_readback_report_markdown_path() -> Result<PathBuf> { .join("2026-06-19-service-native-dreaming-readback-report.md")) } +fn openmemory_ui_export_product_readback_report_markdown_path() -> Result<PathBuf> { + Ok(workspace_root()? 
+ .join("docs") + .join("evidence") + .join("benchmarking") + .join("2026-06-19-openmemory-ui-export-product-readback-report.md")) +} + fn live_temporal_reconciliation_report_json_path() -> Result<PathBuf> { report_snapshot_path("2026-06-16-live-temporal-reconciliation-report.json") } @@ -3413,6 +3425,86 @@ fn assert_service_native_dreaming_docs(markdown: &str, benchmarking_index: &str, assert!(readme.contains("real-world-memory-service-native-dreaming")); } +#[test] +fn openmemory_ui_export_product_recheck_preserves_blocked_boundary() -> Result<()> { + let report = serde_json::from_str::<Value>(&fs::read_to_string( + openmemory_ui_export_product_readback_report_json_path()?, + )?)?; + let markdown = + fs::read_to_string(openmemory_ui_export_product_readback_report_markdown_path()?)?; + let benchmarking_index = fs::read_to_string(benchmarking_index_path()?)?; + let readme = fs::read_to_string(readme_path()?)?; + + assert_eq!( + report.pointer("/schema").and_then(Value::as_str), + Some("elf.openmemory_ui_export_product_recheck_report/v1") + ); + assert_eq!(report.pointer("/authority").and_then(Value::as_str), Some("XY-987")); + assert_eq!( + report.pointer("/command/command").and_then(Value::as_str), + Some("cargo make openmemory-ui-export-readback") + ); + assert_eq!(report.pointer("/command/status").and_then(Value::as_str), Some("pass")); + assert_eq!( + report.pointer("/command/probe_artifact").and_then(Value::as_str), + Some("tmp/live-baseline/mem0-openmemory-ui-export.json") + ); + assert_eq!(report.pointer("/run/sdk_check_summary/pass").and_then(Value::as_u64), Some(8)); + assert_eq!(report.pointer("/run/ui_export_status").and_then(Value::as_str), Some("blocked")); + assert_eq!( + report.pointer("/run/ui_export_reason_code").and_then(Value::as_str), + Some("DOCKER_UNAVAILABLE_IN_BASELINE_RUNNER") + ); + assert_eq!( + report + .pointer("/same_corpus_boundary/sdk_get_all_is_ui_export_evidence") + .and_then(Value::as_bool), + Some(false) + ); + assert_eq!( + report + 
.pointer("/openmemory_product_surface/export_requires_running_container") + .and_then(Value::as_bool), + Some(true) + ); + assert!( + report + .pointer("/openmemory_probe/attempt/output_excerpt") + .and_then(Value::as_str) + .is_some_and(|excerpt| excerpt.contains("docker: command not found") + && excerpt.contains("Container 'openmemory-openmemory-mcp-1' not found/running")) + ); + assert_eq!( + report.pointer("/classification/comparison_judgment").and_then(Value::as_str), + Some("unchanged") + ); + assert_eq!( + report + .pointer("/claim_boundary/product_browser_or_dashboard_readback_reached") + .and_then(Value::as_bool), + Some(false) + ); + assert!(array_contains_str( + &report, + "/improvement_regression_readback/unchanged", + "OpenMemory product UI/export readback remains blocked before same-corpus product app database validation." + )?); + assert!(array_contains_str( + &report, + "/next_optimization_direction/required_fields", + "same_corpus_import_into_openmemory_app_database" + )?); + assert!(markdown.contains("OpenMemory UI/export product-readback status is unchanged")); + assert!(markdown.contains("Product browser/dashboard readback reached")); + assert!( + benchmarking_index.contains("2026-06-19-openmemory-ui-export-product-readback-report.md") + ); + assert!(readme.contains("OpenMemory UI/Export Product Readback Report - June 19, 2026")); + assert!(readme.contains("OpenMemory UI/export product recheck after XY-987")); + + Ok(()) +} + fn assert_openviking_trajectory_materialization_summary(report: &Value) -> Result<()> { assert_eq!( report.pointer("/schema").and_then(Value::as_str), diff --git a/docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md b/docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md new file mode 100644 index 00000000..631c2735 --- /dev/null +++ b/docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md @@ -0,0 +1,123 @@ +--- +type: Evidence 
+title: "OpenMemory UI/Export Product Readback Report - June 19, 2026" +description: "Checked-in benchmark evidence record: OpenMemory UI/Export Product Readback Report - June 19, 2026." +resource: docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md +status: active +authority: current_state +owner: evidence +last_verified: 2026-06-19 +tags: + - docs + - evidence + - benchmarking +--- +# OpenMemory UI/Export Product Readback Report - June 19, 2026 + +Goal: Recheck OpenMemory UI/export readback after the earlier setup blocker and +publish a fresh typed product-readback boundary if a local product runner still +cannot validate same-corpus OpenMemory export. +Read this when: You need to know whether XY-987 removed the OpenMemory UI/export +blocker, whether mem0 SDK `get_all` can be used as UI/export evidence, or what setup +work remains before an ELF/OpenMemory product-UX comparison is allowed. +Inputs: +`apps/elf-eval/fixtures/report_snapshots/2026-06-19-openmemory-ui-export-product-readback-report.json`, +`tmp/live-baseline/mem0-openmemory-ui-export.json`, +`tmp/live-baseline/mem0-openmemory-export-attempt.log`, +and `docs/evidence/benchmarking/2026-06-11-mem0-openmemory-history-ui-export-report.md`. +Outputs: A fresh command run, a JSON companion, an attempt-log artifact path, and a +scenario-level improved/unchanged/blocked judgment. + +## Executive Judgment + +The OpenMemory UI/export product-readback status is unchanged: still blocked. + +`cargo make openmemory-ui-export-readback` completed successfully as a benchmark +command and refreshed the mem0 local OSS SDK baseline: + +- mem0 SDK checks: 8 pass, 0 fail. +- SDK `get_all` export-style readback: pass. +- OpenMemory UI/export product readback: blocked. +- Reason code: `DOCKER_UNAVAILABLE_IN_BASELINE_RUNNER`. +- Fresh run id: `live-baseline-20260619065543`. + +This improves freshness and auditability, not competitive status. 
The OpenMemory +product tree, UI package, compose file, and export helper are present, but the export +helper requires Docker access to a running OpenMemory product container from inside +the baseline runner. The attempt still fails before browser/dashboard readback or +same-corpus product app database validation is reached. + +## Command Evidence + +| Command | Result | Runtime | Artifact | +| --- | --- | ---: | --- | +| `cargo make openmemory-ui-export-readback` | command pass; OpenMemory probe `blocked` | 78.02 seconds | `tmp/live-baseline/live-baseline-report.json`, `tmp/live-baseline/mem0-openmemory-ui-export.json`, `tmp/live-baseline/mem0-openmemory-export-attempt.log` | + +The probe command was: + +`timeout 30 bash openmemory/backup-scripts/export_openmemory.sh --user-id elf-history-user --container openmemory-openmemory-mcp-1` + +The attempt log records: + +```text +openmemory/backup-scripts/export_openmemory.sh: line 52: docker: command not found +ERROR: Container 'openmemory-openmemory-mcp-1' not found/running. Pass --container if different. +``` + +## Product Surface Readback + +| Surface | Status | +| --- | --- | +| OpenMemory tree present | `true` | +| UI package present | `true` | +| Compose file present | `true` | +| Export helper present | `true` | +| Sunsetting notice present | `true` | +| Requires OpenAI API key path | `true` | +| Requires Docker Compose | `true` | +| Export helper requires running container | `true` | +| Product browser/dashboard readback reached | `false` | + +## Improvement/Regression Readback + +- Improved: there is now a fresh June 19 command run, JSON companion, and attempt log + for the OpenMemory product-readback blocker. +- Unchanged: OpenMemory UI/export remains blocked before same-corpus product app + database validation. +- Unchanged: mem0 local OSS SDK history and local `get_all` readback remain separate + passing evidence. They are not UI/export product evidence. 
+- No regression: the command still preserves the SDK/product boundary and does not + convert a setup blocker into an ELF win or loss. + +## Claim Boundaries + +Allowed: + +- mem0 local OSS SDK checks and SDK `get_all` readback pass in the fresh run. +- OpenMemory UI/export product readback remains blocked with a concrete command, + artifact path, and setup error. +- The June 19 recheck is unchanged versus the June 11 XY-931 setup blocker except + for freshness and checked-in evidence. + +Not allowed: + +- Do not claim ELF can compare against OpenMemory UI/export after this run. +- Do not claim OpenMemory product UI/export pass from SDK-only `get_all` evidence. +- Do not claim hosted mem0 Platform behavior. +- Do not use this blocker as an ELF win or OpenMemory loss. + +## Next Optimization Direction + +The next fair product-readback attempt needs a dedicated OpenMemory Docker Compose +profile that imports the generated mem0 corpus into the OpenMemory app database, +starts API/UI with explicit local or provider configuration, and validates exported +memories against `elf-history-user`. + +Required fields before the blocker can move: + +- dedicated OpenMemory compose profile, +- same-corpus import into the OpenMemory app database, +- OpenMemory API or UI readback artifact, +- export zip validation against the benchmark-owned user, +- explicit provider or local model configuration, +- separate SDK `get_all` and product export scorers. diff --git a/docs/evidence/benchmarking/index.md b/docs/evidence/benchmarking/index.md index 6421ddeb..bf4cb800 100644 --- a/docs/evidence/benchmarking/index.md +++ b/docs/evidence/benchmarking/index.md @@ -37,6 +37,7 @@ Routes to: Benchmarking evidence concepts under `docs/evidence/benchmarking/`. - `2026-06-16-scheduled-memory-task-scoring-report.md`: Real-World Job Benchmark Report. - `2026-06-17-dreaming-competitor-strength-retest-report.md`: Dreaming Competitor-Strength Retest Report - June 17, 2026. 
- `2026-06-19-letta-core-archive-export-readback-report.md`: Letta Core/Archive Export-Readback Report - June 19, 2026; adds a Docker-contained Letta materialization/report command while preserving all six core/archive comparison scenarios as typed blockers until exported core block JSON, archival readback/search JSON, and source ids exist. +- `2026-06-19-openmemory-ui-export-product-readback-report.md`: OpenMemory UI/Export Product Readback Report - June 19, 2026; refreshes the product UI/export recheck and preserves the scenario as blocked because the export helper still needs Docker access to a running OpenMemory product container. - `2026-06-19-openviking-trajectory-materialization-report.md`: OpenViking Trajectory Materialization Report - June 19, 2026; materializes the context-trajectory fixture slice through a dedicated repo task while preserving staged retrieval, hierarchy selection, and recursive/context expansion as typed blockers. - `2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md`: qmd Debug-Ergonomics Dreaming Retest Report - June 19, 2026; confirms qmd's default top-k/replay edge is unchanged while ELF keeps the narrow operator-debug trace/stage visibility wins. - `2026-06-19-service-native-dreaming-readback-report.md`: Service-Native Dreaming Readback Report - June 19, 2026; materializes memory summary, proactive brief, and scheduled-memory derived outputs through `ElfService` readback with 9 pass, 0 wrong_result, and 2 typed XY-930 blockers. diff --git a/docs/log.md b/docs/log.md index b6f87575..444d40e5 100644 --- a/docs/log.md +++ b/docs/log.md @@ -49,3 +49,6 @@ logs. `cargo make real-world-memory-service-native-dreaming`, proving public/local memory summary, proactive brief, and scheduled-memory artifacts can be materialized through `ElfService` readback while preserving XY-930 private/provider blockers. 
+- Added the OpenMemory UI/export product readback recheck report and snapshot for + XY-987, preserving the product UI/export scenario as blocked while keeping mem0 SDK + `get_all` evidence separate from OpenMemory product evidence.
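
Review note: the snapshot in this change records the export-helper attempt (`exit_code`, `output_excerpt`) and a typed `blocked` classification with reason code `DOCKER_UNAVAILABLE_IN_BASELINE_RUNNER`. The diff does not show how the probe derives that classification, so the following is only a minimal sketch of one way such a mapping could look, using the standard library alone. The names `ExportAttempt`, `UiExportStatus`, `classify`, and the fallback code `UNCLASSIFIED_EXPORT_FAILURE` are hypothetical and do not appear in the repository.

```rust
/// Hypothetical view of one export-helper attempt. The two fields mirror
/// `openmemory_probe.attempt.exit_code` and `.output_excerpt` in the snapshot.
struct ExportAttempt<'a> {
    exit_code: i32,
    output_excerpt: &'a str,
}

/// Hypothetical typed status for the UI/export scenario.
#[derive(Debug, PartialEq)]
enum UiExportStatus {
    /// The helper exited 0; exported memories would still need validation.
    ExportProduced,
    /// A setup blocker. It carries a reason code and is never scored as a
    /// win, tie, or loss.
    Blocked { reason_code: &'static str },
}

/// Map an attempt to a typed status (assumed logic, not the project's code).
fn classify(attempt: &ExportAttempt) -> UiExportStatus {
    if attempt.exit_code == 0 {
        return UiExportStatus::ExportProduced;
    }
    // The attempt log in this change shows the shell failing to find `docker`.
    if attempt.output_excerpt.contains("docker: command not found") {
        UiExportStatus::Blocked { reason_code: "DOCKER_UNAVAILABLE_IN_BASELINE_RUNNER" }
    } else {
        // Hypothetical fallback code for failures this sketch does not recognize.
        UiExportStatus::Blocked { reason_code: "UNCLASSIFIED_EXPORT_FAILURE" }
    }
}
```

Under these assumptions, the excerpt checked in with this change would classify as `Blocked` with `DOCKER_UNAVAILABLE_IN_BASELINE_RUNNER`. Keeping the status as a separate type from any comparison score matches the claim boundary in the report: a setup blocker stays a blocker and is not converted into an ELF win or an OpenMemory loss.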