Refactor: eliminate export script service-layer duplication (~700 LOC) closes #42 by bradjin8 · Pull Request #61 · cppalliance/cppa-cursor-browser

bradjin8 · 2026-05-20T16:21:18Z

Summary

Closes #42.

scripts/export.py previously reimplemented ~700 lines of logic already present in the service and utility layers: workspace scanning, database access, text extraction, slug generation, and Markdown rendering. This PR eliminates those duplications across the entire export pipeline.

What changed

scripts/export.py (~550 lines removed)

Deleted private copies of get_default_workspace_path, resolve_workspace_path, extract_text_from_rich_text, extract_text_from_bubble, and slug, plus the get_workspace_folder_paths wrapper; all are replaced with imports from utils.workspace_path and utils.text_extract
Workspace scanning delegated to services.workspace_db._collect_workspace_entries()
Database connection management switched to the _open_global_db() context manager
Workspace/project assignment (local get_project_from_file_path + assign_workspace closures) replaced by services.workspace_resolver functions
IDE chat Markdown generation loop extracted to utils.cursor_md_exporter.cursor_ide_chat_to_markdown()
sys.path.insert guarded by if __name__ == "__main__": so it is inert when the package is installed via the pyproject.toml entry point

services/workspace_db.py (3 new functions)

_load_bubble_map(global_db) — bubbleId:* → {bubble_id: dict}
_load_project_layouts_map(global_db) — messageRequestContext:* → {composer_id: [root_path]}
_load_code_block_diff_map(global_db) — codeBlockDiff:* → {composer_id: [diff_dict]}

These replace copy-pasted cursorDiskKV query + parse loops in scripts/export.py, services/workspace_listing.py, services/workspace_tabs.py, and api/export_api.py.
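
A minimal sketch of what one of these loaders might look like, not the PR's actual code. It assumes cursorDiskKV has text `key` and `value` columns, that values are JSON objects, and that the bubble id is the last colon-separated segment of the key. The names `load_bubble_map` and `demo` are hypothetical.

```python
import json
import sqlite3


def load_bubble_map(global_db: sqlite3.Connection) -> dict[str, dict]:
    """Hypothetical loader: read bubbleId:* rows into {bubble_id: dict}.

    The caller is expected to set row_factory = sqlite3.Row beforehand.
    """
    bubble_map: dict[str, dict] = {}
    try:
        rows = global_db.execute(
            "SELECT key, value FROM cursorDiskKV WHERE key LIKE 'bubbleId:%'"
        ).fetchall()
    except sqlite3.Error:
        return bubble_map  # treat an unreadable table as "no bubbles"
    for row in rows:
        # Assumption: the bubble id is the last colon-separated key segment.
        bubble_id = row["key"].rsplit(":", 1)[-1]
        try:
            parsed = json.loads(row["value"])
        except (TypeError, ValueError):
            continue  # skip NULL values and malformed JSON
        if isinstance(parsed, dict):
            bubble_map[bubble_id] = parsed
    return bubble_map


def demo() -> dict[str, dict]:
    """Build a throwaway in-memory table and run the loader against it."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row
    db.execute("CREATE TABLE cursorDiskKV (key TEXT PRIMARY KEY, value TEXT)")
    db.executemany(
        "INSERT INTO cursorDiskKV VALUES (?, ?)",
        [
            ("bubbleId:c1:b1", json.dumps({"text": "hello"})),
            ("bubbleId:c1:b2", "{not json"),
            ("composerData:c1", json.dumps({"name": "chat"})),
        ],
    )
    try:
        return load_bubble_map(db)
    finally:
        db.close()
```

Keeping the query, the JSON parsing, and the error handling in one function is what lets the four former call sites share a single behavior for malformed rows.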

utils/cursor_md_exporter.py

cursor_ide_chat_to_markdown(composer_data, composer_id, bubble_map, code_block_diff_map, workspace_info) added alongside the existing cursor_cli_session_to_markdown — encapsulates the full per-composer rendering: bubble assembly, response-time computation, session aggregates, file/command tracking, frontmatter, and session-summary table
Private _slug() removed; slug imported from utils.text_extract

utils/text_extract.py

slug(s) added — filesystem-safe slug conversion (was duplicated in multiple callers)
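
For illustration, a filesystem-safe slug helper could look roughly like the sketch below. The exact rules used by the PR are not shown on this page, so the lowercase-and-hyphenate policy, the `max_len` parameter, and the "untitled" fallback are all assumptions.

```python
import re


def slug(s: str, max_len: int = 60) -> str:
    """Illustrative filesystem-safe slug; the PR's real rules may differ."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    cleaned = re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")
    # Truncate, then strip again in case the cut landed on a hyphen.
    return cleaned[:max_len].strip("-") or "untitled"
```

For example, `slug("Fix: export/API bug?")` yields `fix-export-api-bug` under these assumed rules.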

api/export_api.py

Private _slug() removed; uses slug from utils.text_extract
Workspace scanning replaced with service layer calls
bubble_map loading replaced with _load_bubble_map()
Markdown generation replaced with cursor_ide_chat_to_markdown(); web export now emits the same rich frontmatter and session-summary block as the CLI export

services/workspace_listing.py, services/workspace_tabs.py

Inline bubbleId / messageRequestContext / codeBlockDiff loading loops replaced with the new _load_* functions

Test plan

python -m pytest -v — 297 passed, 4 skipped (matches baseline)
python scripts/export.py --no-zip --out /tmp/test-export — inspect that Markdown files are created with correct frontmatter and session-summary blocks
Web export zip (POST /api/export) downloads and contains valid Markdown
python scripts/export.py --since last — skips chats exported before; state file updated

Summary by CodeRabbit

New Features
- Export Cursor IDE composer and CLI sessions to rich Markdown with YAML metadata, session summaries, per-bubble content, optional thinking/details, and tool I/O blocks.
Refactor
- Export and workspace discovery flows reorganized to use shared services for more consistent, reliable exports.
Improvements
- Importable CLI tool, improved filesystem-safe filename slugs, inclusion of code-diff edits, richer activity/tool counts, and updated export output layout.

Extract _load_bubble_map/_load_project_layouts_map/_load_code_block_diff_map to services/workspace_db.py; add cursor_ide_chat_to_markdown to utils/cursor_md_exporter.py; remove private _slug() copies from api/export_api.py and utils/cursor_md_exporter.py.

coderabbitai · 2026-05-20T16:21:32Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1590ea8-ae8a-4adf-9fec-25320da71a25

📥 Commits

Reviewing files that changed from the base of the PR and between aa16600 and d31fb3e.

📒 Files selected for processing (1)

scripts/export.py

📝 Walkthrough

Consolidates export logic: shared slug and Markdown exporters added, three SQLite KV loaders introduced, service modules consume those loaders, API and CLI export flows delegate DB reading and Markdown generation to the new helpers.

Changes

Export Pipeline Consolidation

Layer / File(s)	Summary
Shared utility extraction and Markdown exporter `utils/text_extract.py`, `utils/cursor_md_exporter.py`	Adds `slug()` and a public `cursor_ide_chat_to_markdown()` that builds YAML-frontmatter + Markdown from composer KV data, appends synthetic code-edit bubbles, aggregates timing/tool stats, and renders per-bubble bodies; removes local `_slug` helper.
Service layer KV loaders for global storage `services/workspace_db.py`	Adds helpers that read `cursorDiskKV` rows into typed in-memory maps (bubbles, project layouts, code-block diffs) with defensive JSON parsing and sqlite error handling.
Service consumers adopt KV loaders `services/workspace_listing.py`, `services/workspace_tabs.py`, `services/workspace_resolver.py`, `utils/workspace_descriptor.py`, `api/workspaces.py`	Replaces inline `cursorDiskKV` parsing and regex helpers with calls to the new loaders and updates workspace descriptor JSON reads to use the public `read_json_file` / `basename_from_pathish`.
API export route refactor `api/export_api.py`	`export_chats()` now uses service-layer workspace discovery and DB opening, loads bubble/code-diff maps via shared loaders, filters by `since`, builds exclusion/search text from bubbles and composer JSON, delegates Markdown assembly to `cursor_ide_chat_to_markdown`, and writes export entries into an in-memory ZIP.
CLI script modularization and delegation `scripts/export.py`	Makes the script importable, removes several local helpers (workspace path resolution, slugging, rich-text extraction), delegates workspace scanning/DB reading to services, and delegates Markdown generation to shared exporter functions while preserving manifest/state writing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

cppalliance/cppa-cursor-browser#23: Modifies the same export_chats() route with explicit SQLite connection lifecycle management; closely related to API export handler changes.
cppalliance/cppa-cursor-browser#8: Overlaps on shared export pipeline and Cursor CLI/Markdown export wiring.

Suggested reviewers

clean6378-max-it
wpak-ai

Poem

🐇 I hopped through code, cleaned slugs and threads,
Bubbles and diffs in tidy homesteads,
Loaders hum softly, exports align,
CLI and API now share the same vine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 74.19% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and specifically describes the main change: refactoring to eliminate duplication between export script and service layer, with the ~700 LOC metric demonstrating scope.
Linked Issues check	✅ Passed	All acceptance criteria from `#42` are addressed: scripts/export.py delegates to services/workspace_db and workspace_listing [`#42`], removed private utility copies and now imports from utils [`#42`], workspace scanning delegated [`#42`], conversation assembly delegated via cursor_md_exporter [`#42`].
Out of Scope Changes check	✅ Passed	All changes directly support the refactoring objective: consolidating duplicated logic into service/utility modules, eliminating private copies, and establishing proper import paths without unrelated modifications.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

api/export_api.py (2)

115-117: 💤 Low value

Prefix unused global_db_path with underscore.

Per Ruff RUF059, the unpacked global_db_path is never used. Prefix it to indicate intentional discard.

Proposed fix

-        with _open_global_db(workspace_path) as (global_db, global_db_path):
+        with _open_global_db(workspace_path) as (global_db, _global_db_path):

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/export_api.py` around lines 115 - 117, The tuple unpacking from
_open_global_db currently binds an unused name global_db_path; update the unpack
to use a prefixed discard name (e.g. (global_db, _global_db_path)) so the unused
variable is intentionally ignored and satisfies RUF059; adjust the line in
export_api.py where _open_global_db is called and ensure any subsequent
references (if any) are updated to the new name or removed.
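
The discard-name convention from this nitpick can be sketched with a stand-in context manager. `open_global_db` below is hypothetical: it takes a database path directly, whereas the project's `_open_global_db` takes a workspace path and its internals are not shown here.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator, Tuple


@contextmanager
def open_global_db(db_path: str) -> Iterator[Tuple[sqlite3.Connection, str]]:
    """Hypothetical stand-in for _open_global_db: yields (connection, path)."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        yield conn, db_path
    finally:
        conn.close()  # always release the handle, even if the body raises


def count_tables(db_path: str) -> int:
    # The leading underscore marks the path as intentionally unused.
    with open_global_db(db_path) as (global_db, _global_db_path):
        row = global_db.execute("SELECT COUNT(*) AS n FROM sqlite_master").fetchone()
        return row["n"]
```

The tuple shape stays the same for callers that do need the path; only the local binding name changes.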

99-99: 💤 Low value

Unused variable project_name_map.

project_name_map is assigned but never referenced in this function. Either remove it or use it where intended.

Proposed fix

         workspace_entries = _collect_workspace_entries(workspace_path)
         composer_id_to_ws = _build_composer_id_to_workspace_id(workspace_path, workspace_entries)
-        project_name_map = _create_project_name_to_workspace_id_map(workspace_entries)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/export_api.py` at line 99, The variable project_name_map is assigned from
_create_project_name_to_workspace_id_map(workspace_entries) but never used;
either remove this assignment or actually use project_name_map where intended
(e.g., replace any later direct lookups of workspace_entries or hardcoded
workspace id logic with project_name_map lookup). Locate the call to
_create_project_name_to_workspace_id_map and either delete that line (if the
mapping isn't needed) or refactor the subsequent logic to reference
project_name_map for project->workspace id resolution so the mapping is
consumed.

scripts/export.py (1)

228-234: 💤 Low value

SQL query differs from API: missing empty-array filter.

The API query at api/export_api.py:124-127 includes AND value NOT LIKE '%fullConversationHeadersOnly\":[]%' to skip empty conversations, but this CLI query omits that filter. The subsequent if not headers: continue check (lines 261-262) handles empty arrays anyway, so the result is the same, but the CLI query is less efficient: it fetches rows that are immediately discarded.

Consider adding the same filter for consistency and to reduce unnecessary JSON parsing.

Proposed fix

             try:
                 ide_composer_rows = global_db.execute(
                     "SELECT key, value FROM cursorDiskKV WHERE key LIKE 'composerData:%'"
                     " AND value LIKE '%fullConversationHeadersOnly%'"
+                    " AND value NOT LIKE '%fullConversationHeadersOnly\":[]%'"
                 ).fetchall()
             except sqlite3.Error:
                 pass

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/export.py` around lines 228 - 234, The SQL in the global_db query
that builds ide_composer_rows currently selects rows with "value LIKE
'%fullConversationHeadersOnly%'" but misses the empty-array exclusion used in
the API; update the global_db.execute call (the SELECT on cursorDiskKV for key
LIKE 'composerData:%') to add the same filter AND value NOT LIKE
'%fullConversationHeadersOnly\":[]%' (properly escape the quote in the SQL
string) so empty-header entries are skipped at the DB level before JSON parsing.
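
The effect of the extra filter can be checked against an in-memory database. This is a sketch under assumptions: the table layout and sample rows are made up, and the NOT LIKE pattern only matches when the stored JSON has no space after the colon, which is why the Python-side empty check remains the real guard.

```python
import json
import sqlite3

QUERY_BASE = (
    "SELECT key, value FROM cursorDiskKV WHERE key LIKE 'composerData:%'"
    " AND value LIKE '%fullConversationHeadersOnly%'"
)
QUERY_FILTERED = QUERY_BASE + " AND value NOT LIKE '%fullConversationHeadersOnly\":[]%'"


def compact(obj: dict) -> str:
    # Assumption: stored JSON is compact (no space after ':').
    return json.dumps(obj, separators=(",", ":"))


def fetch_counts() -> tuple[int, int]:
    """Return (rows without the filter, rows with the filter)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cursorDiskKV (key TEXT PRIMARY KEY, value TEXT)")
    db.executemany(
        "INSERT INTO cursorDiskKV VALUES (?, ?)",
        [
            ("composerData:a", compact({"fullConversationHeadersOnly": [{"bubbleId": "b1"}]})),
            ("composerData:b", compact({"fullConversationHeadersOnly": []})),
            # Same empty list, serialized with a space: the SQL filter misses it.
            ("composerData:c", json.dumps({"fullConversationHeadersOnly": []})),
        ],
    )
    try:
        unfiltered = len(db.execute(QUERY_BASE).fetchall())
        filtered = len(db.execute(QUERY_FILTERED).fetchall())
    finally:
        db.close()
    return unfiltered, filtered
```

With these sample rows the filter drops one of the two empty conversations at the database level; the other still has to be skipped after parsing.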

services/workspace_db.py (1)

20-20: 💤 Low value

Missing type annotation for global_db parameter.

The three loader functions lack type hints for the global_db parameter. The module comment (lines 14-15) mentions the caller must set row_factory = sqlite3.Row, but the signature doesn't document this contract.

Proposed fix

-def _load_bubble_map(global_db) -> dict[str, dict]:
+def _load_bubble_map(global_db: sqlite3.Connection) -> dict[str, dict]:

-def _load_project_layouts_map(global_db) -> dict[str, list]:
+def _load_project_layouts_map(global_db: sqlite3.Connection) -> dict[str, list]:

-def _load_code_block_diff_map(global_db) -> dict[str, list]:
+def _load_code_block_diff_map(global_db: sqlite3.Connection) -> dict[str, list]:

Also applies to: 47-47, 83-83

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/workspace_db.py` at line 20, Add explicit type annotations for the
global_db parameter on all three loader functions (e.g., _load_bubble_map, and
the other two loader functions at the later spots) to document the required
sqlite API and the caller contract (row_factory = sqlite3.Row); annotate
global_db as sqlite3.Connection (and import sqlite3 at top if missing). Keep
existing return types (dict[str, dict]) unchanged and ensure the function
signatures reflect the new parameter type to make the contract explicit.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@utils/cursor_md_exporter.py`:
- Around line 374-375: The frontmatter assembly in cursor_md_exporter.py is
using inconsistent escaping: replace raw string appends for workspace_name and
title with a JSON-safe representation (use json.dumps) to ensure double quotes
and special characters are escaped; specifically update the fm_lines.append
calls that reference ws_display_name and the title variable so they use
json.dumps(title) and json.dumps(ws_display_name) (keeping ws_slug as-is) to
match the CLI exporter behavior.
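
A short sketch of the suggested escaping, assuming the frontmatter is built as a list of lines. The function name `frontmatter_lines` and the `workspace` key used for the slug are hypothetical; only `title`, `workspace_name`, `ws_display_name`, and `ws_slug` come from the comment above.

```python
import json


def frontmatter_lines(title: str, ws_display_name: str, ws_slug: str) -> list[str]:
    """Illustrative only: build YAML frontmatter with JSON-escaped strings."""
    fm_lines = ["---"]
    # json.dumps emits a double-quoted string with quotes and backslashes escaped.
    fm_lines.append(f"title: {json.dumps(title)}")
    fm_lines.append(f"workspace_name: {json.dumps(ws_display_name)}")
    # The slug is already restricted to safe characters, so it is left as-is.
    fm_lines.append(f"workspace: {ws_slug}")
    fm_lines.append("---")
    return fm_lines
```

A title such as `Fix "quotes": now` would otherwise produce a line that a YAML parser may reject or misread; the JSON-quoted form is also a valid YAML double-quoted scalar.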

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ad14647f-26f6-4846-a186-f121c5fd8675

📥 Commits

Reviewing files that changed from the base of the PR and between 34aaedd and 621c3a3.

📒 Files selected for processing (7)

api/export_api.py
scripts/export.py
services/workspace_db.py
services/workspace_listing.py
services/workspace_tabs.py
utils/cursor_md_exporter.py
utils/text_extract.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

scripts/export.py (1)

182-182: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit encoding="utf-8" for consistency.

Other file operations in this script specify encoding="utf-8" (lines 91, 111, 460, 496). Missing it here could cause issues on systems where the default encoding isn't UTF-8.

Proposed fix

-            with open(state_path, "r") as f:
+            with open(state_path, "r", encoding="utf-8") as f:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/export.py` at line 182, The with-open reading of state_path currently
omits an encoding and should explicitly use encoding="utf-8"; update the open
call that reads state_path (the with open(state_path, "r") context) to include
encoding="utf-8" so it matches other file operations and avoids
platform-dependent defaults.
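
The round trip this comment is protecting can be sketched as follows. The state file's real structure is not shown on this page, so the `last_title` field and the helper names are hypothetical.

```python
import json
import os
import tempfile


def save_state(state_path: str, state: dict) -> None:
    # Write with an explicit encoding so the file is UTF-8 on every platform.
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump(state, f, ensure_ascii=False)


def load_state(state_path: str) -> dict:
    """Return the saved state, or {} if the file is missing or unreadable."""
    try:
        with open(state_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        # ValueError covers both JSON decode errors and Unicode decode errors.
        return {}


def roundtrip() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "state.json")
        save_state(path, {"last_title": "Überblick ✓"})
        return load_state(path)
```

If the reader relied on the platform default while the writer used UTF-8, non-ASCII content like the title above could fail to decode or decode incorrectly on systems whose default is not UTF-8.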

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bdd0fa04-b35b-4e8a-bb85-a0a33b87f625

📥 Commits

Reviewing files that changed from the base of the PR and between c7ab45c and aa16600.

📒 Files selected for processing (9)

api/export_api.py
api/workspaces.py
scripts/export.py
services/workspace_db.py
services/workspace_listing.py
services/workspace_resolver.py
services/workspace_tabs.py
utils/cursor_md_exporter.py
utils/workspace_descriptor.py

bradjin8 added 2 commits May 19, 2026 13:10

initial implementation of export refinements

b4b0aaf

coderabbitai Bot reviewed May 20, 2026

Comment thread utils/cursor_md_exporter.py Outdated

bradjin8 self-assigned this May 20, 2026

fix: inconsistent YAML escaping for workspace_name

c7ab45c

bradjin8 requested a review from timon0305 May 20, 2026 16:31

timon0305 requested changes May 21, 2026

fix: review comments

aa16600

coderabbitai Bot reviewed May 21, 2026

fix: state_path omits an encoding

d31fb3e

bradjin8 requested a review from timon0305 May 21, 2026 14:00

timon0305 approved these changes May 21, 2026

timon0305 requested a review from wpak-ai May 21, 2026 14:37

bradjin8 wants to merge 5 commits into master from feat/export-script-reimplements

2 participants
