
Refactor: eliminate export script service-layer duplication (~700 LOC) closes #42 #61

Open
bradjin8 wants to merge 5 commits into master from feat/export-script-reimplements

@bradjin8 bradjin8 commented May 20, 2026

Summary

Closes #42.

scripts/export.py previously reimplemented ~700 lines of logic already present in the service and utility layers: workspace scanning, database access, text extraction, slug generation, and Markdown rendering. This PR eliminates those duplications across the entire export pipeline.

What changed

scripts/export.py (~550 lines removed)

  • Deleted private copies of get_default_workspace_path, resolve_workspace_path, extract_text_from_rich_text, extract_text_from_bubble, slug, and get_workspace_folder_paths wrapper — replaced with imports from utils.workspace_path and utils.text_extract
  • Workspace scanning delegated to services.workspace_db._collect_workspace_entries()
  • Database connection management switched to the _open_global_db() context manager
  • Workspace/project assignment (local get_project_from_file_path + assign_workspace closures) replaced by services.workspace_resolver functions
  • IDE chat Markdown generation loop extracted to utils.cursor_md_exporter.cursor_ide_chat_to_markdown()
  • sys.path.insert guarded by if __name__ == "__main__": so it is inert when the package is installed via the pyproject.toml entry point
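
As a rough illustration of the context-manager approach mentioned above, here is a minimal sketch. It is not the project's `_open_global_db()`; the name `open_db_sketch` and the choice to take a database path directly are assumptions made for the example. It only shows the general shape: yield a `(connection, path)` pair and guarantee the connection is closed on exit.

```python
import contextlib
import sqlite3
from typing import Iterator, Tuple


@contextlib.contextmanager
def open_db_sketch(db_path: str) -> Iterator[Tuple[sqlite3.Connection, str]]:
    """Hypothetical stand-in for _open_global_db(): yield (connection, path)."""
    conn = sqlite3.connect(db_path)
    # Assumption: downstream loaders want rows addressable by column name.
    conn.row_factory = sqlite3.Row
    try:
        yield conn, db_path
    finally:
        # Runs even if the caller raises inside the with-block.
        conn.close()


# Usage: the path is unpacked into an underscore-prefixed name when unused.
with open_db_sketch(":memory:") as (db, _db_path):
    db.execute("CREATE TABLE cursorDiskKV (key TEXT, value TEXT)")
```

Centralizing the open/close logic this way means callers cannot forget to close the connection on an error path.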

services/workspace_db.py (3 new functions)

  • _load_bubble_map(global_db): bubbleId:* → {bubble_id: dict}
  • _load_project_layouts_map(global_db): messageRequestContext:* → {composer_id: [root_path]}
  • _load_code_block_diff_map(global_db): codeBlockDiff:* → {composer_id: [diff_dict]}

These replace copy-pasted cursorDiskKV query + parse loops in scripts/export.py, services/workspace_listing.py, services/workspace_tabs.py, and api/export_api.py.
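
For readers unfamiliar with the query + parse pattern being consolidated, a loader of this kind might look roughly like the sketch below. This is not the code from services/workspace_db.py: the function name `load_bubble_map_sketch` is hypothetical, and taking the bubble id from the last colon-separated key segment is an assumption about the key layout.

```python
import json
import sqlite3


def load_bubble_map_sketch(global_db: sqlite3.Connection) -> dict[str, dict]:
    """Read bubbleId:* rows from cursorDiskKV into {bubble_id: dict}."""
    bubble_map: dict[str, dict] = {}
    try:
        rows = global_db.execute(
            "SELECT key, value FROM cursorDiskKV WHERE key LIKE 'bubbleId:%'"
        ).fetchall()
    except sqlite3.Error:
        # Missing table or unreadable database: return an empty map.
        return bubble_map
    for key, value in rows:
        if value is None:
            continue
        try:
            parsed = json.loads(value)
        except (TypeError, ValueError):
            continue  # skip rows whose value is not valid JSON
        if isinstance(parsed, dict):
            # Assumption: the bubble id is the final key segment.
            bubble_map[key.rsplit(":", 1)[-1]] = parsed
    return bubble_map
```

Keeping the error handling inside the loader lets every caller treat "no data" and "unreadable data" the same way, which is one plausible reason to share it across the four call sites.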

utils/cursor_md_exporter.py

  • cursor_ide_chat_to_markdown(composer_data, composer_id, bubble_map, code_block_diff_map, workspace_info) added alongside the existing cursor_cli_session_to_markdown — encapsulates the full per-composer rendering: bubble assembly, response-time computation, session aggregates, file/command tracking, frontmatter, and session-summary table
  • Private _slug() removed; slug imported from utils.text_extract

utils/text_extract.py

  • slug(s) added — filesystem-safe slug conversion (was duplicated in multiple callers)
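
The PR text does not show the body of slug(s). As an illustration only, a filesystem-safe slug function is often written along these lines; the name `slug_sketch`, the `max_len` parameter, and the "untitled" fallback are all assumptions for this example, not the behavior of utils.text_extract.slug.

```python
import re


def slug_sketch(s: str, max_len: int = 80) -> str:
    """Hypothetical filesystem-safe slug: lowercase ASCII, dash-separated."""
    s = s.strip().lower()
    # Collapse every run of characters outside a-z / 0-9 into one dash.
    s = re.sub(r"[^a-z0-9]+", "-", s)
    s = s.strip("-")
    # Truncate, then drop any dash left dangling by the cut.
    s = s[:max_len].rstrip("-")
    return s or "untitled"
```

Having one shared function matters mainly because the CLI and web exports then produce identical filenames for the same chat title.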

api/export_api.py

  • Private _slug() removed; uses slug from utils.text_extract
  • Workspace scanning replaced with service layer calls
  • bubble_map loading replaced with _load_bubble_map()
  • Markdown generation replaced with cursor_ide_chat_to_markdown(); web export now emits the same rich frontmatter and session-summary block as the CLI export

services/workspace_listing.py, services/workspace_tabs.py

  • Inline bubbleId / messageRequestContext / codeBlockDiff loading loops replaced with the new _load_* functions

Test plan

  • python -m pytest -v — 297 passed, 4 skipped (matches baseline)
  • python scripts/export.py --no-zip --out /tmp/test-export — inspect that Markdown files are created with correct frontmatter and session-summary blocks
  • Web export zip (POST /api/export) downloads and contains valid Markdown
  • python scripts/export.py --since last — skips chats exported before; state file updated

Summary by CodeRabbit

  • New Features

    • Export Cursor IDE composer and CLI sessions to rich Markdown with YAML metadata, session summaries, per-bubble content, optional thinking/details, and tool I/O blocks.
  • Refactor

    • Export and workspace discovery flows reorganized to use shared services for more consistent, reliable exports.
  • Improvements

    • Importable CLI tool, improved filesystem-safe filename slugs, inclusion of code-diff edits, richer activity/tool counts, and updated export output layout.


bradjin8 added 2 commits May 19, 2026 13:10
Extract _load_bubble_map/_load_project_layouts_map/_load_code_block_diff_map
to services/workspace_db.py; add cursor_ide_chat_to_markdown to
utils/cursor_md_exporter.py; remove private _slug() copies from
api/export_api.py and utils/cursor_md_exporter.py.

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1590ea8-ae8a-4adf-9fec-25320da71a25

📥 Commits

Reviewing files that changed from the base of the PR and between aa16600 and d31fb3e.

📒 Files selected for processing (1)
  • scripts/export.py

📝 Walkthrough

Consolidates export logic: shared slug and Markdown exporters added, three SQLite KV loaders introduced, service modules consume those loaders, API and CLI export flows delegate DB reading and Markdown generation to the new helpers.

Changes

Export Pipeline Consolidation

Layer / File(s) | Summary
Shared utility extraction and Markdown exporter (utils/text_extract.py, utils/cursor_md_exporter.py) | Adds slug() and a public cursor_ide_chat_to_markdown() that builds YAML-frontmatter + Markdown from composer KV data, appends synthetic code-edit bubbles, aggregates timing/tool stats, and renders per-bubble bodies; removes local _slug helper.
Service layer KV loaders for global storage (services/workspace_db.py) | Adds helpers that read cursorDiskKV rows into typed in-memory maps (bubbles, project layouts, code-block diffs) with defensive JSON parsing and sqlite error handling.
Service consumers adopt KV loaders (services/workspace_listing.py, services/workspace_tabs.py, services/workspace_resolver.py, utils/workspace_descriptor.py, api/workspaces.py) | Replaces inline cursorDiskKV parsing and regex helpers with calls to the new loaders and updates workspace descriptor JSON reads to use the public read_json_file / basename_from_pathish.
API export route refactor (api/export_api.py) | export_chats() now uses service-layer workspace discovery and DB opening, loads bubble/code-diff maps via shared loaders, filters by since, builds exclusion/search text from bubbles and composer JSON, delegates Markdown assembly to cursor_ide_chat_to_markdown, and writes export entries into an in-memory ZIP.
CLI script modularization and delegation (scripts/export.py) | Makes the script importable, removes several local helpers (workspace path resolution, slugging, rich-text extraction), delegates workspace scanning/DB reading to services, and delegates Markdown generation to shared exporter functions while preserving manifest/state writing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • clean6378-max-it
  • wpak-ai

Poem

🐇 I hopped through code, cleaned slugs and threads,
Bubbles and diffs in tidy homesteads,
Loaders hum softly, exports align,
CLI and API now share the same vine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 74.19% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The PR title clearly and specifically describes the main change: refactoring to eliminate duplication between export script and service layer, with the ~700 LOC metric demonstrating scope.
Linked Issues check | ✅ Passed | All acceptance criteria from #42 are addressed: scripts/export.py delegates to services/workspace_db and workspace_listing [#42], removed private utility copies and now imports from utils [#42], workspace scanning delegated [#42], conversation assembly delegated via cursor_md_exporter [#42].
Out of Scope Changes check | ✅ Passed | All changes directly support the refactoring objective: consolidating duplicated logic into service/utility modules, eliminating private copies, and establishing proper import paths without unrelated modifications.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
api/export_api.py (2)

115-117: 💤 Low value

Prefix unused global_db_path with underscore.

Per Ruff RUF059, the unpacked global_db_path is never used. Prefix it to indicate intentional discard.

Proposed fix
-        with _open_global_db(workspace_path) as (global_db, global_db_path):
+        with _open_global_db(workspace_path) as (global_db, _global_db_path):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/export_api.py` around lines 115 - 117, The tuple unpacking from
_open_global_db currently binds an unused name global_db_path; update the unpack
to use a prefixed discard name (e.g. (global_db, _global_db_path)) so the unused
variable is intentionally ignored and satisfies RUF059; adjust the line in
export_api.py where _open_global_db is called and ensure any subsequent
references (if any) are updated to the new name or removed.

99-99: 💤 Low value

Unused variable project_name_map.

project_name_map is assigned but never referenced in this function. Either remove it or use it where intended.

Proposed fix
         workspace_entries = _collect_workspace_entries(workspace_path)
         composer_id_to_ws = _build_composer_id_to_workspace_id(workspace_path, workspace_entries)
-        project_name_map = _create_project_name_to_workspace_id_map(workspace_entries)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/export_api.py` at line 99, The variable project_name_map is assigned from
_create_project_name_to_workspace_id_map(workspace_entries) but never used;
either remove this assignment or actually use project_name_map where intended
(e.g., replace any later direct lookups of workspace_entries or hardcoded
workspace id logic with project_name_map lookup). Locate the call to
_create_project_name_to_workspace_id_map and either delete that line (if the
mapping isn't needed) or refactor the subsequent logic to reference
project_name_map for project->workspace id resolution so the mapping is
consumed.
scripts/export.py (1)

228-234: 💤 Low value

SQL query differs from API: missing empty-array filter.

The API query at api/export_api.py:124-127 includes AND value NOT LIKE '%fullConversationHeadersOnly\":[]%' to skip empty conversations, but this CLI query omits that filter. The subsequent if not headers: continue check (line 261-262) handles empty arrays anyway, so this is functionally equivalent but less efficient—it fetches rows that are immediately discarded.

Consider adding the same filter for consistency and to reduce unnecessary JSON parsing.

Proposed fix
             try:
                 ide_composer_rows = global_db.execute(
                     "SELECT key, value FROM cursorDiskKV WHERE key LIKE 'composerData:%'"
                     " AND value LIKE '%fullConversationHeadersOnly%'"
+                    " AND value NOT LIKE '%fullConversationHeadersOnly\":[]%'"
                 ).fetchall()
             except sqlite3.Error:
                 pass
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/export.py` around lines 228 - 234, The SQL in the global_db query
that builds ide_composer_rows currently selects rows with "value LIKE
'%fullConversationHeadersOnly%'" but misses the empty-array exclusion used in
the API; update the global_db.execute call (the SELECT on cursorDiskKV for key
LIKE 'composerData:%') to add the same filter AND value NOT LIKE
'%fullConversationHeadersOnly\":[]%' (properly escape the quote in the SQL
string) so empty-header entries are skipped at the DB level before JSON parsing.
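
To make the effect of the extra NOT LIKE clause concrete, here is a small self-contained sketch against an in-memory table. The table name and key prefix come from the review comment; the sample rows are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cursorDiskKV (key TEXT, value TEXT)")
db.executemany(
    "INSERT INTO cursorDiskKV VALUES (?, ?)",
    [
        # Non-empty header list: should be selected.
        ("composerData:a", '{"fullConversationHeadersOnly":[{"bubbleId":"b1"}]}'),
        # Empty list, compact JSON: excluded by the NOT LIKE clause.
        ("composerData:b", '{"fullConversationHeadersOnly":[]}'),
        # Empty list with a space after the colon: the pattern misses it.
        ("composerData:c", '{"fullConversationHeadersOnly": []}'),
    ],
)
rows = db.execute(
    "SELECT key FROM cursorDiskKV WHERE key LIKE 'composerData:%'"
    " AND value LIKE '%fullConversationHeadersOnly%'"
    " AND value NOT LIKE '%fullConversationHeadersOnly\":[]%'"
    " ORDER BY key"
).fetchall()
selected_keys = [row[0] for row in rows]
```

Because the filter is a plain substring match, it depends on the exact serialization of the stored JSON. That is a reason to keep the Python-side `if not headers: continue` check as the authoritative guard and treat the SQL clause as an optimization only.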
services/workspace_db.py (1)

20-20: 💤 Low value

Missing type annotation for global_db parameter.

The three loader functions lack type hints for the global_db parameter. The module comment (lines 14-15) mentions the caller must set row_factory = sqlite3.Row, but the signature doesn't document this contract.

Proposed fix
-def _load_bubble_map(global_db) -> dict[str, dict]:
+def _load_bubble_map(global_db: sqlite3.Connection) -> dict[str, dict]:
-def _load_project_layouts_map(global_db) -> dict[str, list]:
+def _load_project_layouts_map(global_db: sqlite3.Connection) -> dict[str, list]:
-def _load_code_block_diff_map(global_db) -> dict[str, list]:
+def _load_code_block_diff_map(global_db: sqlite3.Connection) -> dict[str, list]:

Also applies to: 47-47, 83-83

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/workspace_db.py` at line 20, Add explicit type annotations for the
global_db parameter on all three loader functions (e.g., _load_bubble_map, and
the other two loader functions at the later spots) to document the required
sqlite API and the caller contract (row_factory = sqlite3.Row); annotate
global_db as sqlite3.Connection (and import sqlite3 at top if missing). Keep
existing return types (dict[str, dict]) unchanged and ensure the function
signatures reflect the new parameter type to make the contract explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@utils/cursor_md_exporter.py`:
- Around line 374-375: The frontmatter assembly in cursor_md_exporter.py is
using inconsistent escaping: replace raw string appends for workspace_name and
title with a JSON-safe representation (use json.dumps) to ensure double quotes
and special characters are escaped; specifically update the fm_lines.append
calls that reference ws_display_name and the title variable so they use
json.dumps(title) and json.dumps(ws_display_name) (keeping ws_slug as-is) to
match the CLI exporter behavior.
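
A minimal sketch of the suggested escaping, assuming a frontmatter built as a list of lines. The helper name `frontmatter_lines_sketch` and the `workspace` key used for the slug are hypothetical; only `title`, `workspace_name`, `ws_display_name`, and `ws_slug` are named in the comment above.

```python
import json


def frontmatter_lines_sketch(title: str, ws_display_name: str, ws_slug: str) -> list[str]:
    """Build YAML frontmatter lines with JSON-escaped free-text values."""
    fm_lines = ["---"]
    # json.dumps wraps the value in double quotes and escapes embedded
    # quotes and backslashes, so the scalar cannot break the YAML line.
    fm_lines.append(f"title: {json.dumps(title)}")
    fm_lines.append(f"workspace_name: {json.dumps(ws_display_name)}")
    # Assumption: the slug is already restricted to safe characters.
    fm_lines.append(f"workspace: {ws_slug}")
    fm_lines.append("---")
    return fm_lines
```

A title such as `Fix "quotes": now` would otherwise produce a line that a YAML parser could misread; the JSON string form is generally also a valid YAML double-quoted scalar.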


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ad14647f-26f6-4846-a186-f121c5fd8675

📥 Commits

Reviewing files that changed from the base of the PR and between 34aaedd and 621c3a3.

📒 Files selected for processing (7)
  • api/export_api.py
  • scripts/export.py
  • services/workspace_db.py
  • services/workspace_listing.py
  • services/workspace_tabs.py
  • utils/cursor_md_exporter.py
  • utils/text_extract.py

@bradjin8 bradjin8 self-assigned this May 20, 2026
@bradjin8 bradjin8 requested a review from timon0305 May 20, 2026 16:31

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/export.py (1)

182-182: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit encoding="utf-8" for consistency.

Other file operations in this script specify encoding="utf-8" (lines 91, 111, 460, 496). Missing it here could cause issues on systems where the default encoding isn't UTF-8.

Proposed fix
-            with open(state_path, "r") as f:
+            with open(state_path, "r", encoding="utf-8") as f:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/export.py` at line 182, The with-open reading of state_path currently
omits an encoding and should explicitly use encoding="utf-8"; update the open
call that reads state_path (the with open(state_path, "r") context) to include
encoding="utf-8" so it matches other file operations and avoids
platform-dependent defaults.
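
A short sketch of why the explicit encoding matters for a JSON state file. The helper names `write_state` and `read_state` and the state contents are hypothetical; the actual state format used by scripts/export.py is not shown in this PR.

```python
import json


def write_state(path: str, state: dict) -> None:
    """Write the state as UTF-8 JSON regardless of the platform default."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters as-is in the file,
        # which is exactly the case where a mismatched reader would fail.
        json.dump(state, f, ensure_ascii=False)


def read_state(path: str) -> dict:
    """Read the state back with the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

If the reader omitted `encoding="utf-8"`, a platform whose default encoding is not UTF-8 could decode non-ASCII text in the file incorrectly or raise an error.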

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bdd0fa04-b35b-4e8a-bb85-a0a33b87f625

📥 Commits

Reviewing files that changed from the base of the PR and between c7ab45c and aa16600.

📒 Files selected for processing (9)
  • api/export_api.py
  • api/workspaces.py
  • scripts/export.py
  • services/workspace_db.py
  • services/workspace_listing.py
  • services/workspace_resolver.py
  • services/workspace_tabs.py
  • utils/cursor_md_exporter.py
  • utils/workspace_descriptor.py

@bradjin8 bradjin8 requested a review from timon0305 May 21, 2026 14:00
@timon0305 timon0305 requested a review from wpak-ai May 21, 2026 14:37


Development

Successfully merging this pull request may close these issues.

Export Script Reimplements Service Layer (~700 LOC)

2 participants