Refactor: eliminate export script service-layer duplication (~700 LOC) closes #42 #61
bradjin8 wants to merge 5 commits into
Extract _load_bubble_map/_load_project_layouts_map/_load_code_block_diff_map to services/workspace_db.py; add cursor_ide_chat_to_markdown to utils/cursor_md_exporter.py; remove private _slug() copies from api/export_api.py and utils/cursor_md_exporter.py.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough
Consolidates export logic: shared slug and Markdown exporters added, three SQLite KV loaders introduced, service modules consume those loaders, API and CLI export flows delegate DB reading and Markdown generation to the new helpers.
Changes
Export Pipeline Consolidation
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (4)
api/export_api.py (2)
115-117: 💤 Low value
Prefix unused `global_db_path` with underscore.
Per Ruff RUF059, the unpacked `global_db_path` is never used. Prefix it to indicate intentional discard.
Proposed fix
- with _open_global_db(workspace_path) as (global_db, global_db_path):
+ with _open_global_db(workspace_path) as (global_db, _global_db_path):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/export_api.py` around lines 115 - 117, The tuple unpacking from _open_global_db currently binds an unused name global_db_path; update the unpack to use a prefixed discard name (e.g. (global_db, _global_db_path)) so the unused variable is intentionally ignored and satisfies RUF059; adjust the line in export_api.py where _open_global_db is called and ensure any subsequent references (if any) are updated to the new name or removed.
99-99: 💤 Low value
Unused variable `project_name_map`.
`project_name_map` is assigned but never referenced in this function. Either remove it or use it where intended.
Proposed fix
  workspace_entries = _collect_workspace_entries(workspace_path)
  composer_id_to_ws = _build_composer_id_to_workspace_id(workspace_path, workspace_entries)
- project_name_map = _create_project_name_to_workspace_id_map(workspace_entries)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/export_api.py` at line 99, The variable project_name_map is assigned from _create_project_name_to_workspace_id_map(workspace_entries) but never used; either remove this assignment or actually use project_name_map where intended (e.g., replace any later direct lookups of workspace_entries or hardcoded workspace id logic with project_name_map lookup). Locate the call to _create_project_name_to_workspace_id_map and either delete that line (if the mapping isn't needed) or refactor the subsequent logic to reference project_name_map for project->workspace id resolution so the mapping is consumed.
scripts/export.py (1)
228-234: 💤 Low value
SQL query differs from API: missing empty-array filter.
The API query at `api/export_api.py:124-127` includes `AND value NOT LIKE '%fullConversationHeadersOnly\":[]%'` to skip empty conversations, but this CLI query omits that filter. The subsequent `if not headers: continue` check (lines 261-262) handles empty arrays anyway, so this is functionally equivalent but less efficient: it fetches rows that are immediately discarded.
Consider adding the same filter for consistency and to reduce unnecessary JSON parsing.
Proposed fix
  try:
      ide_composer_rows = global_db.execute(
          "SELECT key, value FROM cursorDiskKV WHERE key LIKE 'composerData:%'"
          " AND value LIKE '%fullConversationHeadersOnly%'"
+         " AND value NOT LIKE '%fullConversationHeadersOnly\":[]%'"
      ).fetchall()
  except sqlite3.Error:
      pass
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/export.py` around lines 228 - 234, The SQL in the global_db query that builds ide_composer_rows currently selects rows with "value LIKE '%fullConversationHeadersOnly%'" but misses the empty-array exclusion used in the API; update the global_db.execute call (the SELECT on cursorDiskKV for key LIKE 'composerData:%') to add the same filter AND value NOT LIKE '%fullConversationHeadersOnly\":[]%' (properly escape the quote in the SQL string) so empty-header entries are skipped at the DB level before JSON parsing.
services/workspace_db.py (1)
20-20: 💤 Low value
Missing type annotation for `global_db` parameter.
The three loader functions lack type hints for the `global_db` parameter. The module comment (lines 14-15) mentions the caller must set `row_factory = sqlite3.Row`, but the signature doesn't document this contract.
Proposed fix
-def _load_bubble_map(global_db) -> dict[str, dict]:
+def _load_bubble_map(global_db: sqlite3.Connection) -> dict[str, dict]:
-def _load_project_layouts_map(global_db) -> dict[str, list]:
+def _load_project_layouts_map(global_db: sqlite3.Connection) -> dict[str, list]:
-def _load_code_block_diff_map(global_db) -> dict[str, list]:
+def _load_code_block_diff_map(global_db: sqlite3.Connection) -> dict[str, list]:
Also applies to: 47-47, 83-83
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/workspace_db.py` at line 20, Add explicit type annotations for the global_db parameter on all three loader functions (e.g., _load_bubble_map, and the other two loader functions at the later spots) to document the required sqlite API and the caller contract (row_factory = sqlite3.Row); annotate global_db as sqlite3.Connection (and import sqlite3 at top if missing). Keep existing return types (dict[str, dict]) unchanged and ensure the function signatures reflect the new parameter type to make the contract explicit.
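To make the `row_factory` contract from the nitpick above concrete, here is a minimal sketch of what an annotated loader could look like. It is not the PR's actual function body: the name `load_bubble_map`, the `bubbleId:<composer>:<bubble>` key layout, and the table schema in the demo are assumptions made for illustration.

```python
import json
import sqlite3


def load_bubble_map(global_db: sqlite3.Connection) -> dict[str, dict]:
    """Hypothetical stand-in for _load_bubble_map (not the PR's real body).

    The caller is expected to set global_db.row_factory = sqlite3.Row,
    because rows are read by column name below.
    """
    bubble_map: dict[str, dict] = {}
    rows = global_db.execute(
        "SELECT key, value FROM cursorDiskKV WHERE key LIKE 'bubbleId:%'"
    )
    for row in rows:
        # Assumed key layout: "bubbleId:<composer_id>:<bubble_id>".
        bubble_id = row["key"].rsplit(":", 1)[-1]
        try:
            parsed = json.loads(row["value"])
        except (TypeError, ValueError):
            continue  # skip NULL or malformed JSON values
        if isinstance(parsed, dict):
            bubble_map[bubble_id] = parsed
    return bubble_map


# Demo against an in-memory database with an assumed two-column schema.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # the contract the annotation helps document
conn.execute("CREATE TABLE cursorDiskKV (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO cursorDiskKV VALUES (?, ?)",
    [
        ("bubbleId:c1:b1", json.dumps({"text": "hello"})),
        ("bubbleId:c1:b2", "not json"),
        ("composerData:c1", json.dumps({"name": "chat"})),
    ],
)
bubble_map = load_bubble_map(conn)
```

The annotation alone cannot enforce `row_factory`; it only tells readers and type checkers which object is expected, so a docstring note like the one above is still useful.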
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@utils/cursor_md_exporter.py`:
- Around line 374-375: The frontmatter assembly in cursor_md_exporter.py is
using inconsistent escaping: replace raw string appends for workspace_name and
title with a JSON-safe representation (use json.dumps) to ensure double quotes
and special characters are escaped; specifically update the fm_lines.append
calls that reference ws_display_name and the title variable so they use
json.dumps(title) and json.dumps(ws_display_name) (keeping ws_slug as-is) to
match the CLI exporter behavior.
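A small sketch of the escaping this inline comment asks for, assuming the frontmatter is built as a list of lines. The helper name `frontmatter_lines` and the `workspace` key used for the slug are hypothetical; only `title`, `workspace_name`, `ws_display_name`, and `ws_slug` come from the comment itself.

```python
import json


def frontmatter_lines(title: str, ws_display_name: str, ws_slug: str) -> list[str]:
    """Hypothetical helper showing JSON-safe frontmatter values."""
    return [
        "---",
        # json.dumps adds the surrounding quotes and escapes embedded ones.
        f"title: {json.dumps(title)}",
        f"workspace_name: {json.dumps(ws_display_name)}",
        # The slug is already restricted to safe characters, so it stays raw.
        f"workspace: {ws_slug}",  # key name is an assumption
        "---",
    ]


lines = frontmatter_lines('Fix "export" bug', "my: project", "my-project")
```

A raw append would produce `title: Fix "export" bug` or `workspace_name: my: project`, both of which can confuse a YAML parser; the JSON-encoded string is a valid double-quoted YAML scalar.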
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ad14647f-26f6-4846-a186-f121c5fd8675
📒 Files selected for processing (7)
api/export_api.py
scripts/export.py
services/workspace_db.py
services/workspace_listing.py
services/workspace_tabs.py
utils/cursor_md_exporter.py
utils/text_extract.py
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/export.py (1)
182-182: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Add explicit `encoding="utf-8"` for consistency.
Other file operations in this script specify `encoding="utf-8"` (lines 91, 111, 460, 496). Missing it here could cause issues on systems where the default encoding isn't UTF-8.
Proposed fix
- with open(state_path, "r") as f:
+ with open(state_path, "r", encoding="utf-8") as f:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/export.py` at line 182, The with-open reading of state_path currently omits an encoding and should explicitly use encoding="utf-8"; update the open call that reads state_path (the with open(state_path, "r") context) to include encoding="utf-8" so it matches other file operations and avoids platform-dependent defaults.
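As an illustration of why the explicit encoding matters, the sketch below round-trips a state file containing non-ASCII text. It assumes the state file is JSON, which the review does not confirm, and `load_state` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def load_state(state_path: Path) -> dict:
    """Hypothetical reader for the export state file (assumed to be JSON)."""
    # Explicit encoding avoids depending on the platform's default codec.
    with open(state_path, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.json"
    state_path.write_text(
        json.dumps({"last_title": "résumé ✓"}, ensure_ascii=False),
        encoding="utf-8",
    )
    state = load_state(state_path)
```

Without `encoding="utf-8"`, the same read can fail or decode incorrectly on systems whose default text encoding is a legacy code page.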
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bdd0fa04-b35b-4e8a-bb85-a0a33b87f625
📒 Files selected for processing (9)
api/export_api.py
api/workspaces.py
scripts/export.py
services/workspace_db.py
services/workspace_listing.py
services/workspace_resolver.py
services/workspace_tabs.py
utils/cursor_md_exporter.py
utils/workspace_descriptor.py
Summary
Closes #42.
`scripts/export.py` previously reimplemented ~700 lines of logic already present in the service and utility layers: workspace scanning, database access, text extraction, slug generation, and Markdown rendering. This PR eliminates those duplications across the entire export pipeline.

What changed

`scripts/export.py` (~550 lines removed)
- `get_default_workspace_path`, `resolve_workspace_path`, `extract_text_from_rich_text`, `extract_text_from_bubble`, `slug`, and `get_workspace_folder_paths` wrapper: replaced with imports from `utils.workspace_path` and `utils.text_extract`
- `services.workspace_db._collect_workspace_entries()`
- `_open_global_db()` context manager
- (`get_project_from_file_path` + `assign_workspace` closures) replaced by `services.workspace_resolver` functions
- `utils.cursor_md_exporter.cursor_ide_chat_to_markdown()`
- `sys.path.insert` guarded by `if __name__ == "__main__":` so it is inert when the package is installed via the `pyproject.toml` entry point

`services/workspace_db.py` (3 new functions)
- `_load_bubble_map(global_db)`: `bubbleId:*` → `{bubble_id: dict}`
- `_load_project_layouts_map(global_db)`: `messageRequestContext:*` → `{composer_id: [root_path]}`
- `_load_code_block_diff_map(global_db)`: `codeBlockDiff:*` → `{composer_id: [diff_dict]}`

These replace copy-pasted `cursorDiskKV` query + parse loops in `scripts/export.py`, `services/workspace_listing.py`, `services/workspace_tabs.py`, and `api/export_api.py`.

`utils/cursor_md_exporter.py`
- `cursor_ide_chat_to_markdown(composer_data, composer_id, bubble_map, code_block_diff_map, workspace_info)` added alongside the existing `cursor_cli_session_to_markdown`; it encapsulates the full per-composer rendering: bubble assembly, response-time computation, session aggregates, file/command tracking, frontmatter, and session-summary table
- `_slug()` removed; `slug` imported from `utils.text_extract`

`utils/text_extract.py`
- `slug(s)` added: filesystem-safe slug conversion (was duplicated in multiple callers)

`api/export_api.py`
- `_slug()` removed; uses `slug` from `utils.text_extract`
- `bubble_map` loading replaced with `_load_bubble_map()`
- `cursor_ide_chat_to_markdown()`; web export now emits the same rich frontmatter and session-summary block as the CLI export

`services/workspace_listing.py`, `services/workspace_tabs.py`
- `bubbleId` / `messageRequestContext` / `codeBlockDiff` loading loops replaced with the new `_load_*` functions

Test plan
- `python -m pytest -v`: 297 passed, 4 skipped (matches baseline)
- `python scripts/export.py --no-zip --out /tmp/test-export`: inspect that Markdown files are created with correct frontmatter and session-summary blocks
- (`POST /api/export`) downloads and contains valid Markdown
- `python scripts/export.py --since last`: skips chats exported before; state file updated
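For readers unfamiliar with the shared `slug(s)` helper mentioned in the summary, the following is one plausible shape for a filesystem-safe slug function. It is only a sketch: the real `utils.text_extract.slug` may normalize differently, and the `max_len` parameter and the `"untitled"` fallback are assumptions added here.

```python
import re


def slug(s: str, max_len: int = 80) -> str:
    """Hypothetical filesystem-safe slug; the PR's slug(s) may differ."""
    lowered = s.strip().lower()
    # Collapse every run of characters outside [a-z0-9] into one hyphen.
    collapsed = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Trim stray hyphens, cap the length, and trim again after the cut.
    trimmed = collapsed.strip("-")[:max_len].rstrip("-")
    # Fall back to a fixed name so callers never get an empty filename.
    return trimmed or "untitled"


example = slug("My Project: v2/final")
```

Keeping a single copy of such a helper means the CLI and web exports name files the same way, which is the duplication the PR sets out to remove.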