Improve shared chat scraping errors and coverage #170
Conversation
Code Review
This pull request enhances the chat scraping functionality by introducing provider-specific error messages for ChatGPT, Claude, and Gemini, and adds a comprehensive test suite for the extraction logic. Feedback highlights the duplication of the `_scrape_failure_message` function across modules and an inconsistency in `src/api/routes/memory.py` where error responses lack accurate `elapsed_ms` timing data.
```python
def _scrape_failure_message(result: Dict[str, Any]) -> str:
    provider = result.get("provider") or "unknown"

    if provider in {"chatgpt", "claude", "gemini"}:
        display_name = {
            "chatgpt": "ChatGPT",
            "claude": "Claude",
            "gemini": "Gemini",
        }[provider]
        return (
            f"Could not extract messages from this {display_name} share link. "
            "Make sure the link is public, still exists, and has not expired."
        )

    return (
        "Failed to extract messages from the provided link. "
        "Supported public share links are ChatGPT, Claude, and Gemini."
    )
```
The `_scrape_failure_message` function is duplicated between `src/api/routes/memory.py` and `server.py`. To improve maintainability and ensure consistent error messages across the application, consider centralizing this logic in a shared utility module or exporting it from one of the files. This avoids the need to update multiple locations when adding support for new chat providers.
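One way to centralize this is a small shared module that both routes import. The sketch below is an illustration only: the module path and the `_DISPLAY_NAMES` table name are assumptions, not part of this PR.

```python
# Hypothetical shared module (the path src/utils/scrape_errors.py is an
# assumption for illustration, not part of this PR).
from typing import Any, Dict

# Single source of truth for provider display names; adding a new chat
# provider here updates every caller at once.
_DISPLAY_NAMES = {
    "chatgpt": "ChatGPT",
    "claude": "Claude",
    "gemini": "Gemini",
}


def scrape_failure_message(result: Dict[str, Any]) -> str:
    provider = result.get("provider") or "unknown"
    display_name = _DISPLAY_NAMES.get(provider)
    if display_name is not None:
        return (
            f"Could not extract messages from this {display_name} share link. "
            "Make sure the link is public, still exists, and has not expired."
        )
    return (
        "Failed to extract messages from the provided link. "
        "Supported public share links are ChatGPT, Claude, and Gemini."
    )
```

Both `server.py` and `src/api/routes/memory.py` could then import this one function instead of each carrying a private copy.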
```diff
 if not pairs:
-    return _error(request, "Failed to extract messages from the provided link.", 400)
+    return _error(request, _scrape_failure_message(result), 400)

 data = ScrapeResponse(pairs=pairs)
 elapsed = round((time.perf_counter() - start) * 1000, 2)
```
The error response for missing message pairs is missing the `elapsed_ms` timing information, which results in a default value of 0.0. This is inconsistent with the success response and the implementation in `server.py`. Calculating the elapsed time before the check ensures that the user receives accurate timing even when extraction fails.
Suggested change:

```diff
-if not pairs:
-    return _error(request, _scrape_failure_message(result), 400)
-data = ScrapeResponse(pairs=pairs)
-elapsed = round((time.perf_counter() - start) * 1000, 2)
+elapsed = round((time.perf_counter() - start) * 1000, 2)
+if not pairs:
+    return _error(request, _scrape_failure_message(result), 400, elapsed)
+data = ScrapeResponse(pairs=pairs)
```
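The effect of moving the measurement can be seen in a minimal sketch of the timing pattern. The route's `_error` and `ScrapeResponse` details are elided; `handle_scrape` and the dict return shape are illustrative stand-ins, not the PR's actual code.

```python
import time


def handle_scrape(pairs):
    # Mirrors the route's `start` timestamp taken at the top of the handler.
    start = time.perf_counter()
    # Compute elapsed_ms *before* branching, so the error path reports real
    # timing instead of falling back to a default of 0.0.
    elapsed = round((time.perf_counter() - start) * 1000, 2)
    if not pairs:
        # Error response still carries the measured elapsed_ms.
        return {"error": "extraction failed", "elapsed_ms": elapsed}
    return {"pairs": pairs, "elapsed_ms": elapsed}
```

With the old ordering, the `if not pairs` branch returned before `elapsed` was ever computed, which is why the error response defaulted to 0.0.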
Force-pushed from `317b53d` to `3f793f2`.
Hi @TsukinowaRin, thanks for the contribution! Could you share a video of the working functionality with a Gemini/Claude link?
Addresses #155.
Summary
- Added the `python-multipart` runtime dependency required by the FastAPI upload route at import time

Verification
- `uv run --extra dev pytest tests/test_chat_share_extraction.py`
- `uv run --extra dev ruff check tests/test_chat_share_extraction.py`

Note: a full `ruff check` still reports pre-existing import-order/unused-import issues in `server.py` and `src/api/routes/memory.py`; this PR stays scoped to the scrape error behavior and leaves those unrelated issues untouched.