Enhance file review status UI, improve error handling, and strengthen test coverage#12
Enhance file review status UI, improve error handling, and strengthen test coverage#12devarshishimpi wants to merge 7 commits into
Conversation
…and remove provider failure skipping
…uctured model outputs
There was a problem hiding this comment.
Codra Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dfc637297b
ℹ️ About Codra in GitHub
Your team has set up Codra to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codra-app review"
If Codra has suggestions, it will comment; otherwise it will react with 👍.
Codra can also answer questions or update the PR. Try commenting "@codra-app address that feedback".
| /* ───────────────────────────────────────────────────── | ||
| Sonner Toast — Premium overrides | ||
| ───────────────────────────────────────────────────── */ | ||
|
|
There was a problem hiding this comment.
Almost every CSS property in the new Sonner Toast overrides (lines 875-1055) uses !important. This is a critical maintainability anti-pattern that creates specificity wars and makes future modifications extremely difficult. Instead of forcing styles with !important, you should increase selector specificity or utilize the library's intended theme configuration API.
| box-shadow: | ||
| 0 4px 16px oklch(0% 0 0 / 0.10), | ||
| 0 1px 4px oklch(0% 0 0 / 0.06), | ||
| inset 0 1px 0 oklch(100% 0 0 / 0.05) !important; |
There was a problem hiding this comment.
Hardcoded color values instead of design tokens
The toast styles use hardcoded oklch() values for backgrounds, borders, and text throughout the various states (success, error, warning, info). Since the project already uses CSS variables for these semantic colors (lines 55-61), these should be referenced (e.g., var(--success-bg)) to ensure consistency and allow for global theme changes without manually updating dozens of lines.
| inset 0 1px 0 oklch(100% 0 0 / 0.05) !important; | |
| background: var(--success-bg) !important; |
| { label: 'Tokens', value: <span className="font-mono text-sm">{(job.totalInputTokens + job.totalOutputTokens).toLocaleString()}</span> }, | ||
| { label: 'Tokens', value: | ||
| <span className="font-mono text-sm tabular-nums"> | ||
| {(job.totalInputTokens + job.totalOutputTokens).toLocaleString()} |
There was a problem hiding this comment.
Potential NaN in token calculation
The calculation (job.totalInputTokens + job.totalOutputTokens).toLocaleString() does not account for cases where either token value might be null or undefined (e.g., while a job is still running). In such cases, the result will be NaN, and the UI will display 'NaN'.
| {(job.totalInputTokens + job.totalOutputTokens).toLocaleString()} | |
| ((job.totalInputTokens ?? 0) + (job.totalOutputTokens ?? 0)).toLocaleString() |
| </div> | ||
| <div> | ||
| {steps.map((step, idx) => ( | ||
| <StepRow key={step.name || idx} step={step} index={idx} total={steps.length} /> |
There was a problem hiding this comment.
Unreliable list key in step mapping
Using step.name || idx as a key can lead to rendering issues and performance degradation if step.name is not unique across all steps or if the list order changes. It is preferable to use a unique identifier from the data model (e.g., step.id).
| <StepRow key={step.name || idx} step={step} index={idx} total={steps.length} /> | |
| key={step.id || step.name || idx} |
|
|
||
| <div className="col-span-2 pt-1 border-t border-border/40"> | ||
| <dt className="text-[10px] font-bold uppercase tracking-[0.12em] text-muted-foreground mb-1.5">Created</dt> | ||
| <dd className="text-sm text-muted-foreground tabular-nums">{new Date(job.createdAt).toLocaleString()}</dd> |
There was a problem hiding this comment.
Potential 'Invalid Date' display
The expression new Date(job.createdAt).toLocaleString() will result in the string 'Invalid Date' if job.createdAt is null or undefined. While unlikely for a stored job, adding a null check ensures the UI remains clean.
| <dd className="text-sm text-muted-foreground tabular-nums">{new Date(job.createdAt).toLocaleString()}</dd> | |
| job.createdAt ? new Date(job.createdAt).toLocaleString() : '—' |
| throw new Error('Model response did not contain review JSON keys.'); | ||
| } | ||
| } catch (e) { | ||
| logger.error('Failed to extract JSON from model response', { raw: truncateForLog(raw), error: e }); |
There was a problem hiding this comment.
Loss of diagnostic data in error logs
The change removes the 'raw' model response content from the log when JSON extraction fails (replacing it with only 'rawLength'). This is a regression in observability; without the actual content of the malformed response, it becomes nearly impossible to diagnose why the extraction failed or to create regression tests for the parser.
| logger.error('Failed to extract JSON from model response', { raw: truncateForLog(raw), error: e }); | |
| logger.error('Failed to extract JSON from model response', { | |
| raw: truncateJsonForLog(raw), | |
| rawLength: raw.length, | |
| error: e instanceof Error ? e.message : String(e), | |
| }); |
| env: AppBindings, | ||
| executionCtx?: Pick<ExecutionContext, 'waitUntil'>, | ||
| ) { | ||
| const task = runBestEffortJobMaintenance(env); |
There was a problem hiding this comment.
Floating promise may lead to unreliable execution
The function scheduleBestEffortJobMaintenance initiates an asynchronous task but only ensures its completion via executionCtx.waitUntil if the optional executionCtx is provided. In serverless environments (such as Cloudflare Workers), promises that are not registered with waitUntil may be terminated by the runtime immediately after the primary response is sent. This could lead to job recovery tasks being killed prematurely, making the 'best effort' maintenance unreliable.
| const task = runBestEffortJobMaintenance(env); | |
| const task = runBestEffortJobMaintenance(env); | |
| if (executionCtx) { | |
| executionCtx.waitUntil(task); | |
| } | |
| return task; |
| if (!canInheritParentFileReview(config, inherited)) { | ||
| logger.info(`Ignoring inherited review for ${file.path}; parent model ${inherited.model_used} is not in the current model strategy`); | ||
| await reviewAndPersistFile(env, job, file, pr, config, totalLineCount, model, existingReview); | ||
| processedThisChunk += 1; |
There was a problem hiding this comment.
Premature loop termination in runReviewPhase
The introduction of a 'break' statement inside the 'for' loop in 'runReviewPhase' causes the function to exit the loop entirely once the 'REVIEW_CHUNK_FILE_LIMIT' or the time limit is reached. Since there is no outer loop to process the remaining files in subsequent chunks, the function will only ever process the first chunk of files, leaving the rest of the files unreviewed.
| processedThisChunk += 1; | |
| Wrap the chunking logic in an outer loop (e.g., 'while' or a 'for' loop with increments of 'REVIEW_CHUNK_FILE_LIMIT') to ensure all files in the 'files' array are processed in manageable batches. |
| : new AggregateError(rejected.map((result) => result.reason), `${rejected.length} review chunk tasks failed`); | ||
| } | ||
|
|
||
| const latestReviews = await getFileReviewsForJobs(env, [job.id]); |
There was a problem hiding this comment.
Unsafe mutation of error object
Using 'Object.defineProperty' to attach 'retryAfterSeconds' to an existing error object is risky. If the error object is non-extensible (e.g., if it was frozen by a library or a runtime environment), this call will throw a 'TypeError', potentially crashing the error handling logic and masking the original exception.
| const latestReviews = await getFileReviewsForJobs(env, [job.id]); | |
| Instead of mutating the error, wrap the original error in a custom Error subclass or a metadata object that includes the retry delay. |
| }); | ||
| Object.defineProperty(error, 'retryAfterSeconds', { | ||
| value: retryableModelFailureDelaySeconds(failureCount), | ||
| configurable: true, |
There was a problem hiding this comment.
Removal of model summary generation logic
The 'runFinalizePhase' function has had its call to 'model.generateSummary' removed. This change effectively disables AI-generated summaries for reviews and sets 'summaryModel' to 'null'. While this might be intended as a refactor, it significantly alters the output of the review process and should be verified against the requirements.
Description
This PR improves the dashboard experience and system reliability by enhancing the file review status display, refactoring model error handling, and adding comprehensive test coverage. The changes focus on providing better visibility into review job progress and status while ensuring more robust error recovery and webhook handling.
Closes #8
What's Changed
UI/UX Improvements:
Backend Improvements:
Testing:
Type of Change
How Has This Been Tested?
Checklist: