DEVEX-694: Fix API catalog $ref dereferencing and add $defs deduplication#50
Open
jpage-godaddy wants to merge 5 commits into
Open
DEVEX-694: Fix API catalog $ref dereferencing and add $defs deduplication#50jpage-godaddy wants to merge 5 commits into
jpage-godaddy wants to merge 5 commits into
Conversation
**Fix: unresolved external $refs (was: 587 broken refs across 16 domains)**
The generator was silently preserving unresolved $refs on failure. Root
causes fixed:
- Local refs resolving to nodes that themselves contained $ref were never
recursively dereferenced; now a second dereference pass is applied
- Some spec repos store models at v2/models/ not v2/schemas/models/; added
sibling-directory fallback in resolve_ref()
- Spec files authored on macOS had wrong-case filenames (e.g.
Bulkingestion.yaml vs BulkIngestion.yaml); added case-insensitive path
resolution fallback
- Invalid YAML escape (\.) in double-quoted strings; added preprocessing
on parse retry
- Submodule content (transaction-models) was absent because clone used
--depth 1 without --recurse-submodules; fixed
Added collect_unresolved_refs() post-generation validator that fails the
build loudly with a full list of unresolved refs instead of silently
producing broken output.
Added CI static check in cicd.yml that greps committed schemas for
unresolved external $refs on every PR.
**Improvement: $defs deduplication (was: Metafield inlined 167×, Error 136×)**
External file refs and local #/components/schemas/ refs are now lifted into
a top-level $defs section instead of being fully inlined at every use site.
The same ref URL produces one $defs entry; all occurrences become local
{"$ref": "#/$defs/Name"} pointers. This is standard JSON Schema — any
compliant reader resolves local refs.
Result: orders.json 59,837 → 3,888 lines (-94%), bulk-operations.json
17,592 → 2,512 lines (-86%). Smaller catalog files mean a smaller binary
(schemas are include_str!'d at compile time).
Added indexmap dependency for stable $defs insertion order.
Added 8 regression tests covering all new behaviors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the Rust API catalog generator to (1) fully dereference previously-broken $refs in upstream OpenAPI/JSON Schema sources and (2) reduce committed schema size by deduplicating repeated schemas into a top-level $defs with local #/$defs/... references, aligning Rust output with the TypeScript generator.
Changes:
- Fix
$refdereferencing by recursively resolving referenced nodes (including nested$refs), adding repo layout fallbacks, case-insensitive file lookup, YAML parse retry normalization, and cloning submodules. - Add
$defslifting/deduplication so repeated schemas are emitted once and referenced locally. - Add CI enforcement to prevent committed schemas from containing unresolved external
$refs.
Reviewed changes
Copilot reviewed 11 out of 25 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| rust/tools/generate-api-catalog/src/main.rs | Implements recursive $ref resolution, $defs lifting/deduplication, repo/YAML fallbacks, and adds focused unit tests. |
| rust/tools/generate-api-catalog/Cargo.toml | Adds indexmap for deterministic $defs tracking and tempfile for tests. |
| rust/Cargo.lock | Locks new dependencies (indexmap, tempfile). |
| .github/workflows/cicd.yml | Adds a CI check to fail if committed API schemas contain unresolved external $refs. |
| rust/schemas/api/taxes.json | Regenerated catalog output using $defs and local refs. |
| rust/schemas/api/shipping.json | Regenerated catalog output using $defs and local refs (major size/duplication reduction). |
| rust/schemas/api/recommendations.json | Regenerated catalog output using $defs and local refs. |
| rust/schemas/api/metafields.json | Regenerated catalog output using $defs and local refs (major size/duplication reduction). |
| rust/schemas/api/location-addresses.json | Regenerated catalog output using $defs and local refs. |
| rust/schemas/api/manifest.json | Updates generated timestamp and domain entries ordering after regeneration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Escape $defs keys as JSON Pointer segments (RFC 6901) when formatting $ref strings; keys containing ~ or / would otherwise resolve incorrectly - Handle \" inside YAML double-quoted scalars in fix_yaml_invalid_dot_escapes so an escaped quote doesn't prematurely terminate the scalar - Rewrite CI $ref check using `if grep | grep -v; then exit 1; fi` to avoid pipefail triggering on zero matches (grep exits 1 when nothing matches) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Scan $defs values (not just spec) for unresolved external $refs after
dereferencing, so the generator fails early if any slip through
- Wire response schema to lifted $defs ref in process_responses: when a
response object is a {"$ref": "#/$defs/X"} pointer, populate the
schema field so api describe exposes the error shape
- Regenerate schemas: fixes customer-profiles $ref pointer escaping
(json_pointer_escape applied correctly now), and location-addresses
error responses now include schema field pointing to $defs.error
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix derive_defs_key_for_path to use component name from fragment for external refs like ./common.yaml#/components/schemas/Foo, preventing key collisions when two components from the same file get the same stem - Preserve original YAML parse error in retry fallback: first error is now included in the failure message instead of being silently discarded - Add set -o pipefail to CI ref-check step for reliable pipeline failure propagation - Add test cases for fragment-based key derivation - Regenerate schemas to pick up upstream percentage $id update Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous grep -v '"#/' filter used a substring match, so a ref like "./common.yaml#/components/schemas/Foo" (which contains "#/" in its fragment) would be incorrectly excluded and an unresolved external ref would go undetected. Replace the two-grep pipeline with a single pattern that directly matches $ref values whose value starts with a non-# character, which is the correct semantics for "this is an external ref". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
rust/schemas/api/*.jsonhad 587 unresolved external$refs across 16 of 20 domains, makinggodaddy api describereturn broken schema references. The TypeScript generator produced zero external refs; now the Rust generator matches.$defsdeduplication: instead of fully inlining every referenced schema at each use site, repeated schemas are lifted into a top-level$defssection and replaced with local{"$ref": "#/$defs/Name"}pointers. This is standard JSON Schema — any compliant reader resolves local refs.orders.jsonshrinks from ~60K lines to ~4K lines (−94%).Root causes fixed
$refwere never recursively dereferencedv2/models/notv2/schemas/models/; added sibling-directory fallbackBulkingestion.yamlvsBulkIngestion.yaml); added case-insensitive path fallback\.) in double-quoted strings; added preprocessing on parse retrytransaction-models) was absent because clone used--depth 1without--recurse-submodules; fixedSize reduction from
$defsdeduplicationorders.jsonbulk-operations.jsontransactions.jsonmetafields.jsonSchemas are
include_str!'d at compile time, so smaller files mean a smaller binary.Test plan
cargo test— 114 tests passcargo clippy -- -D warnings— cleancargo fmt --check— cleangrep -r '"[$]ref"' rust/schemas/api/*.json | grep -v '"#/' | wc -lreturns 0godaddy api describe orders— returns schemas with$ref: "#/$defs/..."local references (no broken file-path refs)🤖 Generated with Claude Code