Improving the Dockerfile #160
Walkthrough
The PR splits monolithic Python dependency management into separate production and development files.
Changes
Dependency Separation and Alignment
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Interesting fact: the librdkafka repository was transferred from Magnus Edenhill to Confluent Inc. in 2023, so the Dockerfile change that updates its version also had to change the repository the release is fetched from.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/check_python.yml:
- Around line 40-42: The CHANGED_FILES assignment uses gh api to list PR files
but only gets the first page; update the gh api call inside the CHANGED_FILES
assignment to enable pagination by adding the --paginate and --slurp flags so
the existing --jq '.[].filename | select(endswith(".py") or
(startswith("requirements") and endswith(".txt")))' runs over the combined JSON
result; ensure the flags are added to the gh api invocation that constructs
CHANGED_FILES so the jq expression still extracts filenames as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b2e4c4a1-be3b-4dae-a994-9d2600760b9a
📒 Files selected for processing (5)
- .github/workflows/check_python.yml
- DEVELOPER.md
- Dockerfile
- requirements-dev.txt
- requirements.txt
  CHANGED_FILES=$(gh api \
  "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \
- --jq '.[].filename | select(endswith(".py") or . == "requirements.txt")')
+ --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')
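The effect of the filter change in this hunk can be illustrated without gh or jq. The sketch below mirrors the two select() conditions with shell case patterns. The function names matches_old and matches_new are hypothetical and are not part of the workflow; they only show which paths each version of the filter would keep.

```shell
# Illustrative only: shell equivalents of the old and new jq select() conditions.

# Old filter: endswith(".py") or . == "requirements.txt"
matches_old() {
  case "$1" in
    *.py|requirements.txt) return 0 ;;
    *) return 1 ;;
  esac
}

# New filter: endswith(".py") or (startswith("requirements") and endswith(".txt"))
matches_new() {
  case "$1" in
    *.py|requirements*.txt) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the paths the new filter keeps but the old one dropped.
for path in src/app.py requirements.txt requirements-dev.txt README.md; do
  if matches_new "$path" && ! matches_old "$path"; then
    echo "newly matched: $path"
  fi
done
```

Of the four sample paths, only requirements-dev.txt is newly matched, which is the file this PR introduces.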
🧩 Analysis chain
🌐 Web query:
GitHub CLI gh api pagination behavior for REST endpoint /repos/{owner}/{repo}/pulls/{pull_number}/files, and correct use of --paginate with --jq
💡 Result:
gh api pagination for REST /repos/{owner}/{repo}/pulls/{pull_number}/files

1) Pagination with --paginate
Yes. For REST requests, gh api --paginate will "make additional HTTP requests to fetch all pages of results until there are no more pages" [1]. GitHub REST pagination is driven by Link headers; the REST docs explain that you can use the URLs from the Link header to fetch additional pages [2].

2) Interaction between --paginate and --jq (key point)
- Without --slurp: in --paginate mode, gh api may emit multiple JSON documents (one per page). A jq filter that assumes a single JSON value can therefore only apply to one page or behave unexpectedly. This "multiple JSON documents" behavior is called out in gh api's guidance/discussion about using --paginate with --jq [3].
- With --slurp: gh api explicitly supports combining paginated JSON arrays/objects into a single outer JSON array ("Pass --slurp to wrap all pages of JSON arrays or objects into an outer JSON array") [1]. That makes jq straightforward.

3) Correct way to use --paginate with --jq for PR files
Use --paginate + --slurp so jq sees one combined JSON array, then extract the file fields you want. To print filenames across all pages:
    gh api \
      --paginate \
      --slurp \
      /repos/OWNER/REPO/pulls/PR_NUMBER/files \
      --jq '.[].filename'
This combines gh api's documented behavior for --paginate and --slurp [1] with the documented purpose of --jq for jq-compatible field extraction [3][4].

4) If you don't use --slurp
You must either:
- run jq in a way that can handle multiple JSON documents from stdout (e.g., by using jq "streaming" modes yourself), or
- avoid filtering and instead post-process the multi-page output.
In practice, for shell scripting the recommended approach is still --slurp when you want jq to treat the result as one JSON value [1].

5) Practical gotcha
If you omit --paginate, you'll only get the first page of /pulls/{pull_number}/files (so large PRs won't list all changed files). Use --paginate to fetch subsequent pages [1].
Citations:
- 1: https://cli.github.com/manual/gh_api
- 2: https://github.com/github/docs/blob/main/content/rest/using-the-rest-api/using-pagination-in-the-rest-api.md
- 3: Listing all repositories using `api` without manual pagination cli/cli#3257
- 4: https://github.blog/engineering/engineering-principles/scripting-with-github-cli/
Handle paginated PR file lists in .github/workflows/check_python.yml (lines 40-42).
gh api for pulls/{pull_number}/files only returns the first page unless pagination is enabled; additionally, when using --paginate, add --slurp so the existing --jq '.[].filename ...' runs over a single combined JSON value.
💡 Suggested fix
- CHANGED_FILES=$(gh api \
- "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \
- --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')
+ CHANGED_FILES=$(gh api --paginate --slurp \
+ "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files?per_page=100" \
+ --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')
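One caveat worth checking before committing the suggestion: as far as I can tell, gh api rejects --slurp when it is combined with --jq, and with --paginate alone the --jq filter is applied to each page in turn. With --slurp the pages would also be nested one level deeper, so '.[].filename' would no longer address the file objects. Treat both points as assumptions to verify against the installed gh version. Under those assumptions, a minimal sketch of the paginated call looks like this; the function name list_changed_files and its two arguments are hypothetical and are not part of the workflow.

```shell
# Hypothetical helper, not the workflow's actual code.
# $1: repository as "owner/name"; $2: pull request number.
list_changed_files() {
  repo="$1"
  pr="$2"
  # --paginate follows the Link headers until the last page.
  # Assumption: the --jq filter runs once per page, so no --slurp is needed.
  gh api --paginate \
    "repos/${repo}/pulls/${pr}/files?per_page=100" \
    --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))'
}

# Usage inside the workflow step would look roughly like:
#   CHANGED_FILES=$(list_changed_files "${{ github.repository }}" "${{ github.event.pull_request.number }}")
```

Because the filter emits one filename per line for every page, the combined output has the same shape as the single-page call, and the rest of the step can consume CHANGED_FILES unchanged.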
Overview
This pull request updates the Python dependency management and CI workflow to better separate development dependencies from production dependencies. The main changes involve introducing a new requirements-dev.txt for development tools and updating both documentation and CI to use it. Additionally, the Docker build process is improved to use this separation and update some underlying libraries.
Release Notes
Related
Closes #154