Skip to content

Improving the Dockerfile#160

Open
tmikula-dev wants to merge 1 commit into
masterfrom
feature/154-improve-dockerfile
Open

Improving the Dockerfile#160
tmikula-dev wants to merge 1 commit into
masterfrom
feature/154-improve-dockerfile

Conversation

@tmikula-dev
Copy link
Copy Markdown
Collaborator

@tmikula-dev tmikula-dev commented May 25, 2026

Overview

This pull request updates the Python dependency management and CI workflow to better separate development dependencies from production dependencies. The main changes involve introducing a new requirements-dev.txt for development tools and updating both documentation and CI to use it. Additionally, the Docker build process is improved to use this separation and update some underlying libraries.

Release Notes

  • Dependency management split for local and deployed purposes
  • Improved Dockerfile to take the project requirements instead of hardcoded ones

Related

Closes #154

Summary by CodeRabbit

  • Documentation

    • Updated development setup prerequisites to include Docker for integration testing.
  • Chores

    • Separated development and production dependencies.
    • Updated build process for consistency.

Review Change Stack

@tmikula-dev tmikula-dev self-assigned this May 25, 2026
@tmikula-dev tmikula-dev added the enhancement New feature or request label May 25, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

Walkthrough

The PR splits monolithic Python dependency management into separate production and development files. requirements.txt now contains only runtime dependencies, while requirements-dev.txt explicitly lists all development/test/lint tools. The Dockerfile is updated to install from requirements.txt with an upgraded librdkafka build. The CI workflow installs from requirements-dev.txt and adjusts file change detection to include all requirements*.txt files. Developer documentation is updated accordingly.

Changes

Dependency Separation and Alignment

Layer / File(s) Summary
Dependency file structure
requirements.txt, requirements-dev.txt
requirements.txt contains only runtime dependencies (cryptography, jsonschema, PyJWT, requests, boto3, botocore, aiosql, confluent-kafka, psycopg2); requirements-dev.txt explicitly enumerates pinned development dependencies (testing, linting, formatting, type-checking, and container tools) instead of referencing requirements.txt.
Dockerfile production dependency installation
Dockerfile
Dockerfile copies requirements.txt to the Lambda task root and installs dependencies via pip install -r requirements.txt with --no-binary confluent-kafka, replacing inline pinned lists. The librdkafka build is upgraded from v2.4.0 to v2.14.0.
CI workflow dependency installation and file change detection
.github/workflows/check_python.yml
Python file change detection for PRs now includes any requirements*.txt files alongside .py files. Pylint, Black, Mypy, unit test/coverage, and integration test jobs all install from requirements-dev.txt instead of requirements.txt.
Developer setup documentation
DEVELOPER.md
Added Docker as a required prerequisite for local integration testing. Updated Python environment setup to install from requirements-dev.txt, removing previous PostgreSQL client dev package guidance.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • AbsaOSS/EventGate#72: Both PRs touch the CI logic for .github/workflows/check_python.yml—the retrieved PR introduces the Python Check workflow with conditional lint/test jobs, and the main PR updates that workflow's dependency installation and Python change-detection wiring.

Suggested reviewers

  • petr-pokorny-absa
  • lsulak
  • oto-macenauer-absa

Poem

🐰 Split the deps with care, production clean,
Dev tools tucked away where they should be seen,
Docker builds faster, no bloat in the image,
CI runs smooth with requirements damage,
A rabbit approves this dependency message! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title is vague and overly generic; 'Improving the Dockerfile' does not convey the main change of separating runtime and development dependencies. Revise the title to specifically mention the dependency management separation, such as 'Separate runtime and development dependencies into requirements-dev.txt'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The pull request description follows the template structure with Overview, Release Notes, and Related sections, providing clear context about the changes.
Linked Issues check ✅ Passed The pull request addresses all core requirements from issue #154: split requirements into runtime and development files, Dockerfile now uses requirements.txt via pip install, and dependencies are pinned.
Out of Scope Changes check ✅ Passed All changes directly support the objective of separating dependencies and updating the Dockerfile; changes to DEVELOPER.md and CI workflow align with the separation strategy.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/154-improve-dockerfile

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tmikula-dev
Copy link
Copy Markdown
Collaborator Author

Interesting fact, that the repo of librdkafka was transferred from Magnus Edenhill to Confluent Inc. in 2023. So the Dockerfile, that updated its version also had to change the repository to fetch the version from.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/check_python.yml:
- Around line 40-42: The CHANGED_FILES assignment uses gh api to list PR files
but only gets the first page; update the gh api call inside the CHANGED_FILES
assignment to enable pagination by adding the --paginate and --slurp flags so
the existing --jq '.[].filename | select(endswith(".py") or
(startswith("requirements") and endswith(".txt")))' runs over the combined JSON
result; ensure the flags are added to the gh api invocation that constructs
CHANGED_FILES so the jq expression still extracts filenames as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b2e4c4a1-be3b-4dae-a994-9d2600760b9a

📥 Commits

Reviewing files that changed from the base of the PR and between c107a5d and 459e1e0.

📒 Files selected for processing (5)
  • .github/workflows/check_python.yml
  • DEVELOPER.md
  • Dockerfile
  • requirements-dev.txt
  • requirements.txt

Comment on lines 40 to +42
CHANGED_FILES=$(gh api \
"repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \
--jq '.[].filename | select(endswith(".py") or . == "requirements.txt")')
--jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

GitHub CLI gh api pagination behavior for REST endpoint /repos/{owner}/{repo}/pulls/{pull_number}/files, and correct use of --paginate with --jq

💡 Result:

gh api pagination for REST /repos/{owner}/{repo}/pulls/{pull_number}/files 1) Pagination with --paginate Yes. For REST requests, gh api --paginate will “make additional HTTP requests to fetch all pages of results until there are no more pages” [1]. GitHub REST pagination is driven by Link headers; the REST docs explain that you can use the URLs from the Link header to fetch additional pages [2]. 2) Interaction between --paginate and --jq (key point) - Without --slurp: in --paginate mode, gh api may emit multiple JSON documents (one per page). A jq filter that assumes a single JSON value can therefore only apply to one page or behave unexpectedly. This “multiple JSON documents” behavior is called out in gh api’s guidance/discussion about using --paginate with --jq [3]. - With --slurp: gh api explicitly supports combining paginated JSON arrays/objects into a single outer JSON array (“Pass --slurp to wrap all pages of JSON arrays or objects into an outer JSON array”) [1]. That makes jq straightforward. 3) Correct way to use --paginate with --jq for PR files Use --paginate + --slurp so jq sees one combined JSON array, then extract the file fields you want. To print filenames across all pages: gh api \ --paginate \ --slurp \ /repos/OWNER/REPO/pulls/PR_NUMBER/files \ --jq '.[].filename' This combines gh api’s documented behavior for --paginate and --slurp [1] with the documented purpose of --jq for jq-compatible field extraction [3][4]. 4) If you don’t use --slurp You must either: - run jq in a way that can handle multiple JSON documents from stdout (e.g., by using jq “streaming” modes yourself), or - avoid filtering and instead post-process the multi-page output. In practice, for shell scripting the recommended approach is still --slurp when you want jq to treat the result as one JSON value [1]. 5) Practical gotcha If you omit --paginate, you’ll only get the first page of /pulls/{pull_number}/files (so large PRs won’t list all changed files). Use --paginate to fetch subsequent pages [1].

Citations:


Handle paginated PR file lists in .github/workflows/check_python.yml (lines 40-42).
gh api for pulls/{pull_number}/files only returns the first page unless pagination is enabled; additionally, when using --paginate, add --slurp so the existing --jq '.[].filename ...' runs over a single combined JSON value.

💡 Suggested fix
-            CHANGED_FILES=$(gh api \
-              "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \
-              --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')
+            CHANGED_FILES=$(gh api --paginate --slurp \
+              "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files?per_page=100" \
+              --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CHANGED_FILES=$(gh api \
"repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \
--jq '.[].filename | select(endswith(".py") or . == "requirements.txt")')
--jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')
CHANGED_FILES=$(gh api --paginate --slurp \
"repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files?per_page=100" \
--jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/check_python.yml around lines 40 - 42, The CHANGED_FILES
assignment uses gh api to list PR files but only gets the first page; update the
gh api call inside the CHANGED_FILES assignment to enable pagination by adding
the --paginate and --slurp flags so the existing --jq '.[].filename |
select(endswith(".py") or (startswith("requirements") and endswith(".txt")))'
runs over the combined JSON result; ensure the flags are added to the gh api
invocation that constructs CHANGED_FILES so the jq expression still extracts
filenames as before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve Dockerfile: dependencies are installed within with hard-coded versions

1 participant