Improving the Dockerfile by tmikula-dev · Pull Request #160 · AbsaOSS/EventGate

tmikula-dev · 2026-05-25T14:37:25Z

Overview

This pull request updates the Python dependency management and CI workflow to better separate development dependencies from production dependencies. The main changes involve introducing a new requirements-dev.txt for development tools and updating both documentation and CI to use it. Additionally, the Docker build process is improved to use this separation and update some underlying libraries.

Release Notes

Dependency management split for local and deployed purposes
Improved Dockerfile to take the project requirements instead of hardcoded ones

Summary by CodeRabbit

Documentation
- Updated development setup prerequisites to include Docker for integration testing.
Chores
- Separated development and production dependencies.
- Updated build process for consistency.

coderabbitai · 2026-05-25T14:37:35Z

Walkthrough

The PR splits monolithic Python dependency management into separate production and development files. requirements.txt now contains only runtime dependencies, while requirements-dev.txt explicitly lists all development/test/lint tools. The Dockerfile is updated to install from requirements.txt with an upgraded librdkafka build. The CI workflow installs from requirements-dev.txt and adjusts file change detection to include all requirements*.txt files. Developer documentation is updated accordingly.

Changes

Dependency Separation and Alignment

Layer / File(s)	Summary
Dependency file structure `requirements.txt`, `requirements-dev.txt`	`requirements.txt` contains only runtime dependencies (cryptography, jsonschema, PyJWT, requests, boto3, botocore, aiosql, confluent-kafka, psycopg2); `requirements-dev.txt` explicitly enumerates pinned development dependencies (testing, linting, formatting, type-checking, and container tools) instead of referencing requirements.txt.
Dockerfile production dependency installation `Dockerfile`	Dockerfile copies `requirements.txt` to the Lambda task root and installs dependencies via `pip install -r requirements.txt` with `--no-binary confluent-kafka`, replacing inline pinned lists. The `librdkafka` build is upgraded from v2.4.0 to v2.14.0.
CI workflow dependency installation and file change detection `.github/workflows/check_python.yml`	Python file change detection for PRs now includes any `requirements*.txt` files alongside `.py` files. Pylint, Black, Mypy, unit test/coverage, and integration test jobs all install from `requirements-dev.txt` instead of `requirements.txt`.
Developer setup documentation `DEVELOPER.md`	Added Docker as a required prerequisite for local integration testing. Updated Python environment setup to install from `requirements-dev.txt`, removing previous PostgreSQL client dev package guidance.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

AbsaOSS/EventGate#72: Both PRs touch the CI logic for .github/workflows/check_python.yml—the retrieved PR introduces the Python Check workflow with conditional lint/test jobs, and the main PR updates that workflow's dependency installation and Python change-detection wiring.

Suggested reviewers

petr-pokorny-absa
lsulak
oto-macenauer-absa

Poem

🐰 Split the deps with care, production clean,
Dev tools tucked away where they should be seen,
Docker builds faster, no bloat in the image,
CI runs smooth with requirements damage,
A rabbit approves this dependency message! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title is vague and overly generic; 'Improving the Dockerfile' does not convey the main change of separating runtime and development dependencies.	Revise the title to specifically mention the dependency management separation, such as 'Separate runtime and development dependencies into requirements-dev.txt'.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description check	✅ Passed	The pull request description follows the template structure with Overview, Release Notes, and Related sections, providing clear context about the changes.
Linked Issues check	✅ Passed	The pull request addresses all core requirements from issue `#154`: split requirements into runtime and development files, Dockerfile now uses requirements.txt via pip install, and dependencies are pinned.
Out of Scope Changes check	✅ Passed	All changes directly support the objective of separating dependencies and updating the Dockerfile; changes to DEVELOPER.md and CI workflow align with the separation strategy.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/154-improve-dockerfile

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tmikula-dev · 2026-05-25T14:41:18Z

Interesting fact, that the repo of librdkafka was transferred from Magnus Edenhill to Confluent Inc. in 2023. So the Dockerfile, that updated its version also had to change the repository to fetch the version from.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/check_python.yml:
- Around line 40-42: The CHANGED_FILES assignment uses gh api to list PR files
but only gets the first page; update the gh api call inside the CHANGED_FILES
assignment to enable pagination by adding the --paginate and --slurp flags so
the existing --jq '.[].filename | select(endswith(".py") or
(startswith("requirements") and endswith(".txt")))' runs over the combined JSON
result; ensure the flags are added to the gh api invocation that constructs
CHANGED_FILES so the jq expression still extracts filenames as before.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b2e4c4a1-be3b-4dae-a994-9d2600760b9a

📥 Commits

Reviewing files that changed from the base of the PR and between c107a5d and 459e1e0.

📒 Files selected for processing (5)

.github/workflows/check_python.yml
DEVELOPER.md
Dockerfile
requirements-dev.txt
requirements.txt

coderabbitai · 2026-05-25T14:44:12Z

            CHANGED_FILES=$(gh api \
              "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \
-              --jq '.[].filename | select(endswith(".py") or . == "requirements.txt")')
+              --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

GitHub CLI gh api pagination behavior for REST endpoint /repos/{owner}/{repo}/pulls/{pull_number}/files, and correct use of --paginate with --jq

💡 Result:

gh api pagination for REST /repos/{owner}/{repo}/pulls/{pull_number}/files 1) Pagination with --paginate Yes. For REST requests, gh api --paginate will “make additional HTTP requests to fetch all pages of results until there are no more pages” [1]. GitHub REST pagination is driven by Link headers; the REST docs explain that you can use the URLs from the Link header to fetch additional pages [2]. 2) Interaction between --paginate and --jq (key point) - Without --slurp: in --paginate mode, gh api may emit multiple JSON documents (one per page). A jq filter that assumes a single JSON value can therefore only apply to one page or behave unexpectedly. This “multiple JSON documents” behavior is called out in gh api’s guidance/discussion about using --paginate with --jq [3]. - With --slurp: gh api explicitly supports combining paginated JSON arrays/objects into a single outer JSON array (“Pass --slurp to wrap all pages of JSON arrays or objects into an outer JSON array”) [1]. That makes jq straightforward. 3) Correct way to use --paginate with --jq for PR files Use --paginate + --slurp so jq sees one combined JSON array, then extract the file fields you want. To print filenames across all pages: gh api \ --paginate \ --slurp \ /repos/OWNER/REPO/pulls/PR_NUMBER/files \ --jq '.[].filename' This combines gh api’s documented behavior for --paginate and --slurp [1] with the documented purpose of --jq for jq-compatible field extraction [3][4]. 4) If you don’t use --slurp You must either: - run jq in a way that can handle multiple JSON documents from stdout (e.g., by using jq “streaming” modes yourself), or - avoid filtering and instead post-process the multi-page output. In practice, for shell scripting the recommended approach is still --slurp when you want jq to treat the result as one JSON value [1]. 5) Practical gotcha If you omit --paginate, you’ll only get the first page of /pulls/{pull_number}/files (so large PRs won’t list all changed files). Use --paginate to fetch subsequent pages [1].

Citations:

1: https://cli.github.com/manual/gh_api

2: https://github.com/github/docs/blob/main/content/rest/using-the-rest-api/using-pagination-in-the-rest-api.md

3: Listing all repositories using `api` without manual pagination cli/cli#3257

4: https://github.blog/engineering/engineering-principles/scripting-with-github-cli/

Handle paginated PR file lists in .github/workflows/check_python.yml (lines 40-42).
gh api for pulls/{pull_number}/files only returns the first page unless pagination is enabled; additionally, when using --paginate, add --slurp so the existing --jq '.[].filename ...' runs over a single combined JSON value.

💡 Suggested fix

- CHANGED_FILES=$(gh api \ - "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \ - --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))') + CHANGED_FILES=$(gh api --paginate --slurp \ + "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files?per_page=100" \ + --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

CHANGED_FILES=$(gh api \

"repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" \

--jq '.[].filename | select(endswith(".py") or . == "requirements.txt")')

--jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')

CHANGED_FILES=$(gh api --paginate --slurp \

"repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files?per_page=100" \

--jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))')

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/check_python.yml around lines 40 - 42, The CHANGED_FILES assignment uses gh api to list PR files but only gets the first page; update the gh api call inside the CHANGED_FILES assignment to enable pagination by adding the --paginate and --slurp flags so the existing --jq '.[].filename | select(endswith(".py") or (startswith("requirements") and endswith(".txt")))' runs over the combined JSON result; ensure the flags are added to the gh api invocation that constructs CHANGED_FILES so the jq expression still extracts filenames as before.

Improving the Dockerfile for project

459e1e0

tmikula-dev self-assigned this May 25, 2026

tmikula-dev requested a review from oto-macenauer-absa as a code owner May 25, 2026 14:37

tmikula-dev added the enhancement New feature or request label May 25, 2026

tmikula-dev requested review from lsulak and petr-pokorny-absa as code owners May 25, 2026 14:37

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving the Dockerfile#160

Improving the Dockerfile#160
tmikula-dev wants to merge 1 commit into
masterfrom
feature/154-improve-dockerfile

tmikula-dev commented May 25, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 25, 2026 •

edited

Loading

❌ Failed checks (1 inconclusive)

Uh oh!

tmikula-dev commented May 25, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tmikula-dev commented May 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Release Notes

Related

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

tmikula-dev commented May 25, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tmikula-dev commented May 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading