-
Notifications
You must be signed in to change notification settings - Fork 124
RHAIENG-1759: fix(tests): increase timeout for s390x in Kubernetes port-forward tests #2704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Increase connection timeout from 30s to 60s on s390x architecture to account for slower SSL/TLS negotiation during port forwarding. Resolves: RHAIENG-1759
WalkthroughAdds architecture-aware timeout logic to Kubernetes test utility: detects s390x images and sets connection timeout to 60s, otherwise 30s, and passes this timeout into the HTTP-based pod readiness Wait.until check. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @ayush17. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/containers/kubernetes_utils.py (1)
273-273: Consider renaming to avoid variable shadowing.The local variable
timeoutshadows the method parametertimeoutfrom line 148. While this works correctly since they serve different purposes (pod readiness vs. HTTP connection), using a more specific name likeconnection_timeoutwould improve code clarity.Apply this diff to improve clarity:
# Increase timeout for s390x architecture due to slower SSL/TLS negotiation - timeout = 60 if platform.machine() == "s390x" else 30 + connection_timeout = 60 if platform.machine() == "s390x" else 30 Wait.until( "Connecting to pod succeeds", 1, - timeout, + connection_timeout, lambda: requests.get(f"http://localhost:{self.port}").status_code == 200, )
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/containers/kubernetes_utils.py(2 hunks)
🔇 Additional comments (2)
tests/containers/kubernetes_utils.py (2)
6-6: LGTM!The
platformimport is correctly placed and necessary for architecture detection.
274-279: LGTM! Architecture-specific timeout correctly applied.The timeout is properly applied to the HTTP connection check after port-forwarding is established. This targets the specific issue mentioned in the PR description while keeping the pod deployment timeout (line 253) unchanged.
tests/containers/kubernetes_utils.py
Outdated
| self.port = p.get_actual_port() | ||
| LOGGER.debug(f"Listening on port {self.port}") | ||
| # Increase timeout for s390x architecture due to slower SSL/TLS negotiation | ||
| timeout = 60 if platform.machine() == "s390x" else 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify platform detection on s390x and consider using constants.
The architecture detection and timeout values look reasonable, but please verify that platform.machine() returns exactly "s390x" on the target architecture. Additionally, consider defining these timeout values as constants in TestFrameConstants for better maintainability.
Run the following script to verify the platform detection approach and check for any existing s390x-related configuration:
🏁 Script executed:
#!/bin/bash
# Description: Search for existing s390x platform checks and timeout configurations
# Search for other s390x references in the codebase
echo "=== Existing s390x references ==="
rg -n "s390x" --type py
# Search for platform.machine() usage patterns
echo -e "\n=== platform.machine() usage ==="
rg -n "platform\.machine\(\)" --type py
# Search for other architecture-specific logic
echo -e "\n=== Other architecture checks ==="
rg -n "platform\.(system|processor|architecture)" --type pyLength of output: 3747
I need to gather more information to verify the review concerns. Let me search for TestFrameConstants and check platform.machine() behavior on s390x.
Let me search for TestFrameConstants and timeout constants in the codebase.
🏁 Script executed:
#!/bin/bash
# Search for TestFrameConstants, test framework constants, and timeout constants
echo "=== Search for TestFrameConstants ==="
rg -rn "TestFrameConstants" --type py
echo -e "\n=== Search for timeout constants/definitions ==="
rg -rn "TIMEOUT|_TIMEOUT" --type py | head -30
echo -e "\n=== Check test file structure for constants ==="
rg -rn "^[A-Z_]+\s*=" tests/containers/kubernetes_utils.py | head -20Length of output: 2146
Let me inspect the kubernetes_utils.py file to see the TestFrameConstants class and the context around line 273.
🏁 Script executed:
#!/bin/bash
# Get context around line 273 and find TestFrameConstants class definition
echo "=== Context around line 273 ==="
sed -n '265,280p' tests/containers/kubernetes_utils.py
echo -e "\n=== TestFrameConstants class definition ==="
rg -A 30 "^class TestFrameConstants" tests/containers/kubernetes_utils.pyLength of output: 1864
Based on my verification, the review comment identifies valid concerns. Let me generate the rewritten comment:
Add .lower() to platform comparison and extract timeouts as constants.
The platform detection comparison should use .lower() for consistency with other architecture checks in the codebase (e.g., gha_pr_changed_files.py:131). Additionally, extract the timeout values (60 and 30 seconds) as constants in TestFrameConstants alongside existing timeout definitions (TIMEOUT_2MIN, TIMEOUT_5MIN, etc.) for better maintainability.
Changes needed in tests/containers/kubernetes_utils.py:273:
- Use
platform.machine().lower() == "s390x"instead of direct comparison - Define
TIMEOUT_S390X_SSL = 60andTIMEOUT_DEFAULT_SSL = 30inTestFrameConstantsclass, then reference them
🤖 Prompt for AI Agents
In tests/containers/kubernetes_utils.py around line 273, replace the direct
platform.machine() comparison with a case-insensitive check using
platform.machine().lower() == "s390x", and stop using hardcoded timeout
literals; instead add two constants to the TestFrameConstants class
(TIMEOUT_S390X_SSL = 60 and TIMEOUT_DEFAULT_SSL = 30) and reference those
constants here so the timeout selection reads something like TIMEOUT_S390X_SSL
if platform.machine().lower() == "s390x" else TIMEOUT_DEFAULT_SSL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai changes were made, recheck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiridanek, thank you for the update! I've reviewed the changes and I can see you've taken a better approach than originally suggested.
What improved: ✅
- You're now checking
"s390x" in self.image.lower()instead ofplatform.machine() - This is actually more appropriate since the comment explains tests run on x86_64 runners with emulated/cross-compiled s390x images
- The
.lower()method is correctly used for case-insensitive comparison
What still needs attention:
- The timeout values (60 and 30 seconds) are still hardcoded on line 274
- These should be extracted as constants in the
TestFrameConstantsclass for better maintainability
Suggested addition to TestFrameConstants class:
class TestFrameConstants:
GLOBAL_POLL_INTERVAL_MEDIUM = 10
TIMEOUT_2MIN = 2 * 60
TIMEOUT_5MIN = 5 * 60
TIMEOUT_20MIN = 20 * 60
# Connection timeout for SSL/TLS negotiation
TIMEOUT_S390X_CONNECTION = 60 # s390x requires longer timeout
TIMEOUT_DEFAULT_CONNECTION = 30
# this includes potentially pulling the image, and cuda images are huge
READINESS_TIMEOUT = TIMEOUT_5MINThen update line 274 to:
connection_timeout = TestFrameConstants.TIMEOUT_S390X_CONNECTION if is_s390x else TestFrameConstants.TIMEOUT_DEFAULT_CONNECTIONThe image-name-based detection is a smart solution for your testing environment! 👍
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jiridanek The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/ok-to-test |
|
lgtm too! Thanks @ayush17 |
|
@ayush17 here's one way of complying with andriana's request: go to your fork of the repo, https://github.com/ayush17/notebooks/actions, and enable running github actions on your fork, then trigger the build-on-push workflow (I'm happy checking how it went after merging to main, but if andriana wants to check first then merge, I respect that) |
|
New changes are detected. LGTM label has been removed. |
I attempted to test on my fork as requested, but the build/test workflows aren't running there - only validation workflows executed, which failed due to missing upstream image metadata (expected for forks). My fork at https://github.com/ayush17/notebooks/actions shows only 3 lightweight workflows:
The actual ""Build Notebooks"" workflows that would test on s390x didn't trigger on my fork. Could you approve the workflow run on this PR so the tests can execute on the upstream infrastructure? The change is minimal and isolated: This will allow proper testing on actual s390x hardware. Thank you! |
|
/ok-to-test |
The previous fix checked platform.machine() which returns the runner's architecture (x86_64), not the target image architecture. Since tests run on x86_64 runners with cross-compiled/emulated s390x images, we need to check the image name string for 's390x' instead. This ensures the 60s timeout is applied when testing s390x images, regardless of the runner architecture.
|
@ayush17: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@jiridanek @atheo89 |
|
The Openshift tests are obsolete, so don't pay attention to those as they are un-maintain long time now. |
Increase connection timeout from 30s to 60s on s390x architecture
to account for slower SSL/TLS negotiation during port forwarding.
Resolves: RHAIENG-1759
Description
How Has This Been Tested?
Self checklist (all need to be checked):
make test(gmakeon macOS) before asking for reviewDockerfile.konfluxfiles should be done inodh/notebooksand automatically synced torhds/notebooks. For Konflux-specific changes, modifyDockerfile.konfluxfiles directly inrhds/notebooksas these require special attention in the downstream repository and flow to the upcoming RHOAI release.Merge criteria:
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.