

@ayush17 ayush17 commented Nov 24, 2025

Increase connection timeout from 30s to 60s on s390x architecture
to account for slower SSL/TLS negotiation during port forwarding.

Resolves: RHAIENG-1759

Description

How Has This Been Tested?

Self checklist (all need to be checked):

  • Ensure that you have run make test (gmake on macOS) before asking for review
  • Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Summary by CodeRabbit

  • Tests
    • Improved pod readiness checks to be more reliable across platforms by using architecture-aware timeouts (longer timeout for s390x) and adjusted polling behavior. This reduces false failures and makes integration tests more robust on slower or architecture-specific environments.


@openshift-ci openshift-ci bot requested review from dibryant and jiridanek November 24, 2025 06:13
@github-actions github-actions bot added the review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel label Nov 24, 2025
Contributor

coderabbitai bot commented Nov 24, 2025

Walkthrough

Adds architecture-aware timeout logic to the Kubernetes test utility: it detects s390x images, sets the connection timeout to 60 s for them (30 s otherwise), and passes this timeout to the HTTP-based pod readiness check in Wait.until.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Architecture-specific timeout handling (tests/containers/kubernetes_utils.py) | Adds platform detection and conditional connection timeout: 60 seconds when image name contains s390x, 30 seconds otherwise; supplies the timeout to the Wait.until HTTP readiness poll on localhost. |
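The readiness poll described above can be sketched as follows. This is a minimal sketch, not the repository's code: `wait_until` is a hypothetical stand-in for the project's `Wait.until`, with the argument order (description, poll interval, timeout, predicate) inferred from the diff in this thread, and treating exceptions from the predicate as "not ready yet" is an assumption about how the real helper behaves.

```python
import time
from typing import Callable


def wait_until(description: str, interval: float, timeout: float,
               predicate: Callable[[], bool]) -> None:
    """Hypothetical stand-in for Wait.until.

    Polls `predicate` every `interval` seconds until it returns True,
    raising TimeoutError once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if predicate():
                return
        except Exception:
            # Assumption: errors such as a refused connection while the
            # port-forward is still starting just mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{description}: not satisfied within {timeout}s")
        time.sleep(interval)
```

In the test utility, the predicate would be the HTTP GET from the diff, `lambda: requests.get(f"http://localhost:{self.port}").status_code == 200`, polled every second with a timeout of 60 for s390x images and 30 otherwise.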

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify detection of s390x in the image name is reliable and matches expected naming conventions.
  • Confirm the timeout value is correctly passed into the Wait.until call and used by the HTTP readiness check.
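To make the first review point concrete, the sketch below contrasts the substring rule used in the PR with a stricter, hypothetical rule that inspects only the image tag. Both function names and all image names are invented for illustration; they do not reflect the project's real naming convention. Neither rule helps if the image reference carries no architecture at all (for example, a tag pointing at a multi-arch manifest list); in that case both return False and the 30-second default applies.

```python
def is_s390x_image(image: str) -> bool:
    """Rule used in the PR: case-insensitive substring match anywhere in the name."""
    return "s390x" in image.lower()


def is_s390x_tag(image: str) -> bool:
    """Hypothetical stricter rule: look only at the tag of the image reference."""
    ref = image.split("@", 1)[0]       # drop an optional @sha256:... digest
    last = ref.rsplit("/", 1)[-1]      # final path component, e.g. "workbench:tag"
    _, sep, tag = last.partition(":")  # a registry port lives in an earlier component
    return bool(sep) and "s390x" in tag.lower()


EXAMPLES = [
    "quay.io/example/workbench:py312-S390X",
    "quay.io/example/workbench:py312-amd64",
    "s390x-mirror.example.com/example/workbench:py312-amd64",
    "localhost:5000/example/workbench",
]

if __name__ == "__main__":
    for name in EXAMPLES:
        print(f"{name}: substring={is_s390x_image(name)} tag={is_s390x_tag(name)}")
```

The third example is the kind of case the reviewer's question is about: the substring rule matches a registry host name, while the tag-only rule does not. Whether that matters depends on how the project's images are actually named.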

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is incomplete; the 'Description' and 'How Has This Been Tested?' sections lack detail, and all self-checklist items remain unchecked despite being marked as required. | Fill in the 'Description' section with detailed explanation of changes, complete the 'How Has This Been Tested?' section with testing details, and check off applicable self-checklist items to confirm compliance. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: increasing timeout for s390x in Kubernetes port-forward tests, which directly matches the changeset. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3003ca and 3696f7d.

📒 Files selected for processing (1)
  • tests/containers/kubernetes_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/containers/kubernetes_utils.py


@openshift-ci openshift-ci bot added the size/xs label Nov 24, 2025
Contributor

openshift-ci bot commented Nov 24, 2025

Hi @ayush17. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/containers/kubernetes_utils.py (1)

273-273: Consider renaming to avoid variable shadowing.

The local variable timeout shadows the method parameter timeout from line 148. While this works correctly since they serve different purposes (pod readiness vs. HTTP connection), using a more specific name like connection_timeout would improve code clarity.

Apply this diff to improve clarity:

             # Increase timeout for s390x architecture due to slower SSL/TLS negotiation
-            timeout = 60 if platform.machine() == "s390x" else 30
+            connection_timeout = 60 if platform.machine() == "s390x" else 30
             Wait.until(
                 "Connecting to pod succeeds",
                 1,
-                timeout,
+                connection_timeout,
                 lambda: requests.get(f"http://localhost:{self.port}").status_code == 200,
             )
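The nitpick can be illustrated with two hypothetical helpers (not code from the repository). Reassigning the parameter name `timeout` inside the function replaces the caller's value for every line that follows, so any later use that meant the pod-readiness budget silently receives 30 or 60 instead; a distinct name such as `connection_timeout` avoids that.

```python
def readiness_budgets_shadowed(timeout: int, image: str) -> dict[str, int]:
    """Reuses the parameter name, as in the original diff (hypothetical helper)."""
    budgets = {"pod_ready": timeout}
    # Rebinding `timeout` replaces the caller's value for the rest of the function.
    timeout = 60 if "s390x" in image.lower() else 30
    budgets["http_connect"] = timeout
    budgets["after_connect"] = timeout  # meant the caller's value, gets 30 or 60
    return budgets


def readiness_budgets_renamed(timeout: int, image: str) -> dict[str, int]:
    """Same logic with a distinct name, as the nitpick suggests (hypothetical helper)."""
    budgets = {"pod_ready": timeout}
    connection_timeout = 60 if "s390x" in image.lower() else 30
    budgets["http_connect"] = connection_timeout
    budgets["after_connect"] = timeout  # still the caller's value
    return budgets
```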
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb65049 and e3003ca.

📒 Files selected for processing (1)
  • tests/containers/kubernetes_utils.py (2 hunks)
🔇 Additional comments (2)
tests/containers/kubernetes_utils.py (2)

6-6: LGTM!

The platform import is correctly placed and necessary for architecture detection.


274-279: LGTM! Architecture-specific timeout correctly applied.

The timeout is properly applied to the HTTP connection check after port-forwarding is established. This targets the specific issue mentioned in the PR description while keeping the pod deployment timeout (line 253) unchanged.

self.port = p.get_actual_port()
LOGGER.debug(f"Listening on port {self.port}")
# Increase timeout for s390x architecture due to slower SSL/TLS negotiation
timeout = 60 if platform.machine() == "s390x" else 30
Contributor

@coderabbitai coderabbitai bot Nov 24, 2025


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify platform detection on s390x and consider using constants.

The architecture detection and timeout values look reasonable, but please verify that platform.machine() returns exactly "s390x" on the target architecture. Additionally, consider defining these timeout values as constants in TestFrameConstants for better maintainability.

Run the following script to verify the platform detection approach and check for any existing s390x-related configuration:


🏁 Script executed:

#!/bin/bash
# Description: Search for existing s390x platform checks and timeout configurations

# Search for other s390x references in the codebase
echo "=== Existing s390x references ==="
rg -n "s390x" --type py

# Search for platform.machine() usage patterns
echo -e "\n=== platform.machine() usage ==="
rg -n "platform\.machine\(\)" --type py

# Search for other architecture-specific logic
echo -e "\n=== Other architecture checks ==="
rg -n "platform\.(system|processor|architecture)" --type py

Length of output: 3747


I need to gather more information to verify the review concerns. Let me search for TestFrameConstants and existing timeout constants in the codebase.


🏁 Script executed:

#!/bin/bash
# Search for TestFrameConstants, test framework constants, and timeout constants

echo "=== Search for TestFrameConstants ==="
rg -n "TestFrameConstants" --type py

echo -e "\n=== Search for timeout constants/definitions ==="
rg -n "TIMEOUT|_TIMEOUT" --type py | head -30

echo -e "\n=== Check test file structure for constants ==="
rg -n "^[A-Z_]+\s*=" tests/containers/kubernetes_utils.py | head -20

Length of output: 2146


Let me inspect the kubernetes_utils.py file to see the TestFrameConstants class and the context around line 273.


🏁 Script executed:

#!/bin/bash
# Get context around line 273 and find TestFrameConstants class definition

echo "=== Context around line 273 ==="
sed -n '265,280p' tests/containers/kubernetes_utils.py

echo -e "\n=== TestFrameConstants class definition ==="
rg -A 30 "^class TestFrameConstants" tests/containers/kubernetes_utils.py

Length of output: 1864


Based on my verification, the review comment identifies valid concerns. Let me generate the rewritten comment:


Add .lower() to platform comparison and extract timeouts as constants.

The platform detection comparison should use .lower() for consistency with other architecture checks in the codebase (e.g., gha_pr_changed_files.py:131). Additionally, extract the timeout values (60 and 30 seconds) as constants in TestFrameConstants alongside existing timeout definitions (TIMEOUT_2MIN, TIMEOUT_5MIN, etc.) for better maintainability.

Changes needed in tests/containers/kubernetes_utils.py:273:

  • Use platform.machine().lower() == "s390x" instead of direct comparison
  • Define TIMEOUT_S390X_SSL = 60 and TIMEOUT_DEFAULT_SSL = 30 in TestFrameConstants class, then reference them
🤖 Prompt for AI Agents
In tests/containers/kubernetes_utils.py around line 273, replace the direct
platform.machine() comparison with a case-insensitive check using
platform.machine().lower() == "s390x", and stop using hardcoded timeout
literals; instead add two constants to the TestFrameConstants class
(TIMEOUT_S390X_SSL = 60 and TIMEOUT_DEFAULT_SSL = 30) and reference those
constants here so the timeout selection reads something like TIMEOUT_S390X_SSL
if platform.machine().lower() == "s390x" else TIMEOUT_DEFAULT_SSL.

Member


@coderabbitai changes were made, recheck

Contributor


@jiridanek, thank you for the update! I've reviewed the changes and I can see you've taken a better approach than originally suggested.

What improved:

  • You're now checking "s390x" in self.image.lower() instead of platform.machine()
  • This is actually more appropriate since the comment explains tests run on x86_64 runners with emulated/cross-compiled s390x images
  • The .lower() method is correctly used for case-insensitive comparison

What still needs attention:

  • The timeout values (60 and 30 seconds) are still hardcoded on line 274
  • These should be extracted as constants in the TestFrameConstants class for better maintainability

Suggested addition to TestFrameConstants class:

class TestFrameConstants:
    GLOBAL_POLL_INTERVAL_MEDIUM = 10
    TIMEOUT_2MIN = 2 * 60
    TIMEOUT_5MIN = 5 * 60
    TIMEOUT_20MIN = 20 * 60
    
    # Connection timeout for SSL/TLS negotiation
    TIMEOUT_S390X_CONNECTION = 60  # s390x requires longer timeout
    TIMEOUT_DEFAULT_CONNECTION = 30
    
    # this includes potentially pulling the image, and cuda images are huge
    READINESS_TIMEOUT = TIMEOUT_5MIN

Then update line 274 to:

connection_timeout = TestFrameConstants.TIMEOUT_S390X_CONNECTION if is_s390x else TestFrameConstants.TIMEOUT_DEFAULT_CONNECTION

The image-name-based detection is a smart solution for your testing environment! 👍
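One way to combine the two review suggestions (image-name detection and named constants) is to move the decision into a small pure function that can be unit-tested without a cluster. This is a sketch under assumptions: the constant names follow CodeRabbit's suggestion above, `connection_timeout_for_image` is a hypothetical helper, and the class below is a trimmed stand-in for the real `TestFrameConstants`, not its actual contents.

```python
class TestFrameConstants:
    """Trimmed stand-in for the project's constants class (illustrative only)."""
    TIMEOUT_2MIN = 2 * 60
    TIMEOUT_5MIN = 5 * 60
    # Connection timeout for the HTTP check after port-forwarding
    TIMEOUT_S390X_CONNECTION = 60  # longer budget for s390x images, per this PR
    TIMEOUT_DEFAULT_CONNECTION = 30


def connection_timeout_for_image(image: str) -> int:
    """Return the connection timeout in seconds for the given image (hypothetical helper)."""
    is_s390x = "s390x" in image.lower()
    if is_s390x:
        return TestFrameConstants.TIMEOUT_S390X_CONNECTION
    return TestFrameConstants.TIMEOUT_DEFAULT_CONNECTION
```

Keeping the rule in one function also gives a single place to log which branch was taken, which could make it easier to tell why a given run used 30 or 60 seconds.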

@jiridanek jiridanek changed the title fix(tests): Increase timeout for s390x in Kubernetes port-forward tests RHAIENG-1759: fix(tests): Increase timeout for s390x in Kubernetes port-forward tests Nov 24, 2025
@jiridanek jiridanek changed the title RHAIENG-1759: fix(tests): Increase timeout for s390x in Kubernetes port-forward tests RHAIENG-1759: fix(tests): increase timeout for s390x in Kubernetes port-forward tests Nov 24, 2025
@openshift-ci openshift-ci bot added size/xs and removed size/xs labels Nov 24, 2025
Contributor

openshift-ci bot commented Nov 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jiridanek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jiridanek
Member

@ayush17 looks good to me. Please consider the CodeRabbit suggestions, but I don't insist on them; merge when ready.

cc @atheo89

@jiridanek
Member

/ok-to-test

@jiridanek
Member

@ayush17 if you don't have permissions to merge, talk to @ysok and he'll tell you what's necessary to do

Member

atheo89 commented Nov 24, 2025

lgtm too! Thanks @ayush17
However, I would like to see an image build on the s390x platform triggered to validate that the timeout is adequate.

@jiridanek
Member

@ayush17 here's one way of complying with Andriana's request: go to your fork of the repo, https://github.com/ayush17/notebooks/actions, enable GitHub Actions on your fork, and then trigger the build-on-push workflow.

(I'm happy checking how it went after merging to main, but if Andriana wants to check first and then merge, I respect that.)

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD fb65049 and 2 for PR HEAD e3003ca in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 6c5ecb1 and 1 for PR HEAD e3003ca in total

@openshift-ci openshift-ci bot removed the lgtm label Nov 25, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 25, 2025

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot added size/xs and removed size/xs labels Nov 25, 2025
Author

ayush17 commented Nov 25, 2025


@atheo89 @jiridanek

I attempted to test on my fork as requested, but the build/test workflows aren't running there - only validation workflows executed, which failed due to missing upstream image metadata (expected for forks).

My fork at https://github.com/ayush17/notebooks/actions shows only 3 lightweight workflows:

  • ❌ Validation (failed - missing upstream image SHAs)
  • ✅ Code static analysis (passed)
  • ✅ Docs (passed)

The actual "Build Notebooks" workflows that would test on s390x didn't trigger on my fork.

Could you approve the workflow run on this PR so the tests can execute on the upstream infrastructure? The change is minimal and isolated:

This will allow proper testing on actual s390x hardware.

Thank you!

@jiridanek
Member

/ok-to-test

The previous fix checked platform.machine() which returns the runner's
architecture (x86_64), not the target image architecture. Since tests
run on x86_64 runners with cross-compiled/emulated s390x images, we
need to check the image name string for 's390x' instead.

This ensures the 60s timeout is applied when testing s390x images,
regardless of the runner architecture.
@openshift-ci openshift-ci bot added size/xs and removed size/xs labels Nov 25, 2025
Contributor

openshift-ci bot commented Nov 25, 2025

@ayush17: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/images | 3696f7d | link | true | /test images |

Full PR test history. Your PR dashboard.


Author

ayush17 commented Nov 25, 2025

@jiridanek @atheo89
Ready for testing on upstream!
Builds succeeded on my fork, but OpenShift tests need upstream infrastructure.

Member

atheo89 commented Nov 26, 2025

The OpenShift tests are obsolete and have been unmaintained for a long time, so don't pay attention to them.
However, I am not sure why the test still uses the 30-second timeout when you build on s390x, and it eventually fails:
Check out this run: https://github.com/ayush17/notebooks/actions/runs/19682568042/job/56379996483


Labels

approved ok-to-test review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel size/xs


4 participants