@Prateekbala

Description

This PR implements automatic Kubeflow SDK integration for KFP components as requested in #12027. It adds the ability to automatically detect and install the Kubeflow SDK when components use kubeflow imports, while also providing explicit extras for opt-in installation.

Changes Made

1. Added Kubeflow Extras to setup.py

  • Added kubeflow = ['kubeflow'] extras option
  • Included kubeflow in the 'all' extras bundle
  • Allows users to install with pip install kfp[kubeflow] or pip install kfp[all]

2. Implemented AST-based Auto-detection

  • Added _detect_kubeflow_imports_in_function() in component_factory.py
  • Uses Abstract Syntax Tree parsing (ast + inspect + textwrap.dedent) to detect kubeflow imports in component functions
  • Supports multiple import patterns:
    • import kubeflow
    • import kubeflow.<submodule>
    • from kubeflow import <symbol>
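For readers who have not opened the diff, the detection described above could look roughly like the sketch below. It is not the code in this PR: the helper names `source_imports_kubeflow` and `function_imports_kubeflow` are made up for illustration, and the only assumption carried over from the description is the use of ast, inspect, and textwrap.dedent.

```python
import ast
import inspect
import textwrap


def source_imports_kubeflow(source: str) -> bool:
    """Return True if the given source text contains a kubeflow import."""
    try:
        # dedent so that nested or indented function sources still parse
        tree = ast.parse(textwrap.dedent(source))
    except SyntaxError:
        return False  # unparsable source: report no detection
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import kubeflow` and `import kubeflow.<submodule>`
            if any(alias.name.split('.')[0] == 'kubeflow'
                   for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # `from kubeflow import x`; level > 0 is a relative import
            if (node.level == 0 and node.module
                    and node.module.split('.')[0] == 'kubeflow'):
                return True
    return False


def function_imports_kubeflow(func) -> bool:
    """Inspect a function's source; return False if it is unavailable."""
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        return False
    return source_imports_kubeflow(source)
```

Matching on the first dotted segment keeps unrelated packages whose names merely start with "kubeflow" from triggering detection.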

3. Automatic Package Installation

  • Modified _get_packages_to_install_command(...) to auto-detect kubeflow usage
  • Automatically adds 'kubeflow' to packages_to_install when detected and not already specified
  • Respects explicit user-provided packages and version pins (no duplication)
  • Recognizes kubeflow when specified via VCS URLs
  • Fails closed: if source cannot be inspected or parsed, no auto-add occurs
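A minimal sketch of the "add only when missing" rule listed above, under stated assumptions: the function names and both regular expressions are illustrative rather than taken from the PR, and the VCS handling only recognizes URLs that carry an `#egg=kubeflow` fragment or the `kubeflow @ <url>` form.

```python
import re
from typing import List, Optional

# Requirement whose distribution name is exactly "kubeflow", optionally
# followed by extras, a version specifier, a marker, or "@ <url>".
_KUBEFLOW_REQ = re.compile(r'^\s*kubeflow\s*($|[\[<>=!~;@])', re.IGNORECASE)
# URL-style requirement that names kubeflow through an egg fragment.
_KUBEFLOW_EGG = re.compile(r'[#&]egg=kubeflow($|[\[&=<>!~])', re.IGNORECASE)


def is_kubeflow_requirement(spec: str) -> bool:
    """Return True if a packages_to_install entry already covers kubeflow."""
    return bool(_KUBEFLOW_REQ.search(spec) or _KUBEFLOW_EGG.search(spec))


def with_kubeflow(packages: Optional[List[str]], detected: bool) -> List[str]:
    """Append 'kubeflow' only if it was detected and is not already listed."""
    result = list(packages or [])
    if detected and not any(is_kubeflow_requirement(p) for p in result):
        result.append('kubeflow')
    return result
```

With this sketch, a user-supplied pin such as `kubeflow==0.2.0` is left untouched, while a differently named distribution such as `kubeflow-training` does not count as kubeflow.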

4. Opt-out Control per Component

  • Extended @dsl.component with install_kubeflow_package: bool = True
    • True (default): auto-add kubeflow if user code imports it
    • False: never auto-add kubeflow for that component

5. Comprehensive Test Coverage

  • Added test cases covering:
    • All supported kubeflow import patterns and negative cases
    • Behavior when source inspection fails or syntax is invalid
    • Package parsing (versions, extras, VCS URLs) and duplicate avoidance
    • Decorator integration ensuring kubeflow is only installed when needed

Usage

Install via extras:

pip install kfp[kubeflow]
pip install kfp[all]

Default auto-detection (no user change needed):

@dsl.component
def my_comp(...):
    import kubeflow
    ...
# Kubeflow SDK is added to packages_to_install automatically

Opt-out:

@dsl.component(install_kubeflow_package=False)
def my_comp(...):
    import kubeflow
    ...
# Kubeflow SDK will not be auto-added

Related Issues

Fixes #12027

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chensun for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow

Hi @Prateekbala. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions

🚫 This command cannot be processed. Only organization members or owners can use the commands.

Collaborator

@mprahl mprahl left a comment

/ok-to-test

@mprahl
Collaborator

mprahl commented Oct 15, 2025

@Prateekbala could you please add a sign off to your commit?

@Prateekbala Prateekbala force-pushed the feature/kubeflow-sdk-integrate branch from 8f7cf52 to 8c10e7b Compare October 15, 2025 20:40
This commit adds automatic detection and installation of kubeflow packages
when they are imported in component functions.

Key changes:
- Added _detect_kubeflow_imports_in_function() to analyze AST for kubeflow imports
- Added _parse_package_name() helper to extract clean package names
- Enhanced _get_packages_to_install_command() to include kubeflow when detected
- Added install_kubeflow_package parameter to @component decorator
- Added kubeflow extras to setup.py for optional dependency management
- Fixed type annotation syntax in component_factory_test.py
- Fixed test function implementations to include actual kubeflow imports
- Corrected InputPath/OutputPath type annotations with proper type parameters

All tests now pass successfully.

Signed-off-by: Prateek Bala <[email protected]>
@Prateekbala Prateekbala force-pushed the feature/kubeflow-sdk-integrate branch from 8c10e7b to f5c34d1 Compare October 15, 2025 21:02
@Prateekbala
Author

> @Prateekbala could you please add a sign off to your commit?

I’ve added the sign-off to my commit. Thank you for pointing that out.

pip_index_urls: Optional[List[str]] = None,
output_component_file: Optional[str] = None,
install_kfp_package: bool = True,
install_kubeflow_package: bool = True,
Collaborator

Could we make this an enum value instead? I'm thinking of the three options or similar:

  1. auto
  2. install
  3. skip

Then we can default to auto but allow the user to override (force an installation) in cases where auto doesn't work. I suppose the user could always force it with adding a value to packages_to_install but this makes the behavior clearer.
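To make the three options concrete, here is a rough sketch of how such a mode could resolve to an install decision. `InstallMode` and `should_install` are hypothetical names for illustration, not identifiers from this PR.

```python
import enum


class InstallMode(str, enum.Enum):
    AUTO = 'auto'        # install only when kubeflow imports are detected
    INSTALL = 'install'  # always install
    SKIP = 'skip'        # never install


def should_install(mode, imports_detected: bool) -> bool:
    """Resolve the mode and the detection result to a yes/no decision."""
    mode = InstallMode(mode)  # accepts enum members and plain strings
    if mode is InstallMode.INSTALL:
        return True
    if mode is InstallMode.SKIP:
        return False
    return imports_detected
```

In this sketch the detection result only matters in the auto case, which is what lets a user force the installation when detection misses an import.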

if detected_kubeflow:
# Parse existing packages to check for kubeflow
existing_package_names = [
_parse_package_name(pkg) for pkg in packages_to_install
Collaborator

My preference is to use a regex to specifically detect the presence of kubeflow rather than having generic package name detection.

def test_detect_kubeflow_imports_simple_import(self):
"""Test detection of 'import kubeflow'."""

def component_with_kubeflow_import():
Collaborator

These test components don't seem to actually import kubeflow.

- Replace Union[bool, KubeflowPackageInstallMode] with enum-only approach
- Implement specific regex patterns for kubeflow package detection
- Remove _parse_package_name() and add _is_kubeflow_package() function
- Update test implementations with proper mocking and import statements
- Export KubeflowPackageInstallMode enum in __init__.py

Signed-off-by: Prateek Bala <[email protected]>
@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Nov 2, 2025
@Prateekbala
Author

I've implemented the requested changes

@mprahl
Collaborator

mprahl commented Nov 5, 2025

Please rebase so that there aren't any merge commits.

@Prateekbala
Author

> Please rebase so that there aren't any merge commits.

Done

@chensun
Member

chensun commented Nov 13, 2025

> It adds the ability to automatically detect and install the Kubeflow SDK when components use kubeflow imports, while also providing explicit extras for opt-in installation.

@Prateekbala @mprahl
I generally dislike over-smart behaviors like this. Kubeflow imports inside users' component code are no different from any other library imports, and it should be the users' responsibility to specify the dependencies they want to install.

We should stick to standard Python dependency management. Inferring dependencies from imports is problematic and rarely the right approach.

In practice, as the SDK evolves, there could be breaking changes, the version you install may not actually work with the user code, and there could be transitive dependencies incompatible with user-specified dependencies. That is one of the reasons we opted to install kfp with no dependencies.

@Prateekbala
Author

Prateekbala commented Nov 15, 2025

Hi @chensun, I understand the concern around inferring dependencies. In the earlier discussion on #12027 and in the Kubeflow SDK design thread, @mprahl proposed an AST-based check to detect kubeflow imports, together with a kubeflow extras option. The direction in that conversation was to implement the AST-based detection and the extras entry as an initial solution, with the expectation that it could be revised as the broader Kubeflow SDK integration work evolves.

Member

@andreyvelich andreyvelich left a comment

Thanks for this great contribution @Prateekbala!
/assign @kubeflow/kubeflow-sdk-team @franciscojavierarceo @Fiona-Waters @abhijeet-dhumal it would be great if you could help with review to include Kubeflow SDK into KFP!


@kramaranya kramaranya left a comment

Thank you @Prateekbala!
I left a few initial comments

Comment on lines +25 to +37
class KubeflowPackageInstallMode(str, enum.Enum):
    """Installation mode for the Kubeflow SDK package in components.

    Attributes:
        AUTO: Automatically detect and install if kubeflow is imported in the component function.
        INSTALL: Always install the kubeflow package regardless of usage.
        SKIP: Never install the kubeflow package, even if detected in the component function.
    """
    AUTO = 'auto'
    INSTALL = 'install'
    SKIP = 'skip'


Comment on lines +47 to +48
install_kubeflow_package:
KubeflowPackageInstallMode = KubeflowPackageInstallMode.AUTO,

Suggested change
install_kubeflow_package:
KubeflowPackageInstallMode = KubeflowPackageInstallMode.AUTO,
install_kubeflow_package: KubeflowPackageInstallMode = KubeflowPackageInstallMode.AUTO,

packages_to_install = packages_to_install or []

# Handle kubeflow installation based on the mode
if install_kubeflow_package == KubeflowPackageInstallMode.INSTALL:

This won't work if the user imports KubeflowPackageInstallMode from another definition. We should keep the enum in one place only so it's always the same type.

Detects these import patterns:
- import kubeflow
- import kubeflow.training (any submodule)

There is no training submodule in the SDK: https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer

Suggested change
- import kubeflow.training (any submodule)
- import kubeflow.trainer (any submodule)

Comment on lines +107 to +113
install_kubeflow_package: Controls whether the Kubeflow SDK is installed.
Can be KubeflowPackageInstallMode.AUTO (default), KubeflowPackageInstallMode.INSTALL,
or KubeflowPackageInstallMode.SKIP.
- AUTO: Detects kubeflow imports in the component function via AST parsing
and automatically adds 'kubeflow' to packages_to_install if detected.
- INSTALL: Always installs the kubeflow package regardless of whether it's used.
- SKIP: Never installs kubeflow, even if detected (useful if pre-installed in base_image).

install_kubeflow_package reads like a boolean rather than a mode.
Would kubeflow_sdk_install_mode be clearer? cc @kubeflow/kubeflow-sdk-team @kubeflow/wg-pipeline-leads

Member

Shall we just follow the same signature as install_kfp_package?
If the default value is AUTO, why would a user change it to INSTALL?

@kramaranya

I'm wondering if we should revisit this given @chensun's concerns? Any other concerns from @kubeflow/wg-pipeline-leads about this approach or are we happy to move forward?

Contributor

@Fiona-Waters Fiona-Waters left a comment

I think that there could be some complications with this implementation around version clashes etc. If we do go ahead we would need clear user documentation around the install mode.

@mprahl
Collaborator

mprahl commented Nov 18, 2025

@chensun Thanks for the review. You raise valid points regarding the risks of "magic" behavior and dependency management. I definitely agree that we want to avoid inferring dependencies for general libraries.

However, I believe the Kubeflow SDK warrants a special exception separate from standard third-party packages. Our goal is to encourage seamless usage of the wider Kubeflow ecosystem within components. Conceptually, this somewhat aligns with how we already pre-install the kfp SDK into the container at runtime; we are simply extending that convenience to the broader Kubeflow namespace.

To address your concerns about version drift and user control, I propose the following safeguards for this PR:

  • Respecting Opt-Outs: If the user explicitly disables kfp package installation in the component decorator, we should also disable the automatic detection/installation of the Kubeflow SDK.
  • Strict Version Pinning: We could pin the Kubeflow SDK version to an x.y version in setup of the KFP SDK. This ensures the injected SDK is always aligned with the runtime environment, mitigating the risk of breaking changes or incompatibility.
  • Best-Effort Detection: We could keep the autodetect logic for kubeflow imports, but it will only trigger the installation of the pinned, compatible version defined above.
  • CI Verification: Let's add a pipeline run in CI to explicitly test this autodetection and installation path, ensuring the pinned version installs successfully without conflict.
  • Visibility: We should log in the executor that Kubeflow SDK usage was detected and it will be automatically installed. We can also provide instructions to disable it.
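The safeguards listed above could combine along the lines of the sketch below. Everything in it is an assumption for illustration: the function name, the `user_specified` flag, the placeholder version range, and the choice to log at decision time rather than in the executor.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Placeholder range for illustration only; the real pin is not settled here.
PINNED_KUBEFLOW_SPEC = 'kubeflow>=0.2.0,<0.3.0'


def plan_kubeflow_install(install_kfp_package: bool,
                          imports_detected: bool,
                          user_specified: bool) -> Optional[str]:
    """Return the requirement to inject, or None if nothing should be added."""
    if not install_kfp_package:
        return None  # opting out of kfp also opts out of auto-install
    if user_specified:
        return None  # the user's own kubeflow requirement wins
    if not imports_detected:
        return None  # best-effort detection found nothing
    logger.info(
        'Kubeflow SDK imports detected; adding %s to packages_to_install.',
        PINNED_KUBEFLOW_SPEC)
    return PINNED_KUBEFLOW_SPEC
```

Returning the pinned spec rather than a bare package name keeps the injected version under the control of the KFP SDK release, which is the point of the pinning safeguard.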

@Prateekbala
Author

Hey @mprahl, I've addressed the review feedback and added some safeguards based on the concerns. A few things I need clarification on:

  1. Kubeflow SDK version - Which version should we actually pin to? Is it 1.9.0? Should it match a specific KFP version, or use a range like >=1.9.0,<2.0?

  2. Logging - Where should we log the auto-detection? At compile time when building the spec, or at runtime in the executor?

  3. Default mode - Currently AUTO is the default. Given the concerns, should we start with SKIP as the default instead and let users opt in?

@mprahl
Collaborator

mprahl commented Nov 19, 2025

> Hey @mprahl, I've addressed the review feedback and added some safeguards based on the concerns. A few things I need clarification on:
>
>   1. Kubeflow SDK version - Which version should we actually pin to? Is it 1.9.0? Should it match a specific KFP version, or use a range like >=1.9.0,<2.0?

I think we can keep the range >=0.2.0,<0.3.0 for now and bump it every KFP release. @andreyvelich @kramaranya do you agree with that?

>   2. Logging - Where should we log the auto-detection? At compile time when building the spec, or at runtime in the executor?

I was thinking the executor so the user can have debug information about which Kubeflow SDK was used at runtime.

>   3. Default mode - Currently AUTO is the default. Given the concerns, should we start with SKIP as the default instead and let users opt in?

I think that if the install_kfp_package argument is True, we should default to auto. Otherwise, let the user explicitly specify the behavior.

@kramaranya

> I think we can keep the range >=0.2.0,<0.3.0 for now and bump it every KFP release. @andreyvelich @kramaranya do you agree with that?

I think we can just keep >=0.2.0 and it should install the latest version, so we don't need to bump it every time

@andreyvelich
Member

> I think we can just keep >=0.2.0 and it should install the latest version, so we don't need to bump it every time

@kramaranya If we are planning to introduce breaking changes between minor releases, it might be better to stick with <0.3 for now so that we don't break users' pipelines after an upgrade.
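The difference between the two ranges under discussion can be illustrated with a small model. This is a simplification, not how pip resolves versions: it only handles plain x.y.z strings, and the version numbers used in the note below are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain 'x.y.z' version string into a tuple of integers."""
    return tuple(int(part) for part in version.split('.'))


def allowed_open_ended(version: str) -> bool:
    """Model of the specifier >=0.2.0."""
    return parse_version(version) >= (0, 2, 0)


def allowed_capped(version: str) -> bool:
    """Model of the specifier >=0.2.0,<0.3.0."""
    return (0, 2, 0) <= parse_version(version) < (0, 3, 0)
```

Under this model a hypothetical 0.3.0 release passes `allowed_open_ended` but not `allowed_capped`, so only the capped range shields existing pipelines from a breaking minor release.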

@mprahl
Collaborator

mprahl commented Nov 19, 2025

Thank you very much for your efforts @Prateekbala!

We discussed this PR during today's KFP community call. The maintainers have decided to put a pause on this effort due to a versioning concern:

  • The Concern: The KFP SDK version used to compile a pipeline does not necessarily match the active KFP deployment.
  • The Risk: If we pin the Kubeflow SDK at the KFP SDK level, we risk installing a Kubeflow SDK that is incompatible with the actual Kubeflow deployment.

Alternative Proposal

Instead, we believe this should be handled as an API Server concern, not a KFP SDK concern. The system should ideally resolve the associated Kubeflow SDK version (using major.minor/x.y alignment) based on the deployed KFP version. This ensures we always use the SDKs compatible with what is actually installed.

Since this architectural change has significant repercussions, we need a member of the KFP community to propose a KEP (Kubeflow Enhancement Proposal) to define this path forward.

@Prateekbala
Author

> Thank you very much for your efforts @Prateekbala!
>
> We discussed this PR during today's KFP community call. The maintainers have decided to put a pause on this effort due to a versioning concern:

Thanks for the update @mprahl! I’ll pause the PR for now. Happy to help and contribute more.

Development

Successfully merging this pull request may close these issues.

[feature] Include Kubeflow SDK in KFP SDK

6 participants