-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat(sdk): auto-detect and install Kubeflow SDK for component functions #12027 #12368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(sdk): auto-detect and install Kubeflow SDK for component functions #12027 #12368
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Hi @Prateekbala. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
🚫 This command cannot be processed. Only organization members or owners can use the commands. |
mprahl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/ok-to-test
|
@Prateekbala could you please add a sign off to your commit? |
8f7cf52 to
8c10e7b
Compare
This commit adds automatic detection and installation of kubeflow packages when they are imported in component functions. Key changes: - Added _detect_kubeflow_imports_in_function() to analyze AST for kubeflow imports - Added _parse_package_name() helper to extract clean package names - Enhanced _get_packages_to_install_command() to include kubeflow when detected - Added install_kubeflow_package parameter to @component decorator - Added kubeflow extras to setup.py for optional dependency management - Fixed type annotation syntax in component_factory_test.py - Fixed test function implementations to include actual kubeflow imports - Corrected InputPath/OutputPath type annotations with proper type parameters All tests now pass successfully. Signed-off-by: Prateek Bala <[email protected]>
…o feature/kubeflow-sdk-integrate
8c10e7b to
f5c34d1
Compare
|
| pip_index_urls: Optional[List[str]] = None, | ||
| output_component_file: Optional[str] = None, | ||
| install_kfp_package: bool = True, | ||
| install_kubeflow_package: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make this an enum value instead? I'm thinking of the three options or similar:
- auto
- install
- skip
Then we can default to auto but allow the user to override (force an installation) in cases where auto doesn't work. I suppose the user could always force it with adding a value to packages_to_install but this makes the behavior clearer.
| if detected_kubeflow: | ||
| # Parse existing packages to check for kubeflow | ||
| existing_package_names = [ | ||
| _parse_package_name(pkg) for pkg in packages_to_install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is to use a regex to specifically detect the presence of kubeflow rather than having generic package name detection.
| def test_detect_kubeflow_imports_simple_import(self): | ||
| """Test detection of 'import kubeflow'.""" | ||
|
|
||
| def component_with_kubeflow_import(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test components don't seem to actually import kubeflow.
- Replace Union[bool, KubeflowPackageInstallMode] with enum-only approach - Implement specific regex patterns for kubeflow package detection - Remove _parse_package_name() and add _is_kubeflow_package() function - Update test implementations with proper mocking and import statements - Export KubeflowPackageInstallMode enum in __init__.py Signed-off-by: Prateek Bala <[email protected]>
|
I've implemented the requested changes |
|
Please rebase so that there aren't any merge commits. |
Done |
@Prateekbala @mprahl We should stick to standard Python dependency management. Inferring dependencies from imports is problematic and rarely the right approach. In practice, as the sdk evolves, there could be breaking changes, the version you install may not actually work with the user code, and there could be transient dependencies incompatible with user-specified dependencies--one of the reason we opted to install kfp with no dependencies. |
Hi @chensun, I understand the concern around inferring dependencies. In the earlier discussion on #12027 and in the Kubeflow SDK design thread, @mprahl proposed using an AST-based check to detect kubeflow imports by providing a kubeflow extras option. The direction in that conversation was to implement this AST-based detection together with the extras entry as an initial solution, with the expectation that it could be revised when the broader Kubeflow SDK integration work evolves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this great contribution @Prateekbala!
/assign @kubeflow/kubeflow-sdk-team @franciscojavierarceo @Fiona-Waters @abhijeet-dhumal it would be great if you could help with review to include Kubeflow SDK into KFP!
kramaranya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Prateekbala!
I left a few initial comments
| class KubeflowPackageInstallMode(str, enum.Enum): | ||
| """Installation mode for the Kubeflow SDK package in components. | ||
| Attributes: | ||
| AUTO: Automatically detect and install if kubeflow is imported in the component function. | ||
| INSTALL: Always install the kubeflow package regardless of usage. | ||
| SKIP: Never install the kubeflow package, even if detected in the component function. | ||
| """ | ||
| AUTO = 'auto' | ||
| INSTALL = 'install' | ||
| SKIP = 'skip' | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we duplicate this definition from https://github.com/Prateekbala/pipelines/blob/feature/kubeflow-sdk-integrate/sdk/python/kfp/dsl/component_factory.py#L52-L62? We can just import it
| install_kubeflow_package: | ||
| KubeflowPackageInstallMode = KubeflowPackageInstallMode.AUTO, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| install_kubeflow_package: | |
| KubeflowPackageInstallMode = KubeflowPackageInstallMode.AUTO, | |
| install_kubeflow_package: KubeflowPackageInstallMode = KubeflowPackageInstallMode.AUTO, |
| packages_to_install = packages_to_install or [] | ||
|
|
||
| # Handle kubeflow installation based on the mode | ||
| if install_kubeflow_package == KubeflowPackageInstallMode.INSTALL: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work if user uses KubeflowPackageInstallMode from other definition, we should keep enum in one place only so it's always the same type
| Detects these import patterns: | ||
| - import kubeflow | ||
| - import kubeflow.training (any submodule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no training submodule in sdk https://github.com/kubeflow/sdk/tree/main/kubeflow/trainer
| - import kubeflow.training (any submodule) | |
| - import kubeflow.trainer (any submodule) |
| install_kubeflow_package: Controls whether the Kubeflow SDK is installed. | ||
| Can be KubeflowPackageInstallMode.AUTO (default), KubeflowPackageInstallMode.INSTALL, | ||
| or KubeflowPackageInstallMode.SKIP. | ||
| - AUTO: Detects kubeflow imports in the component function via AST parsing | ||
| and automatically adds 'kubeflow' to packages_to_install if detected. | ||
| - INSTALL: Always installs the kubeflow package regardless of whether it's used. | ||
| - SKIP: Never installs kubeflow, even if detected (useful if pre-installed in base_image). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
install_kubeflow_package reads like a boolean rather than a mode.
Would kubeflow_sdk_install_mode be clearer? cc @kubeflow/kubeflow-sdk-team @kubeflow/wg-pipeline-leads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we just follow the same signature as install_kfp_package ?
If the default value is AUTO, why would user change it to INSTALL ?
|
I'm wondering if we should revisit this given @chensun's concerns? Any other concerns from @kubeflow/wg-pipeline-leads about this approach or are we happy to move forward? |
Fiona-Waters
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that there could be some complications with this implementation around version clashes etc. If we do go ahead we would need clear user documentation around the install mode.
@chensun Thanks for the review. You raise valid points regarding the risks of "magic" behavior and dependency management. I definitely agree that we want to avoid inferring dependencies for general libraries. However, I believe the Kubeflow SDK warrants a special exception separate from standard third-party packages. Our goal is to encourage seamless usage of the wider Kubeflow ecosystem within components. Conceptually, this somewhat aligns with how we already pre-install the To address your concerns about version drift and user control, I propose the following safeguards for this PR:
|
Hey @mprahl , I've addressed the review feedback and added some safeguards based on the concerns. A few things I need clarification on:
|
I think we can keep the range
I was thinking the executor so the user can have debug information about which Kubeflow SDK was used at runtime.
I think if the |
I think we can just keep >=0.2.0 and it should install the latest version, so we don't need to bump it every time |
@kramaranya If we are planning to introduce breaking changes between minor releases, it might be better to stitch with |
|
Thank you very much for your efforts @Prateekbala! We discussed this PR during today's KFP community call. The maintainers have decided to put a pause on this effort due to a versioning concern:
Alternative Proposal Instead, we believe this should be handled as an API Server concern, not a KFP SDK concern. The system should ideally resolve the associated Kubeflow SDK version (using major.minor/x.y alignment) based on the deployed KFP version. This ensures we always use the SDKs compatible with what is actually installed. Since this architectural change has significant repercussions, we need a member of the KFP community to propose a KEP (Kubeflow Enhancement Proposal) to define this path forward. |
Thanks for the update @mprahl! I’ll pause the PR for now. Happy to help and contribute more. |
Description
This PR implements automatic Kubeflow SDK integration for KFP components as requested in #12027. It adds the ability to automatically detect and install the Kubeflow SDK when components use kubeflow imports, while also providing explicit extras for opt-in installation.
Changes Made
1. Added Kubeflow Extras to setup.py
kubeflow = ['kubeflow']extras option'all'extras bundlepip install kfp[kubeflow]orpip install kfp[all]2. Implemented AST-based Auto-detection
_detect_kubeflow_imports_in_function()incomponent_factory.pyast+inspect+textwrap.dedent) to detect kubeflow imports in component functionsimport kubeflowimport kubeflow.<submodule>from kubeflow import <symbol>3. Automatic Package Installation
_get_packages_to_install_command(...)to auto-detect kubeflow usage'kubeflow'topackages_to_installwhen detected and not already specified4. Opt-out Control per Component
@dsl.componentwithinstall_kubeflow_package: bool = TrueTrue(default): auto-add kubeflow if user code imports itFalse: never auto-add kubeflow for that component5. Comprehensive Test Coverage
Usage
Install via extras:
Default auto-detection (no user change needed):
Opt-out:
Related Issues
Fixes #12027