Commit 73c2cf0

Release snowflake-ml-python: 1.0.11 (#61)

Authored by sfc-gh-kdama and Snowflake Authors
Co-authored-by: Snowflake Authors <[email protected]>
1 parent 9130a0b commit 73c2cf0
File tree: 114 files changed, +3918 −2006 lines

.bazelrc

Lines changed: 5 additions & 0 deletions
@@ -14,6 +14,7 @@ coverage --instrumentation_filter="-//tests[/:]"
 build:_build --platforms //bazel/platforms:snowflake_conda_env --host_platform //bazel/platforms:snowflake_conda_env --repo_env=BAZEL_CONDA_ENV_NAME=build
 build:_sf_only --platforms //bazel/platforms:snowflake_conda_env --host_platform //bazel/platforms:snowflake_conda_env --repo_env=BAZEL_CONDA_ENV_NAME=sf_only
 build:_extended --platforms //bazel/platforms:extended_conda_env --host_platform //bazel/platforms:extended_conda_env --repo_env=BAZEL_CONDA_ENV_NAME=extended
+build:_extended_oss --platforms //bazel/platforms:extended_conda_env --host_platform //bazel/platforms:extended_conda_env --repo_env=BAZEL_CONDA_ENV_NAME=extended_oss
 
 # Public definitions
 
@@ -35,6 +36,7 @@ run:pre_build --config=_build --config=py3.8
 
 # Config to run type check
 build:typecheck --aspects @rules_mypy//:mypy.bzl%mypy_aspect --output_groups=mypy --config=_extended --config=py3.8
+build:typecheck_oss --aspects @rules_mypy//:mypy.bzl%mypy_aspect --output_groups=mypy --config=_extended_oss --config=py3.8
 
 # Config to build the doc
 build:docs --config=_sf_only --config=py3.8
@@ -44,3 +46,6 @@ build:docs --config=_sf_only --config=py3.8
 test:extended --config=_extended
 run:extended --config=_extended
 cquery:extended --config=_extended
+test:extended_oss --config=_extended_oss
+run:extended_oss --config=_extended_oss
+cquery:extended_oss --config=_extended_oss

CHANGELOG.md

Lines changed: 18 additions & 0 deletions
@@ -1,5 +1,23 @@
 # Release History
 
+## 1.0.11
+
+### New Features
+
+- Model Registry: Add `log_artifact()` public method.
+- Model Development: Add support for `kneighbors`.
+
+### Behavior Changes
+
+- Model Registry: Change `log_model()` argument from `TrainingDataset` to a list of `Artifact`.
+- Model Registry: Change `get_training_dataset()` to `get_artifact()`.
+
+### Bug Fixes
+
+- Model Development: Fix support for XGBoost and LightGBM models used with SKLearn Grid Search and Randomized Search model selectors.
+- Model Development: `DecimalType` is now supported as a `DataType`.
+- Model Development: Fix metrics compatibility with Snowpark DataFrames that use Snowflake identifiers.
+
 ## 1.0.10
 
 ### Behavior Changes

CONTRIBUTING.md

Lines changed: 51 additions & 0 deletions
@@ -347,6 +347,57 @@ When you add a new test file, you should always ensure the existence of a `if __
 the test file will not be run by bazel. We have a test wrapper [here](./bazel/test_wrapper.sh) to ensure that the
 test will fail if you forget that part.
 
+## Integration test
+
+### Test in Stored Procedure
+
+To test whether your code works inside a stored procedure, you can build on `CommonTestBase` in
+`tests/integ/snowflake/ml/test_utils/common_test_base.py`. An example of such a test can be found in
+`tests/integ/snowflake/ml/_internal/file_utils_integ_test.py`.
+
+To write such a test, you need to:
+
+1. Let your test case inherit from `common_test_base.CommonTestBase`.
+1. Remove all Snowpark Session creation from your test, and use `self.session` to access the session if needed.
+1. If you write your own `setUp` and `tearDown` methods, remember to call `super().setUp()` and `super().tearDown()`.
+1. Decorate your test method with `common_test_base.CommonTestBase.sproc_test()`. If you want your test to run only in
+a stored procedure rather than both locally and in a stored procedure, set `local=False`. If you don't want to test
+with caller's rights, set `test_callers_rights=False`. (Owner's rights stored procedures are always tested.)
+
+**Attention**: Depending on your configuration, 1-3 sub-tests will be run from your test method. Because they are
+sub-tests, `setUp` and `tearDown` do not run around every sub-test; they run only once, before and after the whole
+test method. It is therefore important to make your test case self-contained.
+
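Putting these steps together, a minimal sketch of such a test (the class name and test body are illustrative, and the `absltest` runner is an assumption based on the repo's test conventions):

```python
from absl.testing import absltest

from tests.integ.snowflake.ml.test_utils import common_test_base


class MySprocIntegTest(common_test_base.CommonTestBase):
    # Runs both locally and in a stored procedure by default; pass
    # local=False or test_callers_rights=False to narrow the matrix.
    @common_test_base.CommonTestBase.sproc_test()
    def test_roundtrip(self) -> None:
        # Use the session provided by the base class instead of creating one.
        df = self.session.create_dataframe([(1,), (2,)], schema=["A"])
        self.assertEqual(df.count(), 2)


if __name__ == "__main__":
    absltest.main()
```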
+### Compatibility Test
+
+To test whether your code is compatible with previous versions of the library, you can build on `CommonTestBase` in
+`tests/integ/snowflake/ml/test_utils/common_test_base.py`. An example of such a test can be found in
+`tests/integ/snowflake/ml/registry/model_registry_compat_test.py`.
+
+To write such a test, you need to:
+
+1. Let your test case inherit from `common_test_base.CommonTestBase`.
+1. Remove all Snowpark Session creation from your test, and use `self.session` to access the session if needed.
+1. If you write your own `setUp` and `tearDown` methods, remember to call `super().setUp()` and `super().tearDown()`.
+1. Write a factory method in your test class that returns a tuple of a function and a tuple of its arguments. The
+function will be run as a stored procedure in an environment with a previous version of the library.
+
+    **Note**: Since the function will be created as a stored procedure, its first argument must be a Snowpark Session.
+    The arguments tuple you provide via the factory method does not need to include the session object.
+
+    **Note**: To avoid objects from the current environment affecting the result, the function is not pickled with
+    `cloudpickle`; instead, it is written out as a Python file and registered as a stored procedure. This means you
+    cannot use any object defined outside of the function, and any imports must happen inside the function
+    definition. It therefore helps to keep your prepare function as simple as possible.
+
+1. Decorate your test method with `common_test_base.CommonTestBase.compatibility_test`, providing the factory method
+you created in the step above, an optional version range to test against, and any additional package requirements.
+
+**Attention**: For every version available on the server and within the version range, a sub-test will be run that
+executes the prepare function in a stored procedure and then runs the test method. Because they are sub-tests,
+`setUp` and `tearDown` do not run around every sub-test; they run only once, before and after the whole test method.
+It is therefore important to make your test case self-contained.
+
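A sketch of the shape such a test takes (the keyword names passed to `compatibility_test`, the version range, and the table used are assumptions for illustration — check `common_test_base.py` for the exact signature):

```python
from typing import Callable, Tuple

from absl.testing import absltest
from snowflake import snowpark

from tests.integ.snowflake.ml.test_utils import common_test_base


class MyCompatTest(common_test_base.CommonTestBase):
    def _prepare_fn_factory(
        self,
    ) -> Tuple[Callable[[snowpark.Session, str], None], Tuple[str]]:
        def prepare(session: snowpark.Session, table_name: str) -> None:
            # This runs as a stored procedure against the older library
            # version, so any imports must happen inside this function body.
            session.sql(f"create or replace table {table_name} (a int)").collect()

        # The leading Session argument is injected automatically, so the
        # arguments tuple omits it.
        return prepare, ("MY_COMPAT_TABLE",)

    @common_test_base.CommonTestBase.compatibility_test(
        prepare_fn_factory=_prepare_fn_factory, version_range=">=1.0.8,<=1.0.10"
    )
    def test_against_previous_versions(self) -> None:
        self.assertEqual(self.session.table("MY_COMPAT_TABLE").count(), 0)


if __name__ == "__main__":
    absltest.main()
```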
 ## `pre-commit`
 
 Pull requests against the main branch are subject to `pre-commit` checks. Those checks enforce the code style.

README.md

Lines changed: 45 additions & 22 deletions
@@ -2,31 +2,32 @@
 
 Snowpark ML is a set of tools including SDKs and underlying infrastructure to build and deploy machine learning models.
 With Snowpark ML, you can pre-process data, train, manage and deploy ML models all within Snowflake, using a single SDK,
-and benefit from Snowflake’s proven performance, scalability, stability and governance at every stage of the Machine
-Learning workflow.
+and benefit from Snowflake’s proven performance, scalability, stability and governance at every stage of the Machine
+Learning workflow.
 
 ## Key Components of Snowpark ML
 
 The Snowpark ML Python SDK provides a number of APIs to support each stage of an end-to-end Machine Learning development
-and deployment process, and includes two key components.
+and deployment process, and includes two key components.
 
 ### Snowpark ML Development [Public Preview]
 
-A collection of python APIs to enable efficient model development directly in Snowflake:
+[Snowpark ML Development](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index#snowpark-ml-development)
+provides a collection of Python APIs enabling efficient ML model development directly in Snowflake:
 
-1. Modeling API (snowflake.ml.modeling) for data preprocessing, feature engineering and model training in Snowflake.
-This includes snowflake.ml.modeling.preprocessing for scalable data transformations on large data sets utilizing the
-compute resources of underlying Snowpark Optimized High Memory Warehouses, and a large collection of ML model
-development classes based on sklearn, xgboost, and lightgbm. See the private preview limited access docs (Preprocessing,
-Modeling for more details on these.
+1. Modeling API (`snowflake.ml.modeling`) for data preprocessing, feature engineering and model training in Snowflake.
+This includes the `snowflake.ml.modeling.preprocessing` module for scalable data transformations on large data sets
+utilizing the compute resources of underlying Snowpark Optimized High Memory Warehouses, and a large collection of ML
+model development classes based on sklearn, xgboost, and lightgbm.
 
 1. Framework Connectors: Optimized, secure and performant data provisioning for Pytorch and Tensorflow frameworks in
 their native data loader formats.
 
 ### Snowpark ML Ops [Private Preview]
 
-Snowpark MLOps complements the Snowpark ML Development API, and provides model management capabilities along with
-integrated deployment into Snowflake. Currently, the API consists of
+[Snowpark MLOps](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index#snowpark-ml-ops) complements the
+Snowpark ML Development API, and provides model management capabilities along with integrated deployment into Snowflake.
+Currently, the API consists of:
 
 1. FileSet API: FileSet provides a Python fsspec-compliant API for materializing data into a Snowflake internal stage
 from a query or Snowpark Dataframe along with a number of convenience APIs.
@@ -37,26 +38,48 @@ Snowflake Warehouses as vectorized UDFs.
 During PrPr, we are iterating on API without backward compatibility guarantees. It is better to recreate your registry
 every time you update the package. This means, at this time, you cannot use the registry for production use.
 
-- [Documentation](https://docs.snowflake.com/developer-guide/snowpark-ml)
-
 ## Getting started
 
 ### Have your Snowflake account ready
 
 If you don't have a Snowflake account yet, you can [sign up for a 30-day free trial account](https://signup.snowflake.com/).
 
-### Create a Python virtual environment
+### Installation
+
+Follow the [installation instructions](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index#installing-snowpark-ml)
+in the Snowflake documentation.
 
-Python version 3.8, 3.9 & 3.10 are supported. You can use [miniconda](https://docs.conda.io/en/latest/miniconda.html),
-[anaconda](https://www.anaconda.com/), or [virtualenv](https://docs.python.org/3/tutorial/venv.html) to create a virtual
-environment.
+Python versions 3.8, 3.9 & 3.10 are supported. You can use [miniconda](https://docs.conda.io/en/latest/miniconda.html) or
+[anaconda](https://www.anaconda.com/) to create a Conda environment (recommended),
+or [virtualenv](https://docs.python.org/3/tutorial/venv.html) to create a virtual environment.
 
-To have the best experience when using this library, [creating a local conda environment with the Snowflake channel](
-https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#local-development-and-testing)
-is recommended.
+### Conda channels
 
-### Install the library to the Python virtual environment
+The [Snowflake Conda Channel](https://repo.anaconda.com/pkgs/snowflake/) contains the official Snowpark ML package releases.
+The recommended approach is to install `snowflake-ml-python` from this conda channel:
 
 ```sh
-pip install snowflake-ml-python
+conda install \
+  -c https://repo.anaconda.com/pkgs/snowflake \
+  --override-channels \
+  snowflake-ml-python
+```
+
+See [the developer guide](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index) for installation instructions.
+
+The latest version of the `snowflake-ml-python` package is also published in a conda channel in this repository. Package versions
+in this channel may not yet be present in the official Snowflake conda channel.
+
+Install `snowflake-ml-python` from this channel with the following (being sure to replace `<version_specifier>` with the
+desired version, e.g. `1.0.10`):
+
+```bash
+conda install \
+  -c https://raw.githubusercontent.com/snowflakedb/snowflake-ml-python/conda/releases/ \
+  -c https://repo.anaconda.com/pkgs/snowflake \
+  --override-channels \
+  snowflake-ml-python==<version_specifier>
 ```
+
+Note that until a `snowflake-ml-python` package version is available in the official Snowflake conda channel, there may
+be compatibility issues. Server-side functionality that `snowflake-ml-python` depends on may not yet be released.

bazel/environments/conda-env.yml

Lines changed: 4 additions & 0 deletions
@@ -58,3 +58,7 @@ dependencies:
   - types-requests==2.30.0.0
   - typing-extensions==4.5.0
   - xgboost==1.7.3
+  - pip
+  - pip:
+      - --extra-index-url https://pypi.org/simple
+      - peft==0.5.0

bazel/environments/fetch_conda_env_config.bzl

Lines changed: 6 additions & 0 deletions
@@ -16,6 +16,12 @@ def _fetch_conda_env_config_impl(rctx):
             "compatible_target": ["@SnowML//bazel/platforms:extended_conda_channels"],
             "environment": "@//bazel/environments:conda-env.yml",
         },
+        # `extended_oss` is the extended env for the OSS repo, a strict subset of `extended`.
+        # It's intended for development without the dev VPN.
+        "extended_oss": {
+            "compatible_target": ["@SnowML//bazel/platforms:extended_conda_channels"],
+            "environment": "@//bazel/environments:conda-env.yml",
+        },
         "sf_only": {
             "compatible_target": ["@SnowML//bazel/platforms:snowflake_conda_channel"],
             "environment": "@//bazel/environments:conda-env-snowflake.yml",

bazel/requirements/parse_and_generate_requirements.py

Lines changed: 15 additions & 2 deletions
@@ -1,6 +1,7 @@
 import argparse
 import collections
 import contextlib
+import copy
 import functools
 import itertools
 import json
@@ -146,6 +147,9 @@ def generate_dev_pinned_string(
         version = req_info.get("dev_version_conda", req_info.get("dev_version", None))
         if version is None:
             raise ValueError("No pinned version exists.")
+        if env == "conda-only":
+            if "dev_version_conda" in req_info or "dev_version" in req_info:
+                return None
         from_channel = req_info.get("from_channel", None)
         if version == "":
             version_str = ""
@@ -158,6 +162,9 @@ def generate_dev_pinned_string(
         version = req_info.get("dev_version_pypi", req_info.get("dev_version", None))
         if version is None:
             raise ValueError("No pinned version exists.")
+        if env == "pip-only":
+            if "dev_version_conda" in req_info or "dev_version" in req_info:
+                return None
         if version == "":
             version_str = ""
         else:
@@ -341,9 +348,15 @@ def generate_requirements(
         sorted(filter(None, map(lambda req_info: generate_dev_pinned_string(req_info, "conda"), requirements)))
     )
 
-    extended_env: List[Union[str, MutableMapping[str, Sequence[str]]]] = extended_env_conda  # type: ignore[assignment]
+    extended_env: List[Union[str, MutableMapping[str, Sequence[str]]]] = copy.deepcopy(
+        extended_env_conda  # type: ignore[arg-type]
+    )
+    # Relative order needs to be maintained here without sorting:
+    # external pip-only packages need to be able to reach the pypi.org index,
+    # while for internal pip-only packages, nexus is the only viable index.
+    # Preserving the relative order prevents the nexus index from overriding the public index.
     pip_only_reqs = list(
-        sorted(filter(None, map(lambda req_info: generate_dev_pinned_string(req_info, "pip-only"), requirements)))
+        filter(None, map(lambda req_info: generate_dev_pinned_string(req_info, "pip-only"), requirements))
     )
     if pip_only_reqs:
         extended_env.extend(["pip", {"pip": pip_only_reqs}])

ci/RunBazelAction.sh

Lines changed: 2 additions & 0 deletions
@@ -158,6 +158,7 @@ elif [[ "${action}" = "coverage" ]]; then
         "${cache_test_results}" \
         --combined_report=lcov \
         "${coverage_tag_filter}" \
+        --experimental_collect_code_coverage_for_generated_files \
         --target_pattern_file "${sf_only_test_targets_file}"
     sf_only_bazel_exit_code=$?
 
@@ -170,6 +171,7 @@ elif [[ "${action}" = "coverage" ]]; then
         "${cache_test_results}" \
         --combined_report=lcov \
         "${coverage_tag_filter}" \
+        --experimental_collect_code_coverage_for_generated_files \
         --target_pattern_file "${extended_test_targets_file}"
     extended_bazel_exit_code=$?

ci/conda_recipe/meta.yaml

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ build:
   noarch: python
 package:
   name: snowflake-ml-python
-  version: 1.0.10
+  version: 1.0.11
 requirements:
   build:
     - python
@@ -49,7 +49,7 @@ requirements:
     - mlflow>=2.1.0,<2.4
     - sentencepiece>=0.1.95,<0.2
     - shap==0.42.1
-    - tensorflow>=2.9,<3
+    - tensorflow>=2.9,<3,!=2.12.0
     - tokenizers>=0.10,<1
     - torchdata>=0.4,<1
    - transformers>=4.29.2,<5
