Commit 192f794

snowflake-provisioner and Snowflake Authors authored
Project import generated by Copybara. (#38)
GitOrigin-RevId: 58e65003b64918af74ece769567892c98a3f9fbd
Co-authored-by: Snowflake Authors <[email protected]>
1 parent f3a83fb commit 192f794

159 files changed

+12013
-3282
lines changed

CHANGELOG.md

Lines changed: 37 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,41 @@
11
# Release History
22

3-
## 1.0.5
3+
## 1.0.6
4+
5+
### New Features
6+
- Model Registry: Added `create_if_not_exists` parameter to the constructor.
7+
- Model Registry: Added `get_or_create_model_registry` API.
8+
- Model Registry: Added support for using GPU inference when deploying XGBoost (`xgboost.XGBModel` and `xgboost.Booster`), PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models to Snowpark Container Services.
9+
- Model Registry: When inferring model signature, `Sequence` of built-in types, `Sequence` of `numpy.ndarray`, `Sequence` of `torch.Tensor` and `Sequence` of `tensorflow.Tensor` can be used instead of only `List` of them.
10+
- Model Registry: Added `get_training_dataset` API.
11+
- Model Development: Size of metrics result can exceed previous 8MB limit.
12+
- Model Registry: Added support for saving, loading, and deploying HuggingFace pipeline objects (`transformers.Pipeline`) and our wrapper (`snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel`). When the wrapper is used to specify configurations, the model for the pipeline is loaded dynamically at deployment. Currently, the following tasks can be logged without manually specifying model signatures:
13+
- "conversational"
14+
- "fill-mask"
15+
- "question-answering"
16+
- "summarization"
17+
- "table-question-answering"
18+
- "text2text-generation"
19+
- "text-classification" (alias "sentiment-analysis" available)
20+
- "text-generation"
21+
- "token-classification" (alias "ner" available)
22+
- "translation"
23+
- "translation_xx_to_yy"
24+
- "zero-shot-classification"
25+
26+
### Bug Fixes
27+
- Model Development: Fixed a bug when using simple imputer with numpy >= 1.25.
28+
- Model Development: Fixed a bug when inferring the type of label columns.
29+
30+
### Behavior Changes
31+
- Model Registry: `log_model()` now returns a `ModelReference` object instead of a model ID.
32+
- Model Registry: When deploying a model with only one target method, the `target_method` argument can be omitted.
33+
- Model Registry: When using a version of snowflake-ml-python newer than what is available in the Snowflake Anaconda Channel, the `embed_local_ml_library` option is automatically set to `True` if it is not already.
34+
- Model Registry: When deploying a model to Snowpark Container Services and using GPU, the default value of `num_workers` will be 1.
35+
- Model Registry: `keep_order` and `output_with_input_features` in the deploy options have been removed. Now the behavior is controlled by the type of the input when calling `model.predict()`. If the input is a `pandas.DataFrame`, the behavior will be the same as `keep_order=True` and `output_with_input_features=False` before. If the input is a `snowpark.DataFrame`, the behavior will be the same as `keep_order=False` and `output_with_input_features=True` before.
36+
- Model Registry: When logging and deploying PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models, we no longer accept models whose input is a list of tensors and whose output is a list of tensors. Instead, we now accept models whose input is 1 or more tensors as positional arguments, and whose output is a tensor or a tuple of tensors. The input and output dataframes used when predicting remain the same as before; that is, every column is an array feature and contains a tensor.
37+
38+
## 1.0.5 (2023-08-17)
439

540
### New Features
641

@@ -13,7 +48,7 @@
1348
- Model Registry: Fixed an issue where the UDF name created when deploying a model was not identical to the one provided and could not be correctly dropped when the deployment was dropped.
1449
- connection_params.SnowflakeLoginOptions(): Added support for `private_key_path`.
1550

16-
## 1.0.4
51+
## 1.0.4 (2023-07-28)
1752

1853
### New Features
1954
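
The 1.0.6 behavior change that lets `target_method` be omitted when a model has only one target method can be pictured with a small sketch. This is not the registry's implementation: `resolve_target_method` and the `methods` mapping are hypothetical names used only to illustrate the rule stated in the changelog.

```python
from typing import Callable, Mapping, Optional


def resolve_target_method(
    methods: Mapping[str, Callable[..., object]],
    target_method: Optional[str] = None,
) -> str:
    """Pick the method to deploy (hypothetical helper, not the library API)."""
    if target_method is not None:
        # An explicit choice must name a method the model actually exposes.
        if target_method not in methods:
            raise ValueError(f"Unknown target method: {target_method!r}")
        return target_method
    if len(methods) == 1:
        # Exactly one candidate, so the argument can be left out.
        return next(iter(methods))
    raise ValueError("target_method is required when a model has several target methods")


single = {"predict": lambda x: x}
several = {"predict": lambda x: x, "predict_proba": lambda x: x}

chosen_single = resolve_target_method(single)
chosen_explicit = resolve_target_method(several, "predict_proba")
```

With several methods and no explicit `target_method`, the sketch raises a `ValueError` rather than guessing.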

bazel/environments/conda-env-build.yml

Lines changed: 1 addition & 1 deletion
@@ -14,5 +14,5 @@ dependencies:
1414
- numpy==1.24.3
1515
- packaging==23.0
1616
- pyyaml==6.0
17-
- scikit-learn==1.2.2
17+
- scikit-learn==1.3.0
1818
- xgboost==1.7.3

bazel/environments/conda-env-snowflake.yml

Lines changed: 4 additions & 1 deletion
@@ -9,6 +9,7 @@ dependencies:
99
- aiohttp==3.8.3
1010
- anyio==3.5.0
1111
- boto3==1.24.28
12+
- cachetools==4.2.2
1213
- cloudpickle==2.0.0
1314
- conda-libmamba-solver==23.3.0
1415
- coverage==6.3.2
@@ -23,6 +24,7 @@ dependencies:
2324
- lightgbm==3.3.5
2425
- mlflow==2.3.1
2526
- moto==4.0.11
27+
- multipledispatch==0.6.0
2628
- mypy==0.981
2729
- networkx==2.8.4
2830
- numpy==1.24.3
@@ -36,13 +38,14 @@ dependencies:
3638
- requests==2.29.0
3739
- ruamel.yaml==0.17.21
3840
- s3fs==2022.11.0
39-
- scikit-learn==1.2.2
41+
- scikit-learn==1.3.0
4042
- scipy==1.9.3
4143
- snowflake-connector-python==3.0.3
4244
- snowflake-snowpark-python==1.5.1
4345
- sqlparse==0.4.3
4446
- tensorflow==2.10.0
4547
- transformers==4.29.2
4648
- types-protobuf==4.23.0.1
49+
- types-requests==2.30.0.0
4750
- typing-extensions==4.5.0
4851
- xgboost==1.7.3

bazel/environments/conda-env.yml

Lines changed: 5 additions & 1 deletion
@@ -9,9 +9,11 @@ dependencies:
99
- aiohttp==3.8.3
1010
- anyio==3.5.0
1111
- boto3==1.24.28
12+
- cachetools==4.2.2
1213
- cloudpickle==2.0.0
1314
- conda-forge::starlette==0.27.0
1415
- conda-forge::types-PyYAML==6.0.12
16+
- conda-forge::types-cachetools==4.2.2
1517
- conda-libmamba-solver==23.3.0
1618
- coverage==6.3.2
1719
- cryptography==39.0.1
@@ -25,6 +27,7 @@ dependencies:
2527
- lightgbm==3.3.5
2628
- mlflow==2.3.1
2729
- moto==4.0.11
30+
- multipledispatch==0.6.0
2831
- mypy==0.981
2932
- networkx==2.8.4
3033
- numpy==1.24.3
@@ -39,13 +42,14 @@ dependencies:
3942
- requests==2.29.0
4043
- ruamel.yaml==0.17.21
4144
- s3fs==2022.11.0
42-
- scikit-learn==1.2.2
45+
- scikit-learn==1.3.0
4346
- scipy==1.9.3
4447
- snowflake-connector-python==3.0.3
4548
- snowflake-snowpark-python==1.5.1
4649
- sqlparse==0.4.3
4750
- tensorflow==2.10.0
4851
- transformers==4.29.2
4952
- types-protobuf==4.23.0.1
53+
- types-requests==2.30.0.0
5054
- typing-extensions==4.5.0
5155
- xgboost==1.7.3

ci/conda_recipe/meta.yaml

Lines changed: 4 additions & 3 deletions
@@ -17,7 +17,7 @@ build:
1717
noarch: python
1818
package:
1919
name: snowflake-ml-python
20-
version: 1.0.5
20+
version: 1.0.6
2121
requirements:
2222
build:
2323
- python
@@ -34,7 +34,7 @@ requirements:
3434
- python
3535
- pyyaml>=6.0,<7
3636
- requests
37-
- scikit-learn>=1.2.1,<1.3
37+
- scikit-learn>=1.2.1,<1.4
3838
- scipy>=1.9,<2
3939
- snowflake-connector-python>=3.0.3,<4
4040
- snowflake-snowpark-python>=1.5.1,<2
@@ -43,8 +43,9 @@ requirements:
4343
- xgboost>=1.7.3,<2
4444
run_constrained:
4545
- lightgbm==3.3.5
46-
- mlflow>=2.1.0,<3
46+
- mlflow>=2.1.0,<2.4
4747
- tensorflow>=2.9,<3
4848
- torchdata>=0.4,<1
49+
- transformers>=4.29.2,<5
4950
source:
5051
path: ../../
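
The effect of the widened `scikit-learn` range and the new `mlflow` cap in this recipe can be checked with a short sketch. It assumes plain numeric `X.Y.Z` versions and uses hypothetical helpers (`parse`, `in_range`); real tooling such as conda or pip applies fuller version-matching rules.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse a simple numeric version, padding missing parts with zeros."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# scikit-learn: the old range excluded 1.3.0, the new range admits it.
old_allows_130 = in_range("1.3.0", "1.2.1", "1.3")
new_allows_130 = in_range("1.3.0", "1.2.1", "1.4")

# mlflow: the pinned dev version 2.3.1 still fits, 2.4.0 no longer does.
mlflow_231_ok = in_range("2.3.1", "2.1.0", "2.4")
mlflow_240_ok = in_range("2.4.0", "2.1.0", "2.4")
```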

codegen/sklearn_wrapper_template.py_template

Lines changed: 19 additions & 7 deletions
@@ -25,6 +25,10 @@ from snowflake.snowpark import DataFrame, Session
2525
from snowflake.snowpark.functions import pandas_udf, sproc
2626
from snowflake.snowpark.types import PandasSeries
2727
from snowflake.snowpark._internal.type_utils import convert_sp_to_sf_type
28+
from snowflake.snowpark._internal.utils import (
29+
TempObjectType,
30+
random_name_for_temp_object,
31+
)
2832

2933
from snowflake.ml.model.model_signature import (
3034
DataType,
@@ -244,7 +248,7 @@ class {transform.original_class_name}(BaseTransformer):
244248
cp.dump(self._sklearn_object, local_transform_file)
245249

246250
# Create temp stage to run fit.
247-
transform_stage_name = "SNOWML_TRANSFORM_{{safe_id}}".format(safe_id=self._get_rand_id())
251+
transform_stage_name = random_name_for_temp_object(TempObjectType.STAGE)
248252
stage_creation_query = f"CREATE OR REPLACE TEMPORARY STAGE {{transform_stage_name}};"
249253
SqlResultValidator(
250254
session=session,
@@ -258,7 +262,7 @@ class {transform.original_class_name}(BaseTransformer):
258262
stage_result_file_name = posixpath.join(transform_stage_name, os.path.basename(local_transform_file_name))
259263
local_result_file_name = get_temp_file_path()
260264

261-
fit_sproc_name = "SNOWML_FIT_{{safe_id}}".format(safe_id=self._get_rand_id())
265+
fit_sproc_name = random_name_for_temp_object(TempObjectType.PROCEDURE)
262266
statement_params = telemetry.get_function_usage_statement_params(
263267
project=_PROJECT,
264268
subproject=_SUBPROJECT,
@@ -439,8 +443,7 @@ class {transform.original_class_name}(BaseTransformer):
439443
pkg_versions=self._get_dependencies(), session=session, subproject=_SUBPROJECT)
440444

441445
# Register vectorized UDF for batch inference
442-
batch_inference_udf_name = "SNOWML_BATCH_INFERENCE_{{safe_id}}_{{method}}".format(
443-
safe_id=self._get_rand_id(), method=inference_method)
446+
batch_inference_udf_name = random_name_for_temp_object(TempObjectType.FUNCTION)
444447

445448
# Need to do this since if we use self._sklearn_object directly in the UDF, Snowpark
446449
# will try to pickle all of self which fails.
@@ -701,8 +704,17 @@ class {transform.original_class_name}(BaseTransformer):
701704
expected_type_inferred = "{transform.udf_datatype}"
702705
# when it is classifier, infer the datatype from label columns
703706
if expected_type_inferred == "" and 'predict' in self.model_signatures:
707+
# Batch inference takes a single expected output column type. Use the first column's type for now.
708+
# TODO: Handle varying output column types.
709+
label_cols_signatures = [row for row in self.model_signatures['predict'].outputs if row.name in self.output_cols]
710+
if len(label_cols_signatures) == 0:
711+
error_str = f"Output columns {{self.output_cols}} do not match model signatures {{self.model_signatures['predict'].outputs}}."
712+
raise exceptions.SnowflakeMLException(
713+
error_code=error_codes.INVALID_ATTRIBUTE,
714+
original_exception=ValueError(error_str),
715+
)
704716
expected_type_inferred = convert_sp_to_sf_type(
705-
self.model_signatures['predict'].outputs[0].as_snowpark_type()
717+
label_cols_signatures[0].as_snowpark_type()
706718
)
707719

708720
output_df = self._batch_inference(
@@ -955,7 +967,7 @@ class {transform.original_class_name}(BaseTransformer):
955967
cp.dump(self._sklearn_object, local_score_file)
956968

957969
# Create temp stage to run score.
958-
score_stage_name = "SNOWML_SCORE_{{safe_id}}".format(safe_id=self._get_rand_id())
970+
score_stage_name = random_name_for_temp_object(TempObjectType.STAGE)
959971
session = dataset._session
960972
assert session is not None # keep mypy happy
961973
stage_creation_query = f"CREATE OR REPLACE TEMPORARY STAGE {{score_stage_name}};"
@@ -968,7 +980,7 @@ class {transform.original_class_name}(BaseTransformer):
968980

969981
# Use posixpath to construct stage paths
970982
stage_score_file_name = posixpath.join(score_stage_name, os.path.basename(local_score_file_name))
971-
score_sproc_name = "SNOWML_SCORE_{{safe_id}}".format(safe_id=self._get_rand_id())
983+
score_sproc_name = random_name_for_temp_object(TempObjectType.PROCEDURE)
972984
statement_params = telemetry.get_function_usage_statement_params(
973985
project=_PROJECT,
974986
subproject=_SUBPROJECT,
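
The change to label-type inference in this template (filter the `predict` output signatures by `output_cols`, then take the first match instead of `outputs[0]`) can be sketched in isolation. `FeatureSpec` and `first_matching_output` below are hypothetical stand-ins, and a plain `ValueError` replaces the `SnowflakeMLException` used in the diff.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FeatureSpec:
    """Hypothetical stand-in for one row of a model signature's outputs."""
    name: str
    dtype: str


def first_matching_output(outputs: Sequence[FeatureSpec], output_cols: List[str]) -> FeatureSpec:
    """Return the first signature row whose name is among the output columns."""
    matches = [spec for spec in outputs if spec.name in output_cols]
    if not matches:
        # Mirrors the new guard: fail loudly instead of picking an unrelated column.
        raise ValueError(f"Output columns {output_cols} do not match model signatures {list(outputs)}.")
    return matches[0]


outputs = [FeatureSpec("PROBABILITY", "DOUBLE"), FeatureSpec("LABEL", "STRING")]

old_choice = outputs[0]                                 # previous behavior: always the first row
new_choice = first_matching_output(outputs, ["LABEL"])  # new behavior: first row that matches
```

In this sketch the old lookup would report `DOUBLE` for a string label column, while the filtered lookup reports `STRING`.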

requirements.yml

Lines changed: 18 additions & 3 deletions
@@ -68,6 +68,7 @@
6868
version_requirements: ">=0.15,<2"
6969
tags:
7070
- build_essential
71+
- deployment_core
7172
# For fsspec[http] in conda
7273
- name_conda: aiohttp
7374
dev_version_conda: "3.8.3"
@@ -123,7 +124,7 @@
123124
- build_essential
124125
- name: mlflow
125126
dev_version: "2.3.1"
126-
version_requirements: ">=2.1.0,<3"
127+
version_requirements: ">=2.1.0,<2.4"
127128
requirements_extra_tags:
128129
- mlflow
129130
- name: moto
@@ -176,8 +177,8 @@
176177
- name: s3fs
177178
dev_version: "2022.11.0"
178179
- name: scikit-learn
179-
dev_version: "1.2.2"
180-
version_requirements: ">=1.2.1,<1.3"
180+
dev_version: "1.3.0"
181+
version_requirements: ">=1.2.1,<1.4"
181182
tags:
182183
- build_essential
183184
- name: scipy
@@ -211,6 +212,11 @@
211212
- torch
212213
- name: transformers
213214
dev_version: "4.29.2"
215+
version_requirements: ">=4.29.2,<5"
216+
requirements_extra_tags:
217+
- transformers
218+
- name: types-requests
219+
dev_version: "2.30.0.0"
214220
- name: types-protobuf
215221
dev_version: "4.23.0.1"
216222
- name: types-PyYAML
@@ -226,3 +232,12 @@
226232
version_requirements: ">=1.7.3,<2"
227233
tags:
228234
- build_essential
235+
- name: types-cachetools
236+
dev_version: "4.2.2"
237+
from_channel: conda-forge
238+
- name: cachetools
239+
dev_version: "4.2.2"
240+
# TODO: this will be a user side dep requirement
241+
# enable when we are releasing FS.
242+
- name: multipledispatch
243+
dev_version: "0.6.0"

0 commit comments

Comments
 (0)