Releases · snowflakedb/snowflake-ml-python

29 Jul 04:10

snowflake-connectors-app

1.9.2

2bd6eac

Bug Fixes

DataConnector: Fix self._session related errors inside Container Runtime.
Registry: Fix a bug when passing None to an object-dtype array (np.dtype('O')) in the signature and pandas data handler.

New Features

Experiment Tracking (PrPr): Automatically log the model, metrics, and parameters while training
XGBoost and LightGBM models.

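from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

from snowflake.ml.experiment import ExperimentTracking
from snowflake.ml.experiment.callback import SnowflakeXgboostCallback, SnowflakeLightgbmCallback

# Assumes sp_session (a Snowpark session), sig (a model signature), and the
# training/evaluation data X, y, X_test, y_test are already defined.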

exp = ExperimentTracking(session=sp_session, database_name="ML", schema_name="PUBLIC")

exp.set_experiment("MY_EXPERIMENT")

# XGBoost
callback = SnowflakeXgboostCallback(
  exp, log_model=True, log_metrics=True, log_params=True, model_name="model_name", model_signature=sig
)
model = XGBClassifier(callbacks=[callback])
with exp.start_run():
  model.fit(X, y, eval_set=[(X_test, y_test)])

# LightGBM
callback = SnowflakeLightgbmCallback(
  exp, log_model=True, log_metrics=True, log_params=True, model_name="model_name", model_signature=sig
)
model = LGBMClassifier()
with exp.start_run():
  model.fit(X, y, eval_set=[(X_test, y_test)], callbacks=[callback])

18 Jul 22:09

snowflake-connectors-app

1.9.1

e7f04a0

Bug Fixes

Registry: Fix a bug in setting the PAD token when the HuggingFace text-generation model has multiple EOS tokens.
The handler now picks the first EOS token as the PAD token.
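
The fallback described above can be sketched in plain Python. This is an illustrative stand-in, not the handler's actual code: pick_pad_token_id is a hypothetical helper, and it assumes the model config exposes eos_token_id as either a single integer or a list of integers.

```python
from typing import List, Optional, Union


def pick_pad_token_id(eos_token_id: Union[int, List[int], None]) -> Optional[int]:
    """Hypothetical helper: choose a PAD token id from the EOS token id(s)."""
    if eos_token_id is None:
        return None
    if isinstance(eos_token_id, int):
        return eos_token_id
    # Several EOS tokens: use the first one as the PAD token.
    return eos_token_id[0] if eos_token_id else None


# The token ids below are arbitrary placeholders.
single = pick_pad_token_id(50256)
multiple = pick_pad_token_id([128001, 128009])
```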

New Features

DataConnector: DataConnector objects can now be pickled
Dataset: Dataset objects can now be pickled
Registry (PrPr): Introducing the create_service function in snowflake/ml/model/models/huggingface_pipeline.py.
It creates a service that logs a HF model and, upon successful logging, creates an inference service.

from snowflake.ml.model.models import huggingface_pipeline

hf_model_ref = huggingface_pipeline.HuggingFacePipelineModel(
  model="gpt2",
  task="text-generation", # Optional
)


hf_model_ref.create_service(
    session=session,
    service_name="test_service",
    service_compute_pool="test_compute_pool",
    image_repo="test_repo",
    ...
)

Experiment Tracking (PrPr): New module for managing and tracking ML experiments in Snowflake.

from snowflake.ml.experiment import ExperimentTracking

exp = ExperimentTracking(session=sp_session, database_name="ML", schema_name="PUBLIC")

exp.set_experiment("MY_EXPERIMENT")

with exp.start_run():
  exp.log_param("batch_size", 32)
  exp.log_metric("accuracy", 0.98, step=10)
  exp.log_model(my_model, model_name="MY_MODEL")

Registry: Added support for wide input (500+ features) for inference done using SPCS

25 Jun 21:29

snowflake-connectors-app

1.9.0

e54ed29

Bug Fixes

Registry: Fixed a bug that caused Snowpark-to-pandas DataFrame conversion to fail when the
QUOTED_IDENTIFIERS_IGNORE_CASE parameter is enabled
Registry: Fixed duplicate UserWarning logs during model packaging

Behavior Changes

ML Job: The list_jobs() API has been modified. The scope parameter has been removed,
optional database and schema parameters have been added, the return type has changed
from snowpark.DataFrame to pandas.DataFrame, and the returned columns have been updated
to name, status, message, database_name, schema_name, owner, compute_pool,
target_instances, created_time, and completed_time.
Registry: Set relax_version to false when pip_requirements are specified while logging model
Registry: UserWarning will now be raised based on specified target_platforms (addresses spurious warnings)
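
To make the relax_version change concrete, the sketch below shows what relaxing a pinned requirement could look like. It assumes relaxation rewrites an exact pin ==x.y.z into the range >=x.y,<(x+1); relax_pin is a hypothetical helper written for illustration, not the library's implementation. With relax_version set to false, pins given in pip_requirements stay exactly as written.

```python
import re

# Matches an exact pin such as "pkg==1.5.2" (numeric versions only).
_PIN = re.compile(r"^(?P<name>[A-Za-z0-9_.\-]+)==(?P<major>\d+)\.(?P<minor>\d+)(?:\.\d+)*$")


def relax_pin(requirement: str, relax_version: bool) -> str:
    """Hypothetical helper: optionally relax an exact 'pkg==x.y.z' pin."""
    requirement = requirement.strip()
    match = _PIN.match(requirement)
    if not relax_version or match is None:
        # Left untouched: relaxation disabled, or not an exact numeric pin.
        return requirement
    major, minor = int(match["major"]), int(match["minor"])
    return f"{match['name']}>={major}.{minor},<{major + 1}"


kept = relax_pin("scikit-learn==1.5.2", relax_version=False)
relaxed = relax_pin("scikit-learn==1.5.2", relax_version=True)
```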

New Features

Registry: target_platforms supports TargetPlatformMode: WAREHOUSE_ONLY, SNOWPARK_CONTAINER_SERVICES_ONLY,
or BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES.
Registry: Introduce snowflake.ml.model.target_platform.TargetPlatform, target platform constants, and
snowflake.ml.model.task.Task.
ML Job: Single-node ML Jobs are now in GA. Multi-node support is now in PuPr
- Moved less frequently used job submission parameters to **kwargs
- Platform metrics are now enabled by default
- list_jobs() behavior changed, see Behavior Changes for more info
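
One way to read the three target platform modes is as shorthand for a list of platforms. The sketch below uses stand-in enums to show that mapping; Platform, Mode, and platforms_for are hypothetical names for illustration and are not the classes shipped in snowflake.ml.model.

```python
from enum import Enum
from typing import List


class Platform(Enum):
    """Stand-in for the target platform constants (names are assumptions)."""
    WAREHOUSE = "WAREHOUSE"
    SNOWPARK_CONTAINER_SERVICES = "SNOWPARK_CONTAINER_SERVICES"


class Mode(Enum):
    """Stand-in for the three modes named in the release note."""
    WAREHOUSE_ONLY = "WAREHOUSE_ONLY"
    SNOWPARK_CONTAINER_SERVICES_ONLY = "SNOWPARK_CONTAINER_SERVICES_ONLY"
    BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES = "BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES"


def platforms_for(mode: Mode) -> List[Platform]:
    """Expand a mode into the list of platforms it covers."""
    if mode is Mode.WAREHOUSE_ONLY:
        return [Platform.WAREHOUSE]
    if mode is Mode.SNOWPARK_CONTAINER_SERVICES_ONLY:
        return [Platform.SNOWPARK_CONTAINER_SERVICES]
    return [Platform.WAREHOUSE, Platform.SNOWPARK_CONTAINER_SERVICES]


both = platforms_for(Mode.BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES)
```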

18 Jun 21:12

snowflake-connectors-app

1.8.6

950a646

Bug Fixes

New Features

Registry: Add service container info to logs.

28 May 01:43

snowflake-connectors-app

1.8.5

66197a8

Bug Fixes

Registry: Fixed a bug when listing and deleting container services.
Registry: Fixed explainability issue with scikit-learn pipelines, skipping explain function creation.
Explainability: Lower the minimum streamlit version to 1.30

12 May 21:17

snowflake-connectors-app

1.8.4

6910e96

Bug Fixes

Registry: Default enable_explainability to True when the model can be deployed to Warehouse.
Registry: Add custom_model.partitioned_api decorator and deprecate partitioned_inference_api.
Registry: Fixed a bug when logging PyTorch and TensorFlow models that caused
UnboundLocalError: local variable 'multiple_inputs' referenced before assignment.

Breaking change

ML Job: Updated property id to be fully qualified name; Introduced new property name to represent the ML Job name
ML Job: Modified list_jobs() to return ML Job name instead of id
Registry: log_model now raises an error, instead of only a UserWarning, if enable_explainability is True and
the model is deployed only to Snowpark Container Services.
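
The explainability rules in this release (default to True when Warehouse deployment is possible, error when explicitly enabled for a model deployed only to Snowpark Container Services) can be sketched as a small decision function. This is a sketch under stated assumptions, not the registry's code: resolve_enable_explainability is hypothetical, platforms are represented as plain strings, and the exception type is a placeholder.

```python
from typing import Optional, Sequence


def resolve_enable_explainability(
    requested: Optional[bool], target_platforms: Sequence[str]
) -> bool:
    """Hypothetical sketch of the 1.8.4 rules; platform names are assumptions."""
    can_run_in_warehouse = "WAREHOUSE" in target_platforms
    if requested is None:
        # Default: on only when the model can be deployed to Warehouse.
        return can_run_in_warehouse
    if requested and not can_run_in_warehouse:
        # Explicitly enabled for an SPCS-only model: an error, not a warning.
        raise ValueError("enable_explainability=True requires a Warehouse deployment target")
    return requested


default_for_warehouse = resolve_enable_explainability(None, ["WAREHOUSE"])
default_for_spcs_only = resolve_enable_explainability(None, ["SNOWPARK_CONTAINER_SERVICES"])
```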

New Features

ML Job: Extend @remote function decorator, submit_file() and submit_directory() to accept database and
schema parameters
ML Job: Support querying by fully qualified name in get_job()
Explainability: Added visualization functions to snowflake.ml.monitoring to plot explanations in notebooks.
Explainability: Support explain for categorical transforms for sklearn pipeline
Support categorical type for xgboost.DMatrix inputs.
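
Because the ML Job id is now a fully qualified name while name holds only the job name, code that stored the old id may need to split the new value. A minimal sketch follows, assuming the DATABASE.SCHEMA.NAME form with unquoted identifiers; parse_job_id and JobName are hypothetical helpers, not part of the ML Job API.

```python
from typing import NamedTuple


class JobName(NamedTuple):
    database: str
    schema: str
    name: str


def parse_job_id(job_id: str) -> JobName:
    """Hypothetical helper: split 'DATABASE.SCHEMA.NAME' into its parts.

    Simplification: quoted identifiers that contain dots are not handled.
    """
    parts = job_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected DATABASE.SCHEMA.NAME, got {job_id!r}")
    return JobName(*parts)


job = parse_job_id("ML.PUBLIC.MY_JOB")
```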

28 Apr 19:53

snowflake-connectors-app

1.8.3

5e30f0e

Bug Fixes

Behavior Change

New Features

Registry: Default to the runtime cuda version if available when logging a GPU model in Container Runtime.
ML Job: Added as_list argument to MLJob.get_logs() to enable retrieving logs
as a list of strings
Registry: Support ModelVersion.run_job to run inference with a single-node Snowpark Container Services job.
DataConnector: Removed PrPr decorators
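
The two return shapes implied by the as_list argument of MLJob.get_logs() can be illustrated without a Snowflake session. The function below is a hypothetical stand-in that only mimics the string-versus-list behavior; it assumes the list form is one entry per log line.

```python
from typing import List, Union


def format_logs(raw_logs: str, as_list: bool = False) -> Union[str, List[str]]:
    """Hypothetical stand-in showing the two return shapes."""
    return raw_logs.splitlines() if as_list else raw_logs


# Placeholder log text for illustration only.
raw = "epoch 1 done\nepoch 2 done\n"
lines = format_logs(raw, as_list=True)
```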

15 Apr 20:59

snowflake-connectors-app

1.8.2

dde003f

Bug Fixes

Behavior Change

New Features

ML Job now available as a PuPr feature
ML Job: Add ability to retrieve results for @remote decorated functions using
new MLJobWithResult.result() API, which will return the unpickled result
or raise an exception if the job execution failed.
ML Job: Pre-created Snowpark Session is now available inside job payloads using
snowflake.snowpark.context.get_active_session()
Registry: Include model dependencies in pip requirements by default when logging in Container Runtime.
Registry: Introducing save_location to log_model using the options argument.
Users can provide the path to write the model version's files that get stored in Snowflake's stage.

reg.log_model(
    model=...,
    model_name=...,
    version_name=...,
    ...,
    options={"save_location": "./model_directory"},
)

ML Job (PrPr): Add instance_id argument to get_logs and show_logs method to support multi node log retrieval
ML Job (PrPr): Add job.get_instance_status(instance_id=...) API to support multi node status retrieval

26 Mar 21:19

snowflake-connectors-app

1.8.1

92ff883

Bug Fixes

Registry: Fix a bug that caused an unsupported model type error while logging a sklearn model with the
score_samples inference method.
Registry: Fix a bug where model inference service creation fails on an existing, suspended service.

Behavior Change

New Features

ML Job (PrPr): Update Container Runtime image version to 1.0.1
ML Job (PrPr): Add enable_metrics argument to job submission APIs to enable publishing service metrics to Event Table.
See Accessing Event Table service metrics for retrieving published metrics and Costs of telemetry data collection
for cost implications.
Registry: When creating a copy of a ModelVersion with log_model, raise an exception if unsupported arguments are provided.

20 Mar 18:33

snowflake-connectors-app

1.8.0

9709d06

Bug Fixes

Modeling: Fix a bug in some metrics that allowed an unsupported version of numpy to be installed
automatically in the stored procedure, resulting in a numpy error on execution
Registry: Fix a bug that leads to an incorrect "Model is does not have _is_inference_api" error message when
assigning a supported model as a property of a CustomModel.
Registry: Fix a bug where inference fails when models with more than 500 input features
are deployed to SPCS.

Behavior Change

Registry: With FeatureGroupSpec support, the auto-inferred model signatures for transformers.Pipeline models have
been updated, including:

Signature for fill-mask task has been changed from

ModelSignature(
    inputs=[
        FeatureSpec(name="inputs", dtype=DataType.STRING),
    ],
    outputs=[
        FeatureSpec(name="outputs", dtype=DataType.STRING),
    ],
)

to

ModelSignature(
    inputs=[
        FeatureSpec(name="inputs", dtype=DataType.STRING),
    ],
    outputs=[
        FeatureGroupSpec(
            name="outputs",
            specs=[
                FeatureSpec(name="sequence", dtype=DataType.STRING),
                FeatureSpec(name="score", dtype=DataType.DOUBLE),
                FeatureSpec(name="token", dtype=DataType.INT64),
                FeatureSpec(name="token_str", dtype=DataType.STRING),
            ],
            shape=(-1,),
        ),
    ],
)
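
With the new fill-mask signature, each output cell holds a variable-length list of records rather than a single string. The sketch below shows that shape with plain Python dicts and a small checker. The field names and types come from the FeatureGroupSpec above; matches_spec is a hypothetical helper, and the example values are invented placeholders, not real model output.

```python
from typing import Any, Dict, List

# Field names and Python types mirroring the FeatureGroupSpec above
# (STRING -> str, DOUBLE -> float, INT64 -> int).
FILL_MASK_FIELDS = {"sequence": str, "score": float, "token": int, "token_str": str}


def matches_spec(records: List[Dict[str, Any]]) -> bool:
    """Check that every record has exactly the expected fields and types."""
    return all(
        record.keys() == FILL_MASK_FIELDS.keys()
        and all(isinstance(record[name], kind) for name, kind in FILL_MASK_FIELDS.items())
        for record in records
    )


# Placeholder values, invented purely to show the shape of one output cell.
example_output = [
    {"sequence": "the cat sat", "score": 0.9, "token": 101, "token_str": "cat"},
    {"sequence": "the dog sat", "score": 0.1, "token": 102, "token_str": "dog"},
]
```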

Signature for token-classification task has been changed from

ModelSignature(
    inputs=[
        FeatureSpec(name="inputs", dtype=DataType.STRING),
    ],
    outputs=[
        FeatureSpec(name="outputs", dtype=DataType.STRING),
    ],
)

to

ModelSignature(
    inputs=[FeatureSpec(name="inputs", dtype=DataType.STRING)],
    outputs=[
        FeatureGroupSpec(
            name="outputs",
            specs=[
                FeatureSpec(name="word", dtype=DataType.STRING),
                FeatureSpec(name="score", dtype=DataType.DOUBLE),
                FeatureSpec(name="entity", dtype=DataType.STRING),
                FeatureSpec(name="index", dtype=DataType.INT64),
                FeatureSpec(name="start", dtype=DataType.INT64),
                FeatureSpec(name="end", dtype=DataType.INT64),
            ],
            shape=(-1,),
        ),
    ],
)

Signature for question-answering task when top_k is larger than 1 has been changed from

ModelSignature(
    inputs=[
        FeatureSpec(name="question", dtype=DataType.STRING),
        FeatureSpec(name="context", dtype=DataType.STRING),
    ],
    outputs=[
        FeatureSpec(name="outputs", dtype=DataType.STRING),
    ],
)

to

ModelSignature(
    inputs=[
        FeatureSpec(name="question", dtype=DataType.STRING),
        FeatureSpec(name="context", dtype=DataType.STRING),
    ],
    outputs=[
        FeatureGroupSpec(
            name="answers",
            specs=[
                FeatureSpec(name="score", dtype=DataType.DOUBLE),
                FeatureSpec(name="start", dtype=DataType.INT64),
                FeatureSpec(name="end", dtype=DataType.INT64),
                FeatureSpec(name="answer", dtype=DataType.STRING),
            ],
            shape=(-1,),
        ),
    ],
)

Signature for text-classification task when top_k is None has been changed from

ModelSignature(
    inputs=[
        FeatureSpec(name="text", dtype=DataType.STRING),
        FeatureSpec(name="text_pair", dtype=DataType.STRING),
    ],
    outputs=[
        FeatureSpec(name="label", dtype=DataType.STRING),
        FeatureSpec(name="score", dtype=DataType.DOUBLE),
    ],
)

to

ModelSignature(
    inputs=[
        FeatureSpec(name="text", dtype=DataType.STRING),
    ],
    outputs=[
        FeatureSpec(name="label", dtype=DataType.STRING),
        FeatureSpec(name="score", dtype=DataType.DOUBLE),
    ],
)

Signature for text-classification task when top_k is not None has been changed from

ModelSignature(
    inputs=[
        FeatureSpec(name="text", dtype=DataType.STRING),
        FeatureSpec(name="text_pair", dtype=DataType.STRING),
    ],
    outputs=[
        FeatureSpec(name="outputs", dtype=DataType.STRING),
    ],
)

to

ModelSignature(
    inputs=[
        FeatureSpec(name="text", dtype=DataType.STRING),
    ],
    outputs=[
        FeatureGroupSpec(
            name="labels",
            specs=[
                FeatureSpec(name="label", dtype=DataType.STRING),
                FeatureSpec(name="score", dtype=DataType.DOUBLE),
            ],
            shape=(-1,),
        ),
    ],
)

Signature for text-generation task has been changed from

ModelSignature(
    inputs=[FeatureSpec(name="inputs", dtype=DataType.STRING)],
    outputs=[
        FeatureSpec(name="outputs", dtype=DataType.STRING),
    ],
)

to

ModelSignature(
    inputs=[
        FeatureGroupSpec(
            name="inputs",
            specs=[
                FeatureSpec(name="role", dtype=DataType.STRING),
                FeatureSpec(name="content", dtype=DataType.STRING),
            ],
            shape=(-1,),
        ),
    ],
    outputs=[
        FeatureGroupSpec(
            name="outputs",
            specs=[
                FeatureSpec(name="generated_text", dtype=DataType.STRING),
            ],
            shape=(-1,),
        )
    ],
)
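
Callers that previously passed a plain prompt string to a text-generation model now need to supply a list of role/content records. A minimal migration sketch follows; to_messages is a hypothetical helper, and the default role "user" is an assumption based on common chat conventions rather than something stated in this release note.

```python
from typing import Dict, List


def to_messages(prompt: str, role: str = "user") -> List[Dict[str, str]]:
    """Hypothetical helper: wrap a plain prompt as a one-message conversation."""
    return [{"role": role, "content": prompt}]


messages = to_messages("Write a haiku about snow.")
```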

Registry: PyTorch and TensorFlow models now expect a single tensor input/output by default when logging to Model
Registry. To use multiple tensors (previous behavior), set options={"multiple_inputs": True}.

Example with single tensor input:

from typing import cast

import torch

class TorchModel(torch.nn.Module):
    def __init__(self, n_input: int, n_hidden: int, n_out: int, dtype: torch.dtype = torch.float32) -> None:
        super().__init__()
        self.model = torch.nn.Sequential(
            torch.nn.Linear(n_input, n_hidden, dtype=dtype),
            torch.nn.ReLU(),
            torch.nn.Linear(n_hidden, n_out, dtype=dtype),
            torch.nn.Sigmoid(),
        )

    def forward(self, tensor: torch.Tensor) -> torch.Tensor:
        return cast(torch.Tensor, self.model(tensor))

# Sample usage:
data_x = torch.rand(size=(batch_size, n_input))

# Log model with single tensor
reg.log_model(
    model=model,
    ...,
    sample_input_data=data_x
)

# Run inference with single tensor
mv.run(data_x)

For multiple tensor inputs/outputs, use:

reg.log_model(
    model=model,
    ...,
    sample_input_data=[data_x_1, data_x_2],
    options={"multiple_inputs": True}
)

Registry: Default enable_explainability to False when the model can be deployed to Snowpark Container Services.

New Features

Registry: Added support for a single torch.Tensor, tensorflow.Tensor, or tensorflow.Variable as input or output
data.
Registry: Support the xgboost.DMatrix datatype for XGBoost models.
