Releases: snowflakedb/snowflake-ml-python
1.9.2
Bug Fixes
- DataConnector: Fix `self._session` related errors inside Container Runtime.
- Registry: Fix a bug when trying to pass `None` to array (`pd.dtype('O')`) in signature and pandas data handler.
New Features
- Experiment Tracking (PrPr): Automatically log the model, metrics, and parameters while training
XGBoost and LightGBM models.
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

from snowflake.ml.experiment import ExperimentTracking
from snowflake.ml.experiment.callback import SnowflakeXgboostCallback, SnowflakeLightgbmCallback

# sp_session, X, y, X_test, y_test, and sig (a model signature) are assumed to be defined already
exp = ExperimentTracking(session=sp_session, database_name="ML", schema_name="PUBLIC")
exp.set_experiment("MY_EXPERIMENT")

# XGBoost
callback = SnowflakeXgboostCallback(
    exp, log_model=True, log_metrics=True, log_params=True, model_name="model_name", model_signature=sig
)
model = XGBClassifier(callbacks=[callback])
with exp.start_run():
    model.fit(X, y, eval_set=[(X_test, y_test)])

# LightGBM
callback = SnowflakeLightgbmCallback(
    exp, log_model=True, log_metrics=True, log_params=True, model_name="model_name", model_signature=sig
)
model = LGBMClassifier()
with exp.start_run():
    model.fit(X, y, eval_set=[(X_test, y_test)], callbacks=[callback])
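The callbacks above hook into each training round. The following library-free illustration shows that pattern: a callback records the run's parameters once and then logs one metric value per iteration. The `FakeTracker` class, `make_logging_callback`, and the toy `train` loop are hypothetical stand-ins. They are not the `snowflake.ml.experiment` API, and the loss values are made up.

```python
from typing import Callable, Dict, List, Tuple

# A callback receives (iteration, metric_value) after each training round
Callback = Callable[[int, float], None]


class FakeTracker:
    """Hypothetical in-memory tracker; NOT the snowflake.ml.experiment API."""

    def __init__(self) -> None:
        self.params: Dict[str, object] = {}
        self.metrics: List[Tuple[str, float, int]] = []

    def log_params(self, params: Dict[str, object]) -> None:
        self.params.update(params)

    def log_metric(self, key: str, value: float, step: int) -> None:
        self.metrics.append((key, value, step))


def make_logging_callback(tracker: FakeTracker, metric_name: str) -> Callback:
    """Build a callback that forwards one metric value per round to the tracker."""

    def callback(iteration: int, value: float) -> None:
        tracker.log_metric(metric_name, value, step=iteration)

    return callback


def train(n_rounds: int, params: Dict[str, object], callbacks: List[Callback], tracker: FakeTracker) -> None:
    """Toy boosting loop: log params once, then invoke every callback after each round."""
    tracker.log_params(params)
    loss = 1.0
    for i in range(n_rounds):
        loss *= 0.5  # pretend the evaluation loss halves every round
        for cb in callbacks:
            cb(i, loss)


tracker = FakeTracker()
train(3, {"max_depth": 4}, [make_logging_callback(tracker, "eval_logloss")], tracker)
print(tracker.metrics)
# [('eval_logloss', 0.5, 0), ('eval_logloss', 0.25, 1), ('eval_logloss', 0.125, 2)]
```

Keeping the logging inside a callback means the training call itself does not change. Only the list of callbacks passed to it does.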
1.9.1
Bug Fixes
- Registry: Fix a bug when trying to set the PAD token when the HuggingFace `text-generation` model had multiple EOS tokens. The handler now picks the first EOS token as the PAD token.
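In Hugging Face `transformers`, a model's `eos_token_id` may be either a single id or a list of ids, while a tokenizer has only one PAD token. Below is a minimal sketch of the selection rule described in the fix. `pick_pad_token_id` is a hypothetical helper, not the Registry handler's code, and the token ids are arbitrary.

```python
from typing import Optional, Sequence, Union


def pick_pad_token_id(eos_token_id: Union[int, Sequence[int], None]) -> Optional[int]:
    """Hypothetical helper: derive a PAD token id from an EOS setting.

    If several EOS ids are configured, the first one is used as the PAD id.
    """
    if eos_token_id is None:
        return None
    if isinstance(eos_token_id, int):
        return eos_token_id
    ids = list(eos_token_id)
    return ids[0] if ids else None


print(pick_pad_token_id(50256))       # 50256
print(pick_pad_token_id([2, 32000]))  # 2
```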
New Features
- DataConnector: DataConnector objects can now be pickled
- Dataset: Dataset objects can now be pickled
- Registry (PrPr): Introducing the `create_service` function in `snowflake/ml/model/models/huggingface_pipeline.py`, which creates a service to log a HF model and, upon successful logging, creates an inference service.
from snowflake.ml.model.models import huggingface_pipeline

hf_model_ref = huggingface_pipeline.HuggingFacePipelineModel(
    model="gpt2",
    task="text-generation",  # Optional
)

hf_model_ref.create_service(
    session=session,
    service_name="test_service",
    service_compute_pool="test_compute_pool",
    image_repo="test_repo",
    ...
)

- Experiment Tracking (PrPr): New module for managing and tracking ML experiments in Snowflake.
from snowflake.ml.experiment import ExperimentTracking

exp = ExperimentTracking(session=sp_session, database_name="ML", schema_name="PUBLIC")
exp.set_experiment("MY_EXPERIMENT")
with exp.start_run():
    exp.log_param("batch_size", 32)
    exp.log_metric("accuracy", 0.98, step=10)  # single metric; log_metrics takes a dict
    exp.log_model(my_model, model_name="MY_MODEL")

- Registry: Added support for wide input (500+ features) for inference done using SPCS.
1.9.0
Bug Fixes
- Registry: Fixed a bug causing Snowpark to pandas DataFrame conversion to fail when the `QUOTED_IDENTIFIERS_IGNORE_CASE` parameter is enabled.
- Registry: Fixed duplicate `UserWarning` logs during model packaging.
Behavior Changes
- ML Job: The `list_jobs()` API has been modified. The `scope` parameter has been removed, optional `database` and `schema` parameters have been added, the return type has changed from `snowpark.DataFrame` to `pandas.DataFrame`, and the returned columns have been updated to `name`, `status`, `message`, `database_name`, `schema_name`, `owner`, `compute_pool`, `target_instances`, `created_time`, and `completed_time`.
- Registry: Set `relax_version` to false when `pip_requirements` are specified while logging a model.
- Registry: A `UserWarning` will now be raised based on the specified `target_platforms` (addresses spurious warnings).
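Because `list_jobs()` now returns a `pandas.DataFrame`, code that previously called `.to_pandas()` or Snowpark `.filter()` on the result can use plain pandas operations instead. In the sketch below, the rows, status strings, and times are made up. Only the column names come from the notes above, and the real call (something along the lines of `list_jobs(database="ML", schema="PUBLIC")`) is assumed rather than executed.

```python
import pandas as pd

# Hypothetical rows using the documented list_jobs() columns
jobs_df = pd.DataFrame(
    {
        "name": ["JOB_A", "JOB_B", "JOB_C"],
        "status": ["DONE", "FAILED", "RUNNING"],
        "message": ["", "exit code 1", ""],
        "database_name": ["ML", "ML", "ML"],
        "schema_name": ["PUBLIC", "PUBLIC", "PUBLIC"],
        "owner": ["ML_ROLE", "ML_ROLE", "ML_ROLE"],
        "compute_pool": ["POOL_1", "POOL_1", "POOL_1"],
        "target_instances": [1, 2, 1],
        "created_time": pd.to_datetime(["2025-01-01 10:00", "2025-01-01 11:00", "2025-01-01 12:00"]),
        "completed_time": pd.to_datetime(["2025-01-01 10:05", "2025-01-01 11:02", None]),
    }
)

# The result is already a pandas.DataFrame, so no .to_pandas() conversion is needed
failed = jobs_df.loc[jobs_df["status"] == "FAILED", ["name", "message"]]

# Runtime in seconds; a job without completed_time yields NaN
jobs_df["runtime_s"] = (jobs_df["completed_time"] - jobs_df["created_time"]).dt.total_seconds()

print(failed)
print(jobs_df[["name", "runtime_s"]])
```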
New Features
- Registry: `target_platforms` supports `TargetPlatformMode`: `WAREHOUSE_ONLY`, `SNOWPARK_CONTAINER_SERVICES_ONLY`, or `BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES`.
- Registry: Introduce `snowflake.ml.model.target_platform.TargetPlatform`, target platform constants, and `snowflake.ml.model.task.Task`.
- ML Job: Single-node ML Jobs are now in GA. Multi-node support is now in PuPr.
  - Moved less frequently used job submission parameters to `**kwargs`
  - Platform metrics are now enabled by default
  - `list_jobs()` behavior changed, see Behavior Changes for more info
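The mode names suggest that each `TargetPlatformMode` is shorthand for a list of individual target platforms. The sketch below follows that reading. Both enums are local, hypothetical stand-ins rather than imports from `snowflake.ml`, and the mode-to-platform mapping is inferred from the names alone.

```python
from enum import Enum
from typing import List, Sequence, Union


class TargetPlatform(Enum):
    """Local stand-in for individual platforms (not imported from snowflake.ml)."""

    WAREHOUSE = "WAREHOUSE"
    SNOWPARK_CONTAINER_SERVICES = "SNOWPARK_CONTAINER_SERVICES"


class TargetPlatformMode(Enum):
    """Local stand-in using the three mode names listed in the release notes."""

    WAREHOUSE_ONLY = "WAREHOUSE_ONLY"
    SNOWPARK_CONTAINER_SERVICES_ONLY = "SNOWPARK_CONTAINER_SERVICES_ONLY"
    BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES = "BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES"


# Assumed meaning of each mode, read off its name
_MODE_TO_PLATFORMS = {
    TargetPlatformMode.WAREHOUSE_ONLY: [TargetPlatform.WAREHOUSE],
    TargetPlatformMode.SNOWPARK_CONTAINER_SERVICES_ONLY: [TargetPlatform.SNOWPARK_CONTAINER_SERVICES],
    TargetPlatformMode.BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES: [
        TargetPlatform.WAREHOUSE,
        TargetPlatform.SNOWPARK_CONTAINER_SERVICES,
    ],
}


def normalize_target_platforms(
    value: Union[TargetPlatformMode, Sequence[TargetPlatform]],
) -> List[TargetPlatform]:
    """Accept either a mode shorthand or an explicit list and return a list of platforms."""
    if isinstance(value, TargetPlatformMode):
        return list(_MODE_TO_PLATFORMS[value])
    return list(value)


print(normalize_target_platforms(TargetPlatformMode.BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES))
```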
1.8.6
New Features
- Registry: Add service container info to logs.
1.8.5
Bug Fixes
- Registry: Fixed a bug when listing and deleting container services.
- Registry: Fixed explainability issue with scikit-learn pipelines, skipping explain function creation.
- Explainability: Lowered the minimum Streamlit version to 1.30.
1.8.4
Bug Fixes
- Registry: Default `enable_explainability` to True when the model can be deployed to Warehouse.
- Registry: Add `custom_model.partitioned_api` decorator and deprecate `partitioned_inference_api`.
- Registry: Fixed a bug when logging PyTorch and TensorFlow models that caused `UnboundLocalError: local variable 'multiple_inputs' referenced before assignment`.
Breaking change
- ML Job: Updated property `id` to be the fully qualified name; introduced new property `name` to represent the ML Job name.
- ML Job: Modified `list_jobs()` to return the ML Job `name` instead of `id`.
- Registry: `log_model` now raises an error, instead of only emitting a user warning, if `enable_explainability` is True and the model is only deployed to Snowpark Container Services.
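Taken together, the explainability notes in this release describe two rules. When `enable_explainability` is not set, it defaults to on if the model can be deployed to a warehouse. When it is explicitly True for a model deployed only to Snowpark Container Services, logging fails. The function below is a hypothetical sketch of just those two rules. The `ValueError` type and message are assumptions, and the sketch ignores whether a given model type supports explanations at all.

```python
from typing import FrozenSet, Optional

WAREHOUSE = "WAREHOUSE"
SPCS = "SNOWPARK_CONTAINER_SERVICES"


def resolve_enable_explainability(requested: Optional[bool], target_platforms: FrozenSet[str]) -> bool:
    """Hypothetical sketch of the explainability rules described in these notes."""
    if requested is None:
        # Not specified: enabled only when the model can be deployed to a warehouse
        return WAREHOUSE in target_platforms
    if requested and target_platforms == frozenset({SPCS}):
        # Explicitly requested for an SPCS-only model: error instead of a warning
        raise ValueError("enable_explainability=True is not supported for SPCS-only models")
    return requested


print(resolve_enable_explainability(None, frozenset({WAREHOUSE, SPCS})))  # True
```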
New Features
- ML Job: Extend the `@remote` function decorator, `submit_file()`, and `submit_directory()` to accept `database` and `schema` parameters.
- ML Job: Support querying by fully qualified name in `get_job()`.
- Explainability: Added visualization functions to `snowflake.ml.monitoring` to plot explanations in notebooks.
- Explainability: Support explain for categorical transforms for sklearn pipeline.
  - Support categorical type for `xgboost.DMatrix` inputs.
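Since a job's `id` is now its fully qualified name and `name` is just the last part, it can help to see how the two relate. The sketch below splits an unquoted `DATABASE.SCHEMA.NAME` identifier and fills missing leading parts from defaults. `split_job_id` and `JobIdentifier` are hypothetical, the job names are made up, and quoted identifiers that contain dots are deliberately not handled.

```python
from typing import NamedTuple


class JobIdentifier(NamedTuple):
    database: str
    schema: str
    name: str


def split_job_id(job_id: str, default_database: str, default_schema: str) -> JobIdentifier:
    """Split an unquoted 'DB.SCHEMA.NAME' identifier, filling missing parts from defaults."""
    parts = job_id.split(".")
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError(f"not a valid job identifier: {job_id!r}")
    # Fill missing leading parts (database, then schema) from the defaults
    padding = [default_database, default_schema][: 3 - len(parts)]
    return JobIdentifier(*padding, *parts)


print(split_job_id("MLJOB_123", "ML", "PUBLIC"))
# JobIdentifier(database='ML', schema='PUBLIC', name='MLJOB_123')
```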
1.8.3
New Features
- Registry: Default to the runtime CUDA version if available when logging a GPU model in Container Runtime.
- ML Job: Added `as_list` argument to `MLJob.get_logs()` to enable retrieving logs as a list of strings.
- Registry: Support `ModelVersion.run_job` to run inference with a single-node Snowpark Container Services job.
- DataConnector: Removed PrPr decorators.
1.8.2
New Features
- ML Job is now available as a PuPr feature.
- ML Job: Add the ability to retrieve results for `@remote` decorated functions using the new `MLJobWithResult.result()` API, which returns the unpickled result or raises an exception if the job execution failed.
- ML Job: A pre-created Snowpark Session is now available inside job payloads using `snowflake.snowpark.context.get_active_session()`.
- Registry: Introducing `save_location` to `log_model` using the `options` argument. Users can provide the path to write the model version's files that get stored in Snowflake's stage.

reg.log_model(
    model=...,
    model_name=...,
    version_name=...,
    ...,
    options={"save_location": "./model_directory"},
)

- Registry: Include model dependencies in pip requirements by default when logging in Container Runtime.
- ML Job (PrPr): Add `instance_id` argument to the `get_logs` and `show_logs` methods to support multi-node log retrieval.
- ML Job (PrPr): Add `job.get_instance_status(instance_id=...)` API to support multi-node status retrieval.
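`MLJobWithResult.result()` is described as returning the unpickled result, or raising if the job failed. The sketch below illustrates that contract locally: the job side pickles either a return value or the raised exception, and the caller side unpickles it and then returns or re-raises. `encode_outcome` and `result` are hypothetical helpers, not how ML Jobs actually store results.

```python
import pickle
from typing import Any, Callable


def encode_outcome(func: Callable[..., Any], *args: Any, **kwargs: Any) -> bytes:
    """Run func and pickle either its return value or the exception it raised."""
    try:
        return pickle.dumps({"ok": True, "value": func(*args, **kwargs)})
    except Exception as exc:  # capture the failure so the caller can re-raise it later
        return pickle.dumps({"ok": False, "error": exc})


def result(payload: bytes) -> Any:
    """Unpickle a stored outcome: return the value, or re-raise the stored exception."""
    outcome = pickle.loads(payload)
    if outcome["ok"]:
        return outcome["value"]
    raise outcome["error"]


print(result(encode_outcome(sum, [1, 2, 3])))  # 6
```

One design consequence of pickle-based results is that `pickle.loads` should only be applied to payloads from a trusted source, since unpickling can execute arbitrary code.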
1.8.1
Bug Fixes
- Registry: Fix a bug that caused an `unsupported model type` error while logging a sklearn model with the `score_samples` inference method.
- Registry: Fix a bug where model inference service creation failed on an existing and suspended service.
New Features
- ML Job (PrPr): Update Container Runtime image version to `1.0.1`.
- ML Job (PrPr): Add `enable_metrics` argument to job submission APIs to enable publishing service metrics to Event Table. See Accessing Event Table service metrics for retrieving published metrics and Costs of telemetry data collection for cost implications.
- Registry: When creating a copy of a `ModelVersion` with `log_model`, raise an exception if unsupported arguments are provided.
1.8.0
Bug Fixes
- Modeling: Fix a bug in some metrics that allowed an unsupported version of numpy to be installed automatically in the stored procedure, resulting in a numpy error on execution.
- Registry: Fix a bug that leads to an incorrect `Model is does not have _is_inference_api` error message when assigning a supported model as a property of a CustomModel.
- Registry: Fix a bug where inference did not work when models with more than 500 input features were deployed to SPCS.
Behavior Change
- Registry: With `FeatureGroupSpec` support, auto-inferred model signatures for `transformers.Pipeline` models have been updated, including:
  - Signature for fill-mask task has been changed from

        ModelSignature(
            inputs=[FeatureSpec(name="inputs", dtype=DataType.STRING)],
            outputs=[FeatureSpec(name="outputs", dtype=DataType.STRING)],
        )

    to

        ModelSignature(
            inputs=[FeatureSpec(name="inputs", dtype=DataType.STRING)],
            outputs=[
                FeatureGroupSpec(
                    name="outputs",
                    specs=[
                        FeatureSpec(name="sequence", dtype=DataType.STRING),
                        FeatureSpec(name="score", dtype=DataType.DOUBLE),
                        FeatureSpec(name="token", dtype=DataType.INT64),
                        FeatureSpec(name="token_str", dtype=DataType.STRING),
                    ],
                    shape=(-1,),
                ),
            ],
        )
  - Signature for token-classification task has been changed from

        ModelSignature(
            inputs=[FeatureSpec(name="inputs", dtype=DataType.STRING)],
            outputs=[FeatureSpec(name="outputs", dtype=DataType.STRING)],
        )

    to

        ModelSignature(
            inputs=[FeatureSpec(name="inputs", dtype=DataType.STRING)],
            outputs=[
                FeatureGroupSpec(
                    name="outputs",
                    specs=[
                        FeatureSpec(name="word", dtype=DataType.STRING),
                        FeatureSpec(name="score", dtype=DataType.DOUBLE),
                        FeatureSpec(name="entity", dtype=DataType.STRING),
                        FeatureSpec(name="index", dtype=DataType.INT64),
                        FeatureSpec(name="start", dtype=DataType.INT64),
                        FeatureSpec(name="end", dtype=DataType.INT64),
                    ],
                    shape=(-1,),
                ),
            ],
        )
  - Signature for question-answering task when `top_k` is larger than 1 has been changed from

        ModelSignature(
            inputs=[
                FeatureSpec(name="question", dtype=DataType.STRING),
                FeatureSpec(name="context", dtype=DataType.STRING),
            ],
            outputs=[FeatureSpec(name="outputs", dtype=DataType.STRING)],
        )

    to

        ModelSignature(
            inputs=[
                FeatureSpec(name="question", dtype=DataType.STRING),
                FeatureSpec(name="context", dtype=DataType.STRING),
            ],
            outputs=[
                FeatureGroupSpec(
                    name="answers",
                    specs=[
                        FeatureSpec(name="score", dtype=DataType.DOUBLE),
                        FeatureSpec(name="start", dtype=DataType.INT64),
                        FeatureSpec(name="end", dtype=DataType.INT64),
                        FeatureSpec(name="answer", dtype=DataType.STRING),
                    ],
                    shape=(-1,),
                ),
            ],
        )
  - Signature for text-classification task when `top_k` is `None` has been changed from

        ModelSignature(
            inputs=[
                FeatureSpec(name="text", dtype=DataType.STRING),
                FeatureSpec(name="text_pair", dtype=DataType.STRING),
            ],
            outputs=[
                FeatureSpec(name="label", dtype=DataType.STRING),
                FeatureSpec(name="score", dtype=DataType.DOUBLE),
            ],
        )

    to

        ModelSignature(
            inputs=[FeatureSpec(name="text", dtype=DataType.STRING)],
            outputs=[
                FeatureSpec(name="label", dtype=DataType.STRING),
                FeatureSpec(name="score", dtype=DataType.DOUBLE),
            ],
        )
  - Signature for text-classification task when `top_k` is not `None` has been changed from

        ModelSignature(
            inputs=[
                FeatureSpec(name="text", dtype=DataType.STRING),
                FeatureSpec(name="text_pair", dtype=DataType.STRING),
            ],
            outputs=[FeatureSpec(name="outputs", dtype=DataType.STRING)],
        )

    to

        ModelSignature(
            inputs=[FeatureSpec(name="text", dtype=DataType.STRING)],
            outputs=[
                FeatureGroupSpec(
                    name="labels",
                    specs=[
                        FeatureSpec(name="label", dtype=DataType.STRING),
                        FeatureSpec(name="score", dtype=DataType.DOUBLE),
                    ],
                    shape=(-1,),
                ),
            ],
        )
  - Signature for text-generation task has been changed from

        ModelSignature(
            inputs=[FeatureSpec(name="inputs", dtype=DataType.STRING)],
            outputs=[FeatureSpec(name="outputs", dtype=DataType.STRING)],
        )

    to

        ModelSignature(
            inputs=[
                FeatureGroupSpec(
                    name="inputs",
                    specs=[
                        FeatureSpec(name="role", dtype=DataType.STRING),
                        FeatureSpec(name="content", dtype=DataType.STRING),
                    ],
                    shape=(-1,),
                ),
            ],
            outputs=[
                FeatureGroupSpec(
                    name="outputs",
                    specs=[
                        FeatureSpec(name="generated_text", dtype=DataType.STRING),
                    ],
                    shape=(-1,),
                ),
            ],
        )
- Registry: PyTorch and TensorFlow models now expect a single tensor input/output by default when logging to Model Registry. To use multiple tensors (previous behavior), set `options={"multiple_inputs": True}`.

  Example with single tensor input:

      from typing import cast

      import torch


      class TorchModel(torch.nn.Module):
          def __init__(self, n_input: int, n_hidden: int, n_out: int, dtype: torch.dtype = torch.float32) -> None:
              super().__init__()
              self.model = torch.nn.Sequential(
                  torch.nn.Linear(n_input, n_hidden, dtype=dtype),
                  torch.nn.ReLU(),
                  torch.nn.Linear(n_hidden, n_out, dtype=dtype),
                  torch.nn.Sigmoid(),
              )

          def forward(self, tensor: torch.Tensor) -> torch.Tensor:
              return cast(torch.Tensor, self.model(tensor))


      # Sample usage (reg, batch_size, n_input, and model, a TorchModel instance, are defined elsewhere):
      data_x = torch.rand(size=(batch_size, n_input))

      # Log model with single tensor; log_model returns the ModelVersion
      mv = reg.log_model(
          model=model,
          ...,
          sample_input_data=data_x,
      )

      # Run inference with single tensor
      mv.run(data_x)

  For multiple tensor inputs/outputs, use:

      reg.log_model(
          model=model,
          ...,
          sample_input_data=[data_x_1, data_x_2],
          options={"multiple_inputs": True},
      )
- Registry: Default `enable_explainability` to False when the model can be deployed to Snowpark Container Services.
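With the signature changes above, code that read a single string from `outputs` now has to iterate over a group of records. The sketch below assumes that each `FeatureGroupSpec` cell with `shape=(-1,)` arrives as a list of dicts keyed by the spec's field names. That is an assumption about the runtime representation, not something these notes confirm. `explode_group_column` is a hypothetical helper, and the fill-mask values are made up.

```python
from typing import Any, Dict, List


def explode_group_column(rows: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Turn each element of a list-valued column into its own flat record.

    Non-group columns are copied onto every record, and row_index records which
    input row each flattened record came from.
    """
    flat: List[Dict[str, Any]] = []
    for row_index, row in enumerate(rows):
        base = {k: v for k, v in row.items() if k != column}
        for item in row[column]:
            flat.append({**base, "row_index": row_index, **item})
    return flat


# Hypothetical fill-mask output: one input row, two candidate fills in the "outputs" group
rows = [
    {
        "inputs": "Paris is the [MASK] of France.",
        "outputs": [
            {"sequence": "paris is the capital of france.", "score": 0.9, "token": 1, "token_str": "capital"},
            {"sequence": "paris is the heart of france.", "score": 0.05, "token": 2, "token_str": "heart"},
        ],
    }
]

for record in explode_group_column(rows, "outputs"):
    print(record["token_str"], record["score"])
```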
New Features
- Registry: Added support for a single `torch.Tensor`, `tensorflow.Tensor`, or `tensorflow.Variable` as input or output data.
- Registry: Support `xgboost.DMatrix` datatype for XGBoost models.