Description
Describe the bug or the issue that you are facing
Hi, I was trying to follow this repo to run all 4 pipelines in GitHub Actions and had to change the following to get it working:
Update train.py:
import argparse


def str2bool(value):
    # type=bool would treat any non-empty string (even "False") as True
    return str(value).strip().lower() in ("true", "1", "yes")


def parse_args():
    parser = argparse.ArgumentParser("train")
    parser.add_argument("--train_data", type=str, help="Path to train dataset")
    parser.add_argument("--model_output", type=str, help="Path of output model")
    # regressor specific arguments
    parser.add_argument('--regressor__n_estimators', type=int, default=500,
                        help='Number of trees')
    parser.add_argument('--regressor__bootstrap', type=str2bool, default=True,
                        help='Method of selecting samples for training each tree')
    parser.add_argument('--regressor__max_depth', type=int, default=10,
                        help='Maximum number of levels in tree')
    parser.add_argument('--regressor__max_features', type=str, default='sqrt',
                        help='Number of features to consider at every split')
    parser.add_argument('--regressor__min_samples_leaf', type=int, default=4,
                        help='Minimum number of samples required at each leaf node')
    parser.add_argument('--regressor__min_samples_split', type=int, default=5,
                        help='Minimum number of samples required to split a node')
    args = parser.parse_args()
    return args
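A note on the `--regressor__bootstrap` flag: argparse applies `type` to the raw command-line string, and `bool("False")` is `True` in Python, so `type=bool` cannot switch the flag off. Below is a minimal sketch comparing the two behaviours. The `str2bool` helper is an illustrative name and the accepted spellings ("true", "1", "yes") are an assumption, not something the repo prescribes.

```python
import argparse


def str2bool(value):
    """Parse common truthy spellings; everything else is treated as False."""
    return str(value).strip().lower() in ("true", "1", "yes")


# Parser that uses the built-in bool as the type converter.
naive = argparse.ArgumentParser()
naive.add_argument("--regressor__bootstrap", type=bool, default=True)

# Parser that converts the string explicitly.
safe = argparse.ArgumentParser()
safe.add_argument("--regressor__bootstrap", type=str2bool, default=True)

cli = ["--regressor__bootstrap", "False"]
naive_value = naive.parse_args(cli).regressor__bootstrap
safe_value = safe.parse_args(cli).regressor__bootstrap

print(naive_value)  # True, because bool("False") is True
print(safe_value)   # False
```

This keeps the existing `--regressor__bootstrap True|False` calling convention, so pipeline definitions that pass the value as a string should not need to change.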
Update conda.yaml:
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - azureml-mlflow
      - azureml-inference-server-http
      - azure-ai-ml
      - pyarrow
      - scikit-learn
      - pandas
      - joblib
      - matplotlib
      - git+https://github.com/microsoft/AzureML-Observability#subdirectory=aml-obs-client
      - git+https://github.com/microsoft/AzureML-Observability#subdirectory=aml-obs-collector
Update online_deployment.yaml:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
code_configuration:
  code: .
  scoring_script: score.py
endpoint_name: taxi-fare-online
environment: azureml:taxi-train-env@latest
model: azureml:taxi-model@latest
instance_type: Standard_DS3_v2
instance_count: 1
Update score.py:
import os
import logging
import json
import numpy
import joblib


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # Please provide your model's folder name if there is one
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "taxi_model/model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In the example we extract the data from the json input and call the scikit-learn model's predict()
    method and return the result back
    """
    logging.info("model 1: request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()
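For reference, `run()` reads the request body as JSON and takes its `"data"` field, which is then turned into a 2-D array with one row per sample. A minimal sketch of building and checking such a body, using only the standard library, is shown below. The `build_request` and `parse_request` names and the feature values are placeholders for illustration; the real number and order of features depend on the trained model.

```python
import json


def build_request(rows):
    """Serialize feature rows into the {"data": [...]} body that run() parses."""
    return json.dumps({"data": rows})


def parse_request(raw_data):
    """Mirror the first step of run(): decode the JSON and take the 'data' field."""
    return json.loads(raw_data)["data"]


# Two samples with three placeholder features each (hypothetical values).
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
raw = build_request(rows)
decoded = parse_request(raw)

print(raw)
print(len(decoded), "rows,", len(decoded[0]), "features per row")
```

A body without the top-level `"data"` key would raise a `KeyError` inside `run()`, so it is worth checking the payload shape before invoking the endpoint.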
Steps/Code to Reproduce
For the Terraform code, I also had to change some minor things:
For allocate-traffic.yml, connect-to-workspace.yml, create-compute.yml, create-deployment.yml, create-endpoint.yml, register-environment.yml and run-pipeline.yml, I had to change the Az CLI login as follows:
name: "Az CLI login"
uses: azure/login@v1
with:
  # Azure login can use either Service Principal or OIDC authentication:
  # 1. Service Principal: Uses client ID, tenant ID, client secret/certificate
  #    Example: creds: ${{secrets.creds}}
  #
  # 2. OIDC (OpenID Connect): More secure, uses federated identity credentials
  #    Example:
  #      client-id: ${{ secrets.AZURE_CLIENT_ID }}
  #      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
  #      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  #      enable-oidc: true
  #
  # Choose the appropriate method based on your security requirements
  #client-id: ${{ secrets.AZURE_CLIENT_ID }}
  #tenant-id: ${{ secrets.AZURE_TENANT_ID }}
  #subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  creds: ${{secrets.creds}}
  # Uncomment next line to use OIDC
  # enable-oidc: true
Otherwise I got an error saying that the client ID, tenant ID and subscription ID were not found. The Terraform workflows that run the pipeline and deploy the endpoints only pass creds as a secret to these other YAML files.
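When switching to `creds`, the secret has to hold the service principal as a JSON object. As far as I know, `azure/login` expects the keys `clientId`, `clientSecret`, `subscriptionId` and `tenantId` there. A small sketch for checking a creds JSON locally before storing it as a secret follows; the `missing_creds_keys` helper and the placeholder values are made up for illustration.

```python
import json

# Keys expected in the service principal JSON stored in the `creds` secret
# (assumption based on the azure/login documentation).
REQUIRED_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def missing_creds_keys(creds_json):
    """Return the required keys that are absent or empty in the creds JSON."""
    creds = json.loads(creds_json)
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


# Placeholder values only; subscriptionId is left out on purpose.
example = json.dumps({
    "clientId": "<app-id>",
    "clientSecret": "<secret>",
    "tenantId": "<tenant-id>",
})

missing = missing_creds_keys(example)
print(missing)  # ['subscriptionId']
```

An empty result means all four keys are present. It does not verify that the values are valid in Azure.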
Expected Output
/
Versions
I updated the conda environment to use Python 3.10 and changed the Python code accordingly. Python 3.7 is often no longer supported.
Which platform are you using for deploying your infrastructure?
GitHub Actions (GitHub)
If you mentioned Others, please mention which platform you are using
No response
What are you using for deploying your infrastructure?
Terraform
Are you using Azure ML CLI v2 or Azure ML Python SDK v2?
Azure ML CLI v2
Describe the example that you are trying to run?
Classic pipeline