
Commit 9eec61f

Project import generated by Copybara. (#31)
1 parent 091fb6c commit 9eec61f

File tree

150 files changed: +8551 additions, −4620 deletions


CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -1,5 +1,17 @@
 # Release History
 
+## 1.0.4
+
+### New Features
+- Model Registry: Added support to save/load/deploy TensorFlow models (`tensorflow.Module`).
+- Model Registry: Added support to save/load/deploy MLFlow PyFunc models (`mlflow.pyfunc.PyFuncModel`).
+- Model Development: Input dataframes can now be joined against data loaded from staged files.
+- Model Development: Added support for non-English languages.
+
+### Bug Fixes
+
+- Model Registry: Fixed an issue where model dependencies were incorrectly reported as unresolvable on certain platforms.
+
 ## 1.0.3 (2023-07-14)
 
 ### Behavior Changes

README.md

Lines changed: 7 additions & 1 deletion
@@ -3,6 +3,7 @@
 Snowpark ML is a set of tools, including SDKs and underlying infrastructure, to build and deploy machine learning models. With Snowpark ML, you can pre-process data, then train, manage, and deploy ML models all within Snowflake, using a single SDK, and benefit from Snowflake's proven performance, scalability, stability, and governance at every stage of the machine learning workflow.
 
 ## Key Components of Snowpark ML
+
 The Snowpark ML Python SDK provides a number of APIs to support each stage of an end-to-end machine learning development and deployment process, and includes two key components.
 
 ### Snowpark ML Development [Public Preview]
@@ -16,6 +17,7 @@ A collection of python APIs to enable efficient model development directly in Sn
 ### Snowpark ML Ops [Private Preview]
 
 Snowpark MLOps complements the Snowpark ML Development API and provides model management capabilities along with integrated deployment into Snowflake. Currently, the API consists of:
+
 1. FileSet API: FileSet provides a Python fsspec-compliant API for materializing data into a Snowflake internal stage from a query or Snowpark Dataframe, along with a number of convenience APIs.
 
 1. Model Registry: A Python API for managing models within Snowflake, which also supports deployment of ML models into Snowflake Warehouses as vectorized UDFs.
@@ -25,15 +27,19 @@ During PrPr, we are iterating on API without backward compatibility guarantees.
 - [Documentation](https://docs.snowflake.com/developer-guide/snowpark-ml)
 
 ## Getting started
+
 ### Have your Snowflake account ready
+
 If you don't have a Snowflake account yet, you can [sign up for a 30-day free trial account](https://signup.snowflake.com/).
 
 ### Create a Python virtual environment
-Python 3.8 is required. You can use [miniconda](https://docs.conda.io/en/latest/miniconda.html), [anaconda](https://www.anaconda.com/), or [virtualenv](https://docs.python.org/3/tutorial/venv.html) to create a Python 3.8 virtual environment.
+
+Python versions 3.8, 3.9, and 3.10 are supported. You can use [miniconda](https://docs.conda.io/en/latest/miniconda.html), [anaconda](https://www.anaconda.com/), or [virtualenv](https://docs.python.org/3/tutorial/venv.html) to create a virtual environment.
 
 To have the best experience when using this library, [creating a local conda environment with the Snowflake channel](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#local-development-and-testing) is recommended.
 
 ### Install the library to the Python virtual environment
+
 ```
 pip install snowflake-ml-python
 ```
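The environment setup described above can be sketched end to end. This is a minimal sketch assuming a `python3` binary on PATH; the environment name `snowpark-ml-env` is arbitrary, and the final install step (commented out, since it needs network access) is the `pip install` shown in the README:

```shell
# Create an isolated environment with the stdlib venv module
# (miniconda/anaconda work equally well, per the README above).
python3 -m venv snowpark-ml-env

# Then install the package into it (requires network access):
#   snowpark-ml-env/bin/python -m pip install snowflake-ml-python

# Calling the environment's interpreter directly also works in
# non-interactive scripts, without "activate".
snowpark-ml-env/bin/python --version
```

Using `snowpark-ml-env/bin/python` directly avoids sourcing the activation script, which is convenient in CI.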

bazel/get_affected_targets.sh

Lines changed: 4 additions & 2 deletions
@@ -28,8 +28,10 @@ help() {
 echo "Running ${PROG}"
 
 bazel="bazel"
-current_revision=$(git rev-parse HEAD)
-pr_revision=${current_revision}
+current_revision=$(git symbolic-ref --short -q HEAD \
+    || git describe --tags --exact-match 2> /dev/null \
+    || git rev-parse --short HEAD)
+pr_revision=$(git rev-parse HEAD)
 output_path="/tmp/affected_targets/targets"
 workspace_path=$(pwd)
 
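The new `current_revision` logic falls back from the current branch name, to an exact tag match, to the short commit hash. A self-contained sketch of that fallback chain, run against a throwaway repository so it has something to inspect:

```shell
# Throwaway repo so the commands below have a HEAD to resolve.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# Same fallback chain as in get_affected_targets.sh:
# branch name, else exact tag, else short commit hash.
current_revision=$(git symbolic-ref --short -q HEAD \
    || git describe --tags --exact-match 2> /dev/null \
    || git rev-parse --short HEAD)
echo "current_revision=$current_revision"
```

On a fresh checkout this prints the branch name; in a detached-HEAD CI checkout the later fallbacks kick in, which is presumably why the script was changed.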

bazel/mypy/CREDITS.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Special thanks to [bazel-mypy-integration](https://github.com/bazel-contrib/bazel-mypy-integration).
+
+This package has been forked from that repo and modified to cater to the specific needs of this Snowflake repo.

bazel/mypy/mypy.bzl

Lines changed: 138 additions & 125 deletions
@@ -1,54 +1,52 @@
+"Public API"
+
 load("@bazel_skylib//lib:shell.bzl", "shell")
 load("@bazel_skylib//lib:sets.bzl", "sets")
+load("//bazel/mypy:rules.bzl", "MyPyStubsInfo")
 
 MyPyAspectInfo = provider(
+    "TODO: documentation",
     fields = {
-        "out": "mypy output.",
-        "cache": "cache generated by mypy.",
+        "exe": "Used to pass the rule implementation built exe back to calling aspect.",
+        "out": "Used to pass the dummy output file back to calling aspect.",
     },
 )
 
-# We don't support stubs (pyi) yet.
-PY_EXTENSIONS = ["py"]
-PY_RULES = ["py_binary", "py_library", "py_test", "py_wheel", "py_package"]
+# Switch to True only during debugging and development.
+# All releases should have this as False.
+DEBUG = False
+
+VALID_EXTENSIONS = ["py", "pyi"]
 
 DEFAULT_ATTRS = {
-    "_mypy_sh": attr.label(
+    "_template": attr.label(
         default = Label("//bazel/mypy:mypy.sh.tpl"),
         allow_single_file = True,
     ),
-    "_mypy": attr.label(
+    "_mypy_cli": attr.label(
         default = Label("//bazel/mypy:mypy"),
         executable = True,
-        cfg = "host",
+        cfg = "exec",
     ),
     "_mypy_config": attr.label(
         default = Label("//:mypy.ini"),
         allow_single_file = True,
     ),
-    "_debug": attr.bool(
-        default = False,
-    )
 }
 
-# See https://github.com/python/mypy/pull/4759 for what `cache_map_triples` mean.
-def _sources_to_cache_map_triples(cache_files, dep_cache_files):
+def _sources_to_cache_map_triples(srcs):
     triples_as_flat_list = []
-    for d in (cache_files, dep_cache_files):
-        for src, (meta, data) in d.items():
-            triples_as_flat_list.extend([
-                shell.quote(src.path),
-                shell.quote(meta.path),
-                shell.quote(data.path),
-            ])
+    for f in srcs:
+        f_path = f.path
+        triples_as_flat_list.extend([
+            shell.quote(f_path),
+            shell.quote("{}.meta.json".format(f_path)),
+            shell.quote("{}.data.json".format(f_path)),
+        ])
     return triples_as_flat_list
 
-def _flatten_cache_dict(cache_files):
-    result = []
-    for meta, data in cache_files.values():
-        result.append(meta)
-        result.append(data)
-    return result
+def _is_external_dep(dep):
+    return dep.label.workspace_root.startswith("external/")
 
 def _is_external_src(src_file):
     return src_file.path.startswith("external/")
@@ -57,127 +55,142 @@ def _extract_srcs(srcs):
     direct_src_files = []
     for src in srcs:
         for f in src.files.to_list():
-            if f.extension in PY_EXTENSIONS and not _is_external_src(f):
+            if f.extension in VALID_EXTENSIONS:
                 direct_src_files.append(f)
     return direct_src_files
 
-# Overview
-# This aspect does the following:
-# - Create an action to run mypy against the sources of `target`
-#   - input of this action:
-#     - source files of `target` and source files of all its deps.
-#     - cache files produced by checking its deps.
-#   - output of this action:
-#     - mypy stderr+stdout in a file
-#     - cache files produced by checking the source files of `target`
-#   - this action depends on actions created for the deps, so that it always
-#     has access to cache files produced by those actions.
-# - Propagate the output of this action along the `deps` edge of the build graph.
-# - Produces a OutputGroup which contains the output of all the actions created
-#   along the build graph so that one can use bazel commandline to mark all those
-#   actions as required and to make them run.
-def _mypy_aspect_impl(target, ctx):
-    if (ctx.rule.kind not in PY_RULES or
-            ctx.label.workspace_root.startswith("external")):
-        return []
+def _extract_transitive_deps(deps):
+    transitive_deps = []
+    for dep in deps:
+        if MyPyStubsInfo not in dep and PyInfo in dep and not _is_external_dep(dep):
+            transitive_deps.append(dep[PyInfo].transitive_sources)
+    return transitive_deps
+
+def _extract_stub_deps(deps):
+    # Need to add the .py files AND the .pyi files that are
+    # deps of the rule
+    stub_files = []
+    for dep in deps:
+        if MyPyStubsInfo in dep:
+            for stub_srcs_target in dep[MyPyStubsInfo].srcs:
+                for src_f in stub_srcs_target.files.to_list():
+                    if src_f.extension == "pyi":
+                        stub_files.append(src_f)
+    return stub_files
+
+def _extract_imports(imports, label):
+    # NOTE: Bazel's implementation of this for py_binary, py_test is at
+    # src/main/java/com/google/devtools/build/lib/bazel/rules/python/BazelPythonSemantics.java
+    mypypath_parts = []
+    for import_ in imports:
+        if import_.startswith("/"):
+            # buildifier: disable=print
+            print("ignoring invalid absolute path '{}'".format(import_))
+        elif import_ in ["", "."]:
+            mypypath_parts.append(label.package)
+        else:
+            mypypath_parts.append("{}/{}".format(label.package, import_))
+    return mypypath_parts
+
+def _mypy_rule_impl(ctx):
     base_rule = ctx.rule
-    debug = ctx.attr._debug
-    mypy_config_file = ctx.file._mypy_config
 
-    # Get the cache files generated by running mypy against the deps.
-    dep_cache_files = {}
-    for dep in ctx.rule.attr.deps:
-        if MyPyAspectInfo in dep:
-            dep_cache_files.update(dep[MyPyAspectInfo].cache)
+    mypy_config_file = ctx.file._mypy_config
 
+    mypypath_parts = []
     direct_src_files = []
+    transitive_srcs_depsets = []
+    stub_files = []
+
     if hasattr(base_rule.attr, "srcs"):
         direct_src_files = _extract_srcs(base_rule.attr.srcs)
 
-    # It's possible that this target does not have srcs (py_wheel for example).
-    # However, if the user requests to type check a py_wheel, we should make sure
-    # its python transitive deps get checked.
-    if direct_src_files:
-        # There are source files in this target to check. The check will result in
-        # cache files. Request bazel to allocate those files now.
-        cache_files = {}
-        for src in direct_src_files:
-            meta_file = ctx.actions.declare_file("{}.meta.json".format(src.basename))
-            data_file = ctx.actions.declare_file("{}.data.json".format(src.basename))
-            cache_files[src] = (meta_file, data_file)
-
-
-        # The mypy stdout, which is expected to be produced by mypy_script.
-        mypy_out = ctx.actions.declare_file("%s_mypy_out" % ctx.rule.attr.name)
-        # The script to invoke mypy against this target.
-        mypy_script = ctx.actions.declare_file(
-            "%s_mypy_script" % ctx.rule.attr.name,
-        )
-
-        # Generated files are located in a different root dir than source files
-        # Thus we need to let mypy know where to find both kinds in case in one analysis
-        # both kinds are present.
-        src_root_paths = sets.to_list(
-            sets.make(
-                [f.root.path for f in dep_cache_files.keys()] +
-                [f.root.path for f in cache_files.keys()]),
-        )
-
-        all_src_files = direct_src_files + list(dep_cache_files.keys())
+    if hasattr(base_rule.attr, "deps"):
+        transitive_srcs_depsets = _extract_transitive_deps(base_rule.attr.deps)
+        stub_files = _extract_stub_deps(base_rule.attr.deps)
+
+    if hasattr(base_rule.attr, "imports"):
+        mypypath_parts = _extract_imports(base_rule.attr.imports, ctx.label)
+
+    final_srcs_depset = depset(transitive = transitive_srcs_depsets +
+                                            [depset(direct = direct_src_files)])
+    src_files = [f for f in final_srcs_depset.to_list() if not _is_external_src(f)]
+    if not src_files:
+        return None
+
+    mypypath_parts += [src_f.dirname for src_f in stub_files]
+    mypypath = ":".join(mypypath_parts)
+
+    out = ctx.actions.declare_file("%s_dummy_out" % ctx.rule.attr.name)
+    exe = ctx.actions.declare_file(
+        "%s_mypy_exe" % ctx.rule.attr.name,
+    )
+
+    # Compose a list of the files needed for use. Note that aspect rules can use
+    # the project version of mypy however, other rules should fall back on their
+    # relative runfiles.
+    runfiles = ctx.runfiles(files = src_files + stub_files + [mypy_config_file])
+
+    src_root_paths = sets.to_list(
+        sets.make([f.root.path for f in src_files]),
+    )
+
+    ctx.actions.expand_template(
+        template = ctx.file._template,
+        output = exe,
         substitutions = {
-            "{MYPY_BIN}": ctx.executable._mypy.path,
-            "{CACHE_MAP_TRIPLES}": " ".join(_sources_to_cache_map_triples(cache_files, dep_cache_files)),
+            "{MYPY_EXE}": ctx.executable._mypy_cli.path,
+            "{MYPY_ROOT}": ctx.executable._mypy_cli.root.path,
+            "{CACHE_MAP_TRIPLES}": " ".join(_sources_to_cache_map_triples(src_files)),
             "{PACKAGE_ROOTS}": " ".join([
                 "--package-root " + shell.quote(path or ".")
                 for path in src_root_paths
             ]),
            "{SRCS}": " ".join([
                 shell.quote(f.path)
-                for f in all_src_files
+                for f in src_files
             ]),
-            "{VERBOSE_OPT}": "--verbose" if debug else "",
-            "{VERBOSE_BASH}": "set -x" if debug else "",
-            "{OUTPUT}": mypy_out.path,
-            "{ADDITIONAL_MYPYPATH}": ":".join([p for p in src_root_paths if p]),
-            "{MYPY_INI}": mypy_config_file.path,
-        }
-        ctx.actions.expand_template(
-            template = ctx.file._mypy_sh,
-            output = mypy_script,
-            substitutions = substitutions,
-            is_executable = True,
-        )
-
-        # We want mypy to follow imports, so all the source files of the dependencies
-        # are need altoghther to check this target.
-        ctx.actions.run(
-            outputs = [mypy_out] + _flatten_cache_dict(cache_files),
-            inputs = depset(
-                all_src_files +
-                [mypy_config_file] +
-                _flatten_cache_dict(dep_cache_files)  # cache generated by analyzing deps
-            ),
-            tools = [ctx.executable._mypy],
-            executable = mypy_script,
-            mnemonic = "MyPy",
-            progress_message = "Type-checking %s" % ctx.label,
-            use_default_shell_env = True,
-        )
-    dep_cache_files.update(cache_files)
-    transitive_mypy_outs = []
-    for dep in ctx.rule.attr.deps:
-        if OutputGroupInfo in dep:
-            if hasattr(dep[OutputGroupInfo], "mypy"):
-                transitive_mypy_outs.append(dep[OutputGroupInfo].mypy)
+            "{VERBOSE_OPT}": "--verbose" if DEBUG else "",
+            "{VERBOSE_BASH}": "set -x" if DEBUG else "",
+            "{OUTPUT}": out.path if out else "",
+            "{MYPYPATH_PATH}": mypypath if mypypath else "",
+            "{MYPY_INI_PATH}": mypy_config_file.path,
+        },
+        is_executable = True,
+    )
+
+    return [
+        DefaultInfo(executable = exe, runfiles = runfiles),
+        MyPyAspectInfo(exe = exe, out = out),
+    ]
 
+def _mypy_aspect_impl(_, ctx):
+    if (ctx.rule.kind not in ["py_binary", "py_library", "py_test", "mypy_test"] or
+            ctx.label.workspace_root.startswith("external")):
+        return []
+
+    providers = _mypy_rule_impl(
+        ctx
+    )
+    if not providers:
+        return []
+
+    info = providers[0]
+    aspect_info = providers[1]
+
+    ctx.actions.run(
+        outputs = [aspect_info.out],
+        inputs = info.default_runfiles.files,
+        tools = [ctx.executable._mypy_cli],
+        executable = aspect_info.exe,
+        mnemonic = "MyPy",
+        progress_message = "Type-checking %s" % ctx.label,
+        use_default_shell_env = True,
+    )
     return [
         OutputGroupInfo(
-            # We may not need to run mypy against this target, but we request
-            # all its dependencies to be checked, recursively, but demanding the output
-            # of those checks.
-            mypy = depset([mypy_out] if direct_src_files else [], transitive=transitive_mypy_outs),
+            mypy = depset([aspect_info.out]),
         ),
-        MyPyAspectInfo(out = mypy_out if direct_src_files else None, cache = dep_cache_files),
     ]
 
 mypy_aspect = aspect(
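The rewritten `_sources_to_cache_map_triples` emits one `(source, meta, data)` triple per file, the argument format of mypy's `--cache-map` flag (the removed comment pointed at python/mypy PR 4759 for background). A sketch of the triple generated for one hypothetical source path:

```shell
# Hypothetical source file; the .meta.json/.data.json names mirror the
# Starlark helper in the diff above.
src="snowflake/ml/model/registry.py"
triple="$src $src.meta.json $src.data.json"

# mypy would then be invoked roughly as (sketch, not the exact template):
#   mypy --cache-map $triple ...
echo "$triple"
```

The aspect itself can presumably be requested from the command line via something like `bazel build //... --aspects //bazel/mypy:mypy.bzl%mypy_aspect --output_groups=mypy` (invocation assumed from the aspect and output-group names in the diff).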
