Commit e74123e

Merge pull request #532 from graphistry/dev/cugfql
Dev/cugfql
2 parents 7f0d32f + 9a9de51 commit e74123e

27 files changed, with 5376 additions and 100 deletions

CHANGELOG.md

Lines changed: 16 additions & 0 deletions
@@ -7,6 +7,22 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 ## [Development]
 
+## [0.33.0 - 2023-12-26]
+
+### Added
+
+* GFQL: GPU acceleration of `chain`, `hop`, `filter_by_dict`
+* `AbstractEngine` to `engine.py::Engine` enum
+* `compute.typing.DataFrameT` to centralize df-lib-agnostic type checking
+
+### Refactor
+
+* GFQL and more of compute uses generic dataframe methods and threads through engine
+
+### Infra
+
+* GPU tester threads through LOG_LEVEL
+
 ## [0.32.0 - 2023-12-22]
 
 ### Added

README.md

Lines changed: 29 additions & 5 deletions
@@ -11,9 +11,9 @@
 [![Uptime Robot status](https://img.shields.io/uptimerobot/status/m787548531-e9c7b7508fc76fea927e2313?label=hub.graphistry.com)](https://status.graphistry.com/) [<img src="https://img.shields.io/badge/slack-Graphistry%20chat-orange.svg?logo=slack">](https://join.slack.com/t/graphistry-community/shared_invite/zt-53ik36w2-fpP0Ibjbk7IJuVFIRSnr6g)
 [![Twitter Follow](https://img.shields.io/twitter/follow/graphistry)](https://twitter.com/graphistry)
 
-**PyGraphistry is a Python visual graph AI library to extract, transform, analyze, model, and visualize big graphs, and especially alongside [Graphistry](https://www.graphistry.com) end-to-end GPU server sessions.** Installing with optional `graphistry[ai]` dependencies adds **graph autoML**, including automatic feature engineering, UMAP, and graph neural net support. Combined, PyGraphistry reduces your `time to graph` for going from raw data to visualizations and AI models down to three lines of code.
+**PyGraphistry is a dataframe-native Python visual graph AI library to extract, query, transform, analyze, model, and visualize big graphs, and especially alongside [Graphistry](https://www.graphistry.com) end-to-end GPU server sessions.** The GFQL query language supports running a large subset of the Cypher property graph query language without requiring external software and adds optional GPU acceleration. Installing PyGraphistry with the optional `graphistry[ai]` dependencies adds **graph autoML**, including automatic feature engineering, UMAP, and graph neural net support. Combined, PyGraphistry reduces your **time to graph** for going from raw data to visualizations and AI models down to three lines of code.
 
-Graphistry gets used on problems like visually mapping the behavior of devices and users, investigating fraud, analyzing machine learning results, and starting in graph AI. It provides point-and-click features like timebars, search, filtering, clustering, coloring, sharing, and more. Graphistry is the only tool built ground-up for large graphs. The client's custom WebGL rendering engine renders up to 8MM nodes + edges at a time, and most older client GPUs smoothly support somewhere between 100K and 2MM elements. The serverside GPU analytics engine supports even bigger graphs. It smoothes graph workflows over the PyData ecosystem including Pandas/Spark/Dask dataframes, Nvidia RAPIDS GPU dataframes & GPU graphs, DGL/PyTorch graph neural networks, and various data connectors.
+The optional visual engine, Graphistry, gets used on problems like visually mapping the behavior of devices and users, investigating fraud, analyzing machine learning results, and starting in graph AI. It provides point-and-click features like timebars, search, filtering, clustering, coloring, sharing, and more. Graphistry is the only tool built ground-up for large graphs. The client's custom WebGL rendering engine renders up to 8MM nodes + edges at a time, and most older client GPUs smoothly support somewhere between 100K and 2MM elements. The serverside GPU analytics engine supports even bigger graphs. It smoothes graph workflows over the PyData ecosystem including Pandas/Spark/Dask dataframes, Nvidia RAPIDS GPU dataframes & GPU graphs, DGL/PyTorch graph neural networks, and various data connectors.
 
 The PyGraphistry Python client helps several kinds of usage modes:
 
@@ -147,14 +147,14 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit
 g2.plot()
 ```
 
-* GFQL: Cypher-style graph pattern mining queries on dataframes ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb))
+* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb))
 
 Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL:
 
 ```python
 from graphistry import n, e_undirected, is_in
 
-g2 = g.chain([
+g2 = g1.chain([
 n({'user': 'Biden'}),
 e_undirected(),
 n(name='bridge'),
@@ -166,6 +166,17 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit
 g2.plot()
 ```
 
+Enable GFQL's optional automatic GPU acceleration for 43X+ speedups:
+
+```python
+# Switch from Pandas CPU dataframes to RAPIDS GPU dataframes
+import cudf
+g2 = g1.edges(lambda g: cudf.DataFrame(g._edges))
+# GFQL will automatically run on a GPU
+g3 = g2.chain([n(), e(hops=3), n()])
+g3.plot()
+```
+
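To make the `e(hops=3)` step in the added README example concrete, the following is a minimal pure-Python sketch of k-hop frontier expansion over an edge list. It is an illustration only, not GFQL's implementation: the function name `k_hop_nodes`, the toy edge list, and the assumption that the edge step is undirected are all hypothetical. In a dataframe engine, each loop iteration would typically be expressed as a join between the current frontier and the edge table, which is the kind of bulk operation a GPU dataframe library can speed up.

```python
from collections import defaultdict


def k_hop_nodes(edges, seeds, hops):
    """Return every node reachable from `seeds` in at most `hops` undirected steps.

    Hypothetical helper for illustration; not part of PyGraphistry.
    """
    # Build an undirected adjacency map from (src, dst) pairs.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        # Expand the whole frontier by one hop at a time.
        next_frontier = set()
        for node in frontier:
            next_frontier |= neighbors.get(node, set())
        # Only nodes not seen before form the next frontier.
        frontier = next_frontier - visited
        if not frontier:
            break  # nothing new to explore
        visited |= frontier
    return visited


# Toy path graph: a - b - c - d - e
edges = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]
reached = k_hop_nodes(edges, {'a'}, hops=3)
print(sorted(reached))
```

With three hops from `'a'` on this path graph, node `'e'` stays out of reach because it is four steps away.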
 * [Spark](https://spark.apache.org/)/[Databricks](https://databricks.com/) ([ipynb demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.ipynb), [dbc demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.dbc))
 
 ```python
@@ -1163,6 +1174,8 @@ Both `hop()` and `chain()` (GFQL) match dictionary expressions support dataframe
 * numeric: gt, lt, ge, le, eq, ne, between, isna, notna
 * string: contains, startswith, endswith, match, isnumeric, isalpha, isdigit, islower, isupper, isspace, isalnum, isdecimal, istitle, isnull, notnull
 
+Both `hop()` and `chain()` will run on GPUs when passing in RAPIDS dataframes. Specify parameter `engine='cudf'` to be sure.
+
 #### Table to graph
 
 ```python
@@ -1235,7 +1248,7 @@ assert 'pagerank' in g2._nodes.columns
 
 PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java
 
-See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)
+See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb)
 
 Traverse within a graph, or expand one graph against another
 
@@ -1327,6 +1340,17 @@ pattern2 = Chain.from_json(pattern_json)
 g.chain(pattern2).plot()
 ```
 
+Benefit from automatic GPU acceleration by passing in GPU dataframes:
+
+```python
+import cudf
+
+g1 = graphistry.edges(cudf.read_csv('data.csv'), 's', 'd')
+g2 = g1.chain(..., engine='cudf')
+```
+
+The parameter `engine` is optional, defaulting to `'auto'`.
+
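As a rough illustration of how an `engine='auto'` default could be resolved, the sketch below picks an engine by looking at which library defines the dataframe's type. It is an assumption-laden sketch and not PyGraphistry's code: the enum members, the `resolve_engine` helper, and the CPU fallback are hypothetical, and the real `engine.py::Engine` enum mentioned in the changelog may look different.

```python
from enum import Enum


class Engine(Enum):
    # Hypothetical members; the real `engine.py::Engine` enum may differ.
    PANDAS = 'pandas'
    CUDF = 'cudf'


def resolve_engine(engine, df):
    """Map an `engine` string to an Engine, inspecting `df` when engine='auto'.

    Hypothetical helper for illustration; not part of PyGraphistry.
    """
    if engine != 'auto':
        # An explicit choice wins; unknown names raise ValueError.
        return Engine(engine)
    # Inspect the defining module's name so cudf is never imported here.
    top_level_module = type(df).__module__.split('.')[0]
    if top_level_module == 'cudf':
        return Engine.CUDF
    # Assumed CPU fallback for this sketch.
    return Engine.PANDAS


class FakeGpuFrame:
    """Stand-in for a GPU dataframe so the sketch runs without a GPU."""


# Pretend this class was defined inside the cudf package.
FakeGpuFrame.__module__ = 'cudf.core.dataframe'

print(resolve_engine('auto', FakeGpuFrame()))
print(resolve_engine('auto', [1, 2, 3]))
print(resolve_engine('cudf', [1, 2, 3]))
```

Checking the module name rather than using `isinstance` keeps the GPU library an optional dependency, which matches the README's framing of GPU acceleration as optional.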
 #### Pipelining
 
 ```python
