diff --git a/README.md b/README.md
index a50e260f76..afc7b6c832 100644
--- a/README.md
+++ b/README.md
@@ -147,7 +147,7 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit
g2.plot()
```
-* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](https://github.com/graphistry/pygraphistry/blob/master/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](https://github.com/graphistry/pygraphistry/blob/master/demos/gfql/benchmark_hops_cpu_gpu.ipynb))
+* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb))
With GFQL, run Cypher-style graph queries natively on dataframes, without needing a database or Java:
@@ -1250,7 +1250,7 @@ assert 'pagerank' in g2._nodes.columns
PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can run graph pattern matching directly on Pandas dataframes without installing a database or Java
-See also [graph pattern matching tutorial](https://github.com/graphistry/pygraphistry/tree/master/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](https://github.com/graphistry/pygraphistry/tree/master/demos/gfql/benchmark_hops_cpu_gpu.ipynb)
+See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb)
Traverse within a graph, or expand one graph against another
diff --git a/demos/gfql/benchmark_hops_cpu_gpu.ipynb b/demos/gfql/benchmark_hops_cpu_gpu.ipynb
index bf17b630e7..869344af4c 100644
--- a/demos/gfql/benchmark_hops_cpu_gpu.ipynb
+++ b/demos/gfql/benchmark_hops_cpu_gpu.ipynb
@@ -1,27 +1,14 @@
{
- "nbformat": 4,
- "nbformat_minor": 0,
- "metadata": {
- "colab": {
- "provenance": [],
- "gpuType": "T4"
- },
- "kernelspec": {
- "name": "python3",
- "display_name": "Python 3"
- },
- "language_info": {
- "name": "python"
- },
- "accelerator": "GPU"
- },
"cells": [
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "GZxoiU8sQDk_"
+ },
"source": [
"# GFQL CPU, GPU Benchmark\n",
"\n",
- "This notebook examines GFQL progerty graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs. The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.\n",
+ "This notebook examines GFQL property graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs. The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.\n",
"\n",
"The benchmark does not examine bigger-than-memory and distributed scenarios. The provided results here are from running on a free Google Colab T4 runtime, with a 2.2GHz Intel CPU (12 GB CPU RAM) and T4 Nvidia GPU (16 GB GPU RAM).\n",
"\n",
@@ -30,10 +17,10 @@
"\n",
"| Network | Nodes | Edges |\n",
"|-------------|-----------|--------------|\n",
- "| **Facebook**| 4,039 | 88,234 |\n",
- "| **Twitter** | 81,306 | 2,420,766 |\n",
- "| **GPlus** | 107,614 | 30,494,866 |\n",
- "| **Orkut** | 3,072,441 | 117,185,082 |\n",
+ "| [**Facebook**](#fb)| 4,039 | 88,234 |\n",
+ "| [**Twitter**](#tw) | 81,306 | 2,420,766 |\n",
+ "| [**GPlus**](#gpl) | 107,614 | 30,494,866 |\n",
+ "| [**Orkut**](#ork) | 3,072,441 | 117,185,082 |\n",
"\n",
"## Results\n",
"\n",
@@ -52,10 +39,10 @@
"\n",
"| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n",
"|-------------|--------------|-------------|-------------|----------------------------|--------------------------------|\n",
- "| **Facebook**| 1.1X | 0.66 | 0.61 | 65.7 | 10.4 |\n",
- "| **Twitter** | 17.4X | 0.17 | 2.81 | 16.7 | 48.1 |\n",
- "| **GPlus** | 43.8X | 0.09 | 2.87 | 8.5 | 49.2 |\n",
- "| **Orkut** | N/A | N/A | 12.15 | N/A | 208.3 |\n",
+ "| [**Facebook**](#fb)| 1.1X | 0.66 | 0.61 | 65.7 | 10.4 |\n",
+ "| [**Twitter**](#tw) | 17.4X | 0.17 | 2.81 | 16.7 | 48.1 |\n",
+ "| [**GPlus**](#gpl) | 43.8X | 0.09 | 2.87 | 8.5 | 49.2 |\n",
+ "| [**Orkut**](#ork) | N/A | N/A | 12.15 | N/A | 208.3 |\n",
"| **AVG** | 20.7X | 0.30 | 4.61 | 30.3 | 79.0 |\n",
"| **MAX** | 43.8X | 0.66 | 12.15 | 65.7 | 208.3 |\n",
"\n",
@@ -67,53 +54,46 @@
"\n",
"| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n",
"|-------------|-------------|-----------|-----------|--------------------|--------------------------------|\n",
- "| **Facebook**| 3X | 0.47 | 1.47 | 47.0 | 25.2 |\n",
- "| **Twitter** | 42X | 0.50 | 10.51 | 50.2 | 180.2 |\n",
- "| **GPlus** | 21X | 0.26 | 4.11 | 26.2 | 70.4 |\n",
- "| **Orkut** | N/A | N/A | 41.50 | N/A | 711.4 |\n",
+ "| [**Facebook**](#fb)| 3X | 0.47 | 1.47 | 47.0 | 25.2 |\n",
+ "| [**Twitter**](#tw) | 42X | 0.50 | 10.51 | 50.2 | 180.2 |\n",
+ "| [**GPlus**](#gpl) | 21X | 0.26 | 4.11 | 26.2 | 70.4 |\n",
+ "| [**Orkut**](#ork) | N/A | N/A | 41.50 | N/A | 711.4 |\n",
"| **AVG** | 22X | 0.41 | 14.4 | 41.1 | 246.8 |\n",
"| **MAX** | 42X | 0.50 | 41.50 | 50.2 | 711.4 |\n"
- ],
- "metadata": {
- "id": "GZxoiU8sQDk_"
- }
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "## Optional: GPU setup - Google Colab"
- ],
"metadata": {
"id": "SAj8lhREEOwS"
- }
+ },
+ "source": [
+ "## Optional: GPU setup - Google Colab"
+ ]
},
{
"cell_type": "markdown",
- "source": [],
"metadata": {
"id": "4hrEEAAm7DTO"
- }
+ },
+ "source": []
},
{
"cell_type": "code",
- "source": [
- "# Report GPU used when GPU benchmarking\n",
- "! nvidia-smi"
- ],
+ "execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "W2MF6ZsjDv3B",
- "outputId": "46088cbc-2db9-4529-f724-dc57ed85dfb7"
+ "outputId": "ad2ab798-617d-49db-e379-5670debe4951"
},
- "execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
- "Tue Dec 26 00:50:30 2023 \n",
+ "Tue Jul 9 13:29:05 2024 \n",
"+---------------------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
"|-----------------------------------------+----------------------+----------------------+\n",
@@ -122,7 +102,7 @@
"| | | MIG M. |\n",
"|=========================================+======================+======================|\n",
"| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 54C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |\n",
+ "| N/A 41C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |\n",
"| | | N/A |\n",
"+-----------------------------------------+----------------------+----------------------+\n",
" \n",
@@ -135,198 +115,176 @@
"+---------------------------------------------------------------------------------------+\n"
]
}
- ]
- },
- {
- "cell_type": "code",
- "source": [
- "# if in google colab\n",
- "!git clone https://github.com/rapidsai/rapidsai-csp-utils.git\n",
- "!python rapidsai-csp-utils/colab/pip-install.py"
],
- "metadata": {
- "id": "Aikh0x4ID_wK"
- },
- "execution_count": 8,
- "outputs": []
+ "source": [
+ "# Report GPU used when GPU benchmarking\n",
+ "! nvidia-smi"
+ ]
},
{
"cell_type": "code",
- "source": [
- "import cudf\n",
- "cudf.__version__"
- ],
+ "execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "Lwekdei1dH3N",
- "outputId": "71f5b01d-7917-4283-8338-969167d6e1e8"
+ "outputId": "51562461-432e-4b8d-f697-0a6b559ac8b0"
},
- "execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
- "'23.12.01'"
+ "'24.04.01'"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
}
},
"metadata": {},
- "execution_count": 3
+ "execution_count": 2
}
+ ],
+ "source": [
+ "import cudf\n",
+ "cudf.__version__"
]
},
{
"cell_type": "markdown",
- "source": [
- "# 1. Install & configure"
- ],
"metadata": {
"id": "QQpsrtwBT7sa"
- }
+ },
+ "source": [
+ "# 1. Install & configure"
+ ]
},
{
"cell_type": "code",
- "source": [
- "#! pip install graphistry[igraph]\n",
- "\n",
- "!pip install -q igraph\n",
- "#!pip install -q git+https://github.com/graphistry/pygraphistry.git@dev/cugfql\n",
- "!pip install -q graphistry\n"
- ],
+ "execution_count": 3,
"metadata": {
- "id": "cYjRbgkU9Sx8",
"colab": {
"base_uri": "https://localhost:8080/"
},
- "outputId": "2cf25531-9b8b-4715-ccc7-e79094d84ebd"
+ "id": "cYjRbgkU9Sx8",
+ "outputId": "c8e454a2-e537-467e-afc6-830c51ad869c"
},
- "execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
- " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m13.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m250.5/250.5 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m332.3/332.3 kB\u001b[0m \u001b[31m9.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25h"
]
}
+ ],
+ "source": [
+ "#! pip install graphistry[igraph]\n",
+ "\n",
+ "!pip install -q igraph\n",
+ "!pip install -q graphistry\n"
]
},
{
"cell_type": "markdown",
- "source": [
- "## Imports"
- ],
"metadata": {
"id": "Ff6Tt9DhkePl"
- }
+ },
+ "source": [
+ "## Imports"
+ ]
},
{
"cell_type": "code",
- "source": [
- "import pandas as pd\n",
- "\n",
- "import graphistry\n",
- "\n",
- "from graphistry import (\n",
- "\n",
- " # graph operators\n",
- " n, e_undirected, e_forward, e_reverse,\n",
- "\n",
- " # attribute predicates\n",
- " is_in, ge, startswith, contains, match as match_re\n",
- ")\n",
- "graphistry.__version__"
- ],
+ "execution_count": 4,
"metadata": {
- "id": "S5_y0CbLkjft",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
- "outputId": "a68a9c4b-c9c5-4b8b-ea4f-7bf1e4ddf315"
+ "id": "S5_y0CbLkjft",
+ "outputId": "c8afe192-51c8-45d2-a79e-c1902200e6a3"
},
- "execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
- "'0.32.0+12.g72e778c'"
+ "'0.33.9'"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
}
},
"metadata": {},
- "execution_count": 3
+ "execution_count": 4
}
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "import graphistry, time\n",
+ "\n",
+ "from graphistry import (\n",
+ "\n",
+ " # graph operators\n",
+ " n, e_undirected, e_forward, e_reverse,\n",
+ "\n",
+ " # attribute predicates\n",
+ " is_in, ge, startswith, contains, match as match_re\n",
+ ")\n",
+ "graphistry.__version__"
]
},
{
"cell_type": "code",
- "source": [
- "import cudf"
- ],
+ "execution_count": 5,
"metadata": {
- "id": "I7Fg75jsG4co"
+ "id": "uLZKph2-a5M4"
},
- "execution_count": 6,
- "outputs": []
- },
- {
- "cell_type": "code",
+ "outputs": [],
"source": [
"#work around google colab shell encoding bugs\n",
"\n",
"import locale\n",
"locale.getpreferredencoding = lambda: \"UTF-8\""
- ],
- "metadata": {
- "id": "uLZKph2-a5M4"
- },
- "execution_count": 7,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "# 2. Perf benchmarks"
- ],
"metadata": {
"id": "eU9SyauNUHtR"
- }
+ },
+ "source": [
+ "# 2. Perf benchmarks"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "### Facebook: 88K edges"
- ],
"metadata": {
"id": "NA0Ym11fkB8j"
- }
+ },
+ "source": [
+ "\n",
+ "### Facebook: 88K edges"
+ ]
},
{
"cell_type": "code",
- "source": [
- "df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/facebook_combined.txt', sep=' ', names=['s', 'd'])\n",
- "print(df.shape)\n",
- "df.head(5)"
- ],
+ "execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 224
},
"id": "vXuQogHekClJ",
- "outputId": "64db92c0-2704-438b-d0e4-25865acbb5e9"
+ "outputId": "e984cfbd-ad39-4902-918d-598f342e6f06"
},
- "execution_count": 10,
"outputs": [
{
"output_type": "stream",
@@ -348,7 +306,7 @@
],
"text/html": [
"\n",
- "
\n",
+ "
\n",
"
\n",
"\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " s | \n",
+ " d | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | 0 | \n",
+ " 116374117927631468606 | \n",
+ " 101765416973555767821 | \n",
+ "
\n",
+ " \n",
+ " | 1 | \n",
+ " 112188647432305746617 | \n",
+ " 107727150903234299458 | \n",
+ "
\n",
+ " \n",
+ " | 2 | \n",
+ " 116719211656774388392 | \n",
+ " 100432456209427807893 | \n",
+ "
\n",
+ " \n",
+ " | 3 | \n",
+ " 117421021456205115327 | \n",
+ " 101096322838605097368 | \n",
+ "
\n",
+ " \n",
+ " | 4 | \n",
+ " 116407635616074189669 | \n",
+ " 113556266482860931616 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"s\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"d\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"107727150903234299458\",\n \"113556266482860931616\",\n \"100432456209427807893\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 25
}
- ]
- },
- {
- "cell_type": "code",
- "source": [
- "%%time\n",
- "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
- "! nvidia-smi\n",
- "for i in range(10):\n",
- " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=4)])\n",
- "! nvidia-smi\n",
- "print(out._nodes.shape, out._edges.shape)\n",
- "del co_gdf\n",
- "del out"
],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "buutj-ZjhrEe",
- "outputId": "ae11addd-6bea-44e9-81c0-b431e1db8089"
- },
- "execution_count": null,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Mon Dec 25 06:26:04 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 61C P0 29W / 70W | 1927MiB / 15360MiB | 36% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "Mon Dec 25 06:26:13 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 65C P0 71W / 70W | 2931MiB / 15360MiB | 90% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "(718640, 1) (2210961, 2)\n",
- "CPU times: user 9.01 s, sys: 1.03 s, total: 10 s\n",
- "Wall time: 9.84 s\n"
- ]
- }
- ]
- },
- {
- "cell_type": "code",
"source": [
"%%time\n",
- "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
- "! nvidia-smi\n",
- "for i in range(10):\n",
- " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=5)])\n",
- "! nvidia-smi\n",
- "print(out._nodes.shape, out._edges.shape)\n",
- "del co_gdf\n",
- "del out"
- ],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "bK4C9Ly0hso-",
- "outputId": "8a9a32ab-03e2-42b4-8b71-2bcf797b31b1"
- },
- "execution_count": null,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Mon Dec 25 06:27:18 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 60C P0 29W / 70W | 1927MiB / 15360MiB | 28% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "Mon Dec 25 06:27:57 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 72C P0 43W / 70W | 4351MiB / 15360MiB | 100% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "(3041556, 1) (47622917, 2)\n",
- "CPU times: user 34.9 s, sys: 4.76 s, total: 39.6 s\n",
- "Wall time: 39.2 s\n"
- ]
- }
+ "ge_df = pd.read_csv('gplus_combined.txt', sep=' ', names=['s', 'd'])\n",
+ "print(ge_df.shape)\n",
+ "ge_df.head(5)"
]
},
{
"cell_type": "code",
- "source": [
- "%%time\n",
- "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
- "out = co_gdf.chain([ n({'id': 1}), e_forward(hops=6)])._nodes\n",
- "print(out.shape)\n",
- "del co_gdf\n",
- "del out"
- ],
- "metadata": {
- "id": "qrga-la0hwhh"
- },
- "execution_count": null,
- "outputs": []
- },
- {
- "cell_type": "code",
- "source": [
- "!lscpu\n"
- ],
+ "execution_count": 26,
"metadata": {
+ "id": "w5YkN-nLK6UV",
"colab": {
- "base_uri": "https://localhost:8080/"
+ "base_uri": "https://localhost:8080/",
+ "height": 260
},
- "id": "eiXFImxF-rzw",
- "outputId": "b807cc3d-ed1a-4bef-c6e0-bfc2df7356ff"
+ "outputId": "89c4b0a5-a355-4558-8b3c-187b0efe471a"
},
- "execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
- "Architecture: x86_64\n",
- " CPU op-mode(s): 32-bit, 64-bit\n",
- " Address sizes: 46 bits physical, 48 bits virtual\n",
- " Byte Order: Little Endian\n",
- "CPU(s): 2\n",
- " On-line CPU(s) list: 0,1\n",
- "Vendor ID: GenuineIntel\n",
- " Model name: Intel(R) Xeon(R) CPU @ 2.20GHz\n",
- " CPU family: 6\n",
- " Model: 79\n",
- " Thread(s) per core: 2\n",
- " Core(s) per socket: 1\n",
- " Socket(s): 1\n",
- " Stepping: 0\n",
- " BogoMIPS: 4399.99\n",
- " Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clf\n",
- " lush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_\n",
- " good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fm\n",
- " a cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hyp\n",
- " ervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsb\n",
- " ase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsa\n",
- " veopt arat md_clear arch_capabilities\n",
- "Virtualization features: \n",
- " Hypervisor vendor: KVM\n",
- " Virtualization type: full\n",
- "Caches (sum of all): \n",
- " L1d: 32 KiB (1 instance)\n",
- " L1i: 32 KiB (1 instance)\n",
- " L2: 256 KiB (1 instance)\n",
- " L3: 55 MiB (1 instance)\n",
- "NUMA: \n",
- " NUMA node(s): 1\n",
- " NUMA node0 CPU(s): 0,1\n",
- "Vulnerabilities: \n",
- " Gather data sampling: Not affected\n",
- " Itlb multihit: Not affected\n",
- " L1tf: Mitigation; PTE Inversion\n",
- " Mds: Vulnerable; SMT Host state unknown\n",
- " Meltdown: Vulnerable\n",
- " Mmio stale data: Vulnerable\n",
- " Retbleed: Vulnerable\n",
- " Spec rstack overflow: Not affected\n",
- " Spec store bypass: Vulnerable\n",
- " Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swap\n",
- " gs barriers\n",
- " Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected\n",
- " Srbds: Not affected\n",
- " Tsx async abort: Vulnerable\n"
+ "(30494866, 2) (107614, 1)\n",
+ "CPU times: user 5.14 s, sys: 1.08 s, total: 6.22 s\n",
+ "Wall time: 6.27 s\n"
]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " id\n",
+ "0 116374117927631468606\n",
+ "1 112188647432305746617\n",
+ "2 116719211656774388392\n",
+ "3 117421021456205115327\n",
+ "4 116407635616074189669"
+ ],
+ "text/html": [
+ "\n",
+ "
\n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " id | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | 0 | \n",
+ " 116374117927631468606 | \n",
+ "
\n",
+ " \n",
+ " | 1 | \n",
+ " 112188647432305746617 | \n",
+ "
\n",
+ " \n",
+ " | 2 | \n",
+ " 116719211656774388392 | \n",
+ "
\n",
+ " \n",
+ " | 3 | \n",
+ " 117421021456205115327 | \n",
+ "
\n",
+ " \n",
+ " | 4 | \n",
+ " 116407635616074189669 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 26
}
- ]
- },
- {
- "cell_type": "code",
- "source": [
- "!free -h\n"
],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "wJohLi58-sN5",
- "outputId": "c3e144f6-c19a-4c68-e867-f5e7fa2e9df4"
- },
- "execution_count": null,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- " total used free shared buff/cache available\n",
- "Mem: 12Gi 717Mi 8.0Gi 1.0Mi 3.9Gi 11Gi\n",
- "Swap: 0B 0B 0B\n"
- ]
- }
- ]
- },
- {
- "cell_type": "code",
"source": [
"%%time\n",
- "start_nodes = pd.DataFrame({'id': [1]})\n",
- "! nvidia-smi\n",
- "for i in range(1):\n",
- " g2 = co_g.hop(\n",
- " nodes=start_nodes,\n",
- " direction='forward',\n",
- " hops=1)\n",
- "! nvidia-smi\n",
- "print(g2._nodes.shape, g2._edges.shape)\n",
- "#del start_nodes\n",
- "#del co_gdf\n",
- "#del g2"
- ],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "zak4Inhco5il",
- "outputId": "30bcf2bc-853e-4e5e-8c57-ba0cd9429554"
- },
- "execution_count": null,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Tue Dec 26 01:01:43 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 64C P0 30W / 70W | 2821MiB / 15360MiB | 0% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n"
- ]
- }
+ "gg = graphistry.edges(ge_df, 's', 'd').materialize_nodes()\n",
+ "gg = graphistry.edges(ge_df, 's', 'd').nodes(gg._nodes, 'id')\n",
+ "print(gg._edges.shape, gg._nodes.shape)\n",
+ "gg._nodes.head(5)"
]
},
{
"cell_type": "code",
- "source": [
- "%%time\n",
- "start_nodes = cudf.DataFrame({'id': [1]})\n",
- "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
- "! nvidia-smi\n",
- "for i in range(10):\n",
- " g2 = co_gdf.hop(\n",
- " nodes=start_nodes,\n",
- " direction='forward',\n",
- " hops=1)\n",
- "! nvidia-smi\n",
- "print(g2._nodes.shape, g2._edges.shape)\n",
- "del start_nodes\n",
- "del co_gdf\n",
- "del g2"
- ],
+ "execution_count": 27,
"metadata": {
- "id": "-SmFlCBS_Bgx",
+ "id": "NKtz54uELX-8",
"colab": {
- "base_uri": "https://localhost:8080/"
+ "base_uri": "https://localhost:8080/",
+ "height": 116
},
- "outputId": "d2326cf7-3ea6-4f99-9548-f2e98ece59a4"
+ "outputId": "f4b28841-62bc-42cd-e771-127400a2689e"
},
- "execution_count": 16,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
- "Tue Dec 26 00:56:45 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 49C P0 28W / 70W | 1923MiB / 15360MiB | 37% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "Tue Dec 26 00:56:47 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 52C P0 70W / 70W | 2819MiB / 15360MiB | 79% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "(12, 1) (11, 2)\n",
- "CPU times: user 1.6 s, sys: 37.3 ms, total: 1.64 s\n",
- "Wall time: 1.84 s\n"
+ "CPU times: user 676 ms, sys: 400 ms, total: 1.08 s\n",
+ "Wall time: 1.11 s\n"
]
- }
- ]
- },
- {
- "cell_type": "code",
- "source": [
- "%%time\n",
- "start_nodes = cudf.DataFrame({'id': [1]})\n",
- "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
- "! nvidia-smi\n",
- "for i in range(10):\n",
- " g2 = co_gdf.hop(\n",
- " nodes=start_nodes,\n",
- " direction='forward',\n",
- " hops=2)\n",
- "! nvidia-smi\n",
- "print(g2._nodes.shape, g2._edges.shape)\n",
- "del start_nodes\n",
- "del co_gdf\n",
- "del g2"
- ],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
},
- "id": "fjjt3YnYnabv",
- "outputId": "05762f50-bfe1-4d23-9153-31431418c8e5"
- },
- "execution_count": 17,
- "outputs": [
{
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Tue Dec 26 00:56:47 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 51C P0 35W / 70W | 1923MiB / 15360MiB | 59% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "Tue Dec 26 00:56:49 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 53C P0 59W / 70W | 2821MiB / 15360MiB | 86% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "(391, 1) (461, 2)\n",
- "CPU times: user 2.32 s, sys: 58.5 ms, total: 2.38 s\n",
- "Wall time: 2.51 s\n"
- ]
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " id\n",
+ "0 116374117927631468606"
+ ],
+          "text/html": [
+            "<div>\n",
+            "<table border=\"1\" class=\"dataframe\">\n",
+            "  <thead>\n",
+            "    <tr style=\"text-align: right;\">\n",
+            "      <th></th>\n",
+            "      <th>id</th>\n",
+            "    </tr>\n",
+            "  </thead>\n",
+            "  <tbody>\n",
+            "    <tr>\n",
+            "      <th>0</th>\n",
+            "      <td>116374117927631468606</td>\n",
+            "    </tr>\n",
+            "  </tbody>\n",
+            "</table>\n",
+            "</div>"
+          ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"116374117927631468606\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 27
}
+ ],
+ "source": [
+ "%%time\n",
+ "gg.chain([ n({'id': '116374117927631468606'})])._nodes"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+        "on the GPlus data, simpler `chain` operations over several different hop depths -- roughly **70-110x** speed increases"
+ ],
+ "metadata": {
+ "id": "e4ZchWvrBKdY"
+ }
+ },
{
"cell_type": "code",
"source": [
- "%%time\n",
- "start_nodes = cudf.DataFrame({'id': [1]})\n",
- "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
- "! nvidia-smi\n",
- "for i in range(10):\n",
- " g2 = co_gdf.hop(\n",
- " nodes=start_nodes,\n",
- " direction='forward',\n",
- " hops=3)\n",
- "! nvidia-smi\n",
- "print(g2._nodes.shape, g2._edges.shape)\n",
- "del start_nodes\n",
- "del co_gdf\n",
- "del g2"
+ "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n",
+ "\n",
+ "\n",
+ "for n_hop in [1,2,3,4,5]:\n",
+ " start0 = time.time()\n",
+ " out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])._nodes\n",
+ " end0 = time.time()\n",
+ " T0 = end0-start0\n",
+ " gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
+ " start1 = time.time()\n",
+ " out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])\n",
+ " end1 = time.time()\n",
+ "\n",
+ " del gg_gdf\n",
+ " del out\n",
+ " T1 = end1-start1\n",
+ " # print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))\n",
+ "\n",
+ " new_row = pd.DataFrame({\n",
+ " 'hops': [n_hop],\n",
+ " 'CPU hop chain time (s)': [np.round(T0, 4)],\n",
+ " 'GPU hop chain time (s)': [np.round(T1, 4)],\n",
+ " 'n_notation speedup': [np.round(T0 / T1, 4)]\n",
+ " })\n",
+ "\n",
+ " results_df = pd.concat([results_df, new_row], ignore_index=True)\n",
+ "\n",
+ "(results_df.T)"
],
"metadata": {
"colab": {
- "base_uri": "https://localhost:8080/"
+ "base_uri": "https://localhost:8080/",
+ "height": 175
},
- "id": "oIouuORgnbcY",
- "outputId": "f07abe4c-5137-4ee3-935a-afbb2c5eaa1e"
+ "id": "fTnU8MLr8tV5",
+ "outputId": "203eb5bf-9d95-4557-f35e-7ef2274424c5"
},
- "execution_count": 18,
+ "execution_count": 28,
"outputs": [
{
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Tue Dec 26 00:56:50 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 52C P0 36W / 70W | 1925MiB / 15360MiB | 55% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "Tue Dec 26 00:56:53 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 54C P0 75W / 70W | 2825MiB / 15360MiB | 74% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "(21767, 1) (28480, 2)\n",
- "CPU times: user 3.04 s, sys: 63.6 ms, total: 3.1 s\n",
- "Wall time: 3.25 s\n"
- ]
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " 0 1 2 3 4\n",
+ "hops 1 2 3 4 5\n",
+ "CPU hop chain time (s) 33.7597 50.877 228.473 291.1332 327.8891\n",
+ "GPU hop chain time (s) 0.3082 0.6515 2.9645 4.1146 4.7598\n",
+ "n_notation speedup 109.5356 78.0912 77.0694 70.7561 68.8877"
+ ],
+          "text/html": [
+            "<div>\n",
+            "<table border=\"1\" class=\"dataframe\">\n",
+            "  <thead>\n",
+            "    <tr style=\"text-align: right;\">\n",
+            "      <th></th>\n",
+            "      <th>0</th>\n",
+            "      <th>1</th>\n",
+            "      <th>2</th>\n",
+            "      <th>3</th>\n",
+            "      <th>4</th>\n",
+            "    </tr>\n",
+            "  </thead>\n",
+            "  <tbody>\n",
+            "    <tr>\n",
+            "      <th>hops</th>\n",
+            "      <td>1</td>\n",
+            "      <td>2</td>\n",
+            "      <td>3</td>\n",
+            "      <td>4</td>\n",
+            "      <td>5</td>\n",
+            "    </tr>\n",
+            "    <tr>\n",
+            "      <th>CPU hop chain time (s)</th>\n",
+            "      <td>33.7597</td>\n",
+            "      <td>50.877</td>\n",
+            "      <td>228.473</td>\n",
+            "      <td>291.1332</td>\n",
+            "      <td>327.8891</td>\n",
+            "    </tr>\n",
+            "    <tr>\n",
+            "      <th>GPU hop chain time (s)</th>\n",
+            "      <td>0.3082</td>\n",
+            "      <td>0.6515</td>\n",
+            "      <td>2.9645</td>\n",
+            "      <td>4.1146</td>\n",
+            "      <td>4.7598</td>\n",
+            "    </tr>\n",
+            "    <tr>\n",
+            "      <th>n_notation speedup</th>\n",
+            "      <td>109.5356</td>\n",
+            "      <td>78.0912</td>\n",
+            "      <td>77.0694</td>\n",
+            "      <td>70.7561</td>\n",
+            "      <td>68.8877</td>\n",
+            "    </tr>\n",
+            "  </tbody>\n",
+            "</table>\n",
+            "</div>"
+          ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"(results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.3082,\n \"max\": 109.5356,\n \"num_unique_values\": 4,\n \"samples\": [\n 33.7597,\n 109.5356,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.6515,\n \"max\": 78.0912,\n \"num_unique_values\": 4,\n \"samples\": [\n 50.877,\n 78.0912,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 2,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 2.9645,\n \"max\": 228.473,\n \"num_unique_values\": 4,\n \"samples\": [\n 228.473,\n 77.0694,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 3,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 4,\n \"max\": 291.1332,\n \"num_unique_values\": 4,\n \"samples\": [\n 291.1332,\n 70.7561,\n 4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 4,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 4.7598,\n \"max\": 327.8891,\n \"num_unique_values\": 4,\n \"samples\": [\n 327.8891,\n 68.8877,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 28
}
]
},
{
- "cell_type": "code",
+ "cell_type": "markdown",
"source": [
- "%%time\n",
- "start_nodes = cudf.DataFrame({'id': [1]})\n",
- "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
- "! nvidia-smi\n",
- "for i in range(10):\n",
- " g2 = co_gdf.hop(\n",
- " nodes=start_nodes,\n",
- " direction='forward',\n",
- " hops=4)\n",
- "! nvidia-smi\n",
- "print(g2._nodes.shape, g2._edges.shape)\n",
- "del start_nodes\n",
- "del co_gdf\n",
- "del g2"
+        "and similarly for these `hop` operations -- roughly **70-170x** speed increases"
],
"metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "oNLZGjwInc85",
- "outputId": "534097cf-4022-48cc-9419-a00c135f69e1"
- },
- "execution_count": 19,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Tue Dec 26 00:56:53 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 54C P0 36W / 70W | 1927MiB / 15360MiB | 54% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "Tue Dec 26 00:56:58 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 56C P0 38W / 70W | 2907MiB / 15360MiB | 89% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "(718640, 1) (2210961, 2)\n",
- "CPU times: user 4.58 s, sys: 309 ms, total: 4.89 s\n",
- "Wall time: 5.02 s\n"
- ]
- }
- ]
+ "id": "80bs6Y5pBWb2"
+ }
},
{
"cell_type": "code",
"source": [
- "%%time\n",
- "start_nodes = cudf.DataFrame({'id': [1]})\n",
- "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
- "! nvidia-smi\n",
- "for i in range(10):\n",
- " g2 = co_gdf.hop(\n",
- " nodes=start_nodes,\n",
- " direction='forward',\n",
- " hops=5)\n",
- "! nvidia-smi\n",
- "print(g2._nodes.shape, g2._edges.shape)\n",
- "del start_nodes\n",
- "del co_gdf\n",
- "del g2"
+ "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n",
+ "\n",
+ "\n",
+ "for n_hop in [1,2,3,4,5]:\n",
+ " start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n",
+ " start0 = time.time()\n",
+          "    g2 = gg.hop(\n",
+          "        nodes=start_nodes,\n",
+          "        direction='forward',\n",
+          "        hops=n_hop)\n",
+ " end0 = time.time()\n",
+ " T0 = end0-start0\n",
+ " start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n",
+ " gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n",
+ " start1 = time.time()\n",
+          "    g2 = gg_gdf.hop(\n",
+          "        nodes=start_nodes,\n",
+          "        direction='forward',\n",
+          "        hops=n_hop)\n",
+ " end1 = time.time()\n",
+ "\n",
+ " del start_nodes\n",
+ " del gg_gdf\n",
+ " del g2\n",
+ " T1 = end1-start1\n",
+ "\n",
+ " new_row = pd.DataFrame({\n",
+ " 'hops': [n_hop],\n",
+ " 'CPU hop chain time (s)': [np.round(T0, 4)],\n",
+ " 'GPU hop chain time (s)': [np.round(T1, 4)],\n",
+ " 'n_notation speedup': [np.round(T0 / T1, 4)]\n",
+ " })\n",
+ "\n",
+ " results_df = pd.concat([results_df, new_row], ignore_index=True)\n",
+ "\n",
+ "(results_df.T)"
],
"metadata": {
"colab": {
- "base_uri": "https://localhost:8080/"
+ "base_uri": "https://localhost:8080/",
+ "height": 175
},
- "id": "ePqaeujMneX8",
- "outputId": "ffd88fff-016e-4ac0-ecb9-fa06baca60f8"
+ "id": "N2-gDFod9vc3",
+ "outputId": "907da762-fae2-4caa-cdd2-e78e13b2f635"
},
- "execution_count": 20,
+ "execution_count": 29,
"outputs": [
{
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Tue Dec 26 00:56:58 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 55C P0 37W / 70W | 1925MiB / 15360MiB | 59% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "Tue Dec 26 00:57:10 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 60C P0 48W / 70W | 4325MiB / 15360MiB | 99% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "(3041556, 1) (47622917, 2)\n",
- "CPU times: user 10.8 s, sys: 1.29 s, total: 12.1 s\n",
- "Wall time: 12 s\n"
- ]
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " 0 1 2 3 4\n",
+ "hops 1 2 3 4 5\n",
+ "CPU hop chain time (s) 19.6594 33.2538 64.8384 98.9693 147.4526\n",
+ "GPU hop chain time (s) 0.116 0.2583 0.8252 1.3544 1.9375\n",
+ "n_notation speedup 169.4189 128.7532 78.5772 73.071 76.103"
+ ],
+          "text/html": [
+            "<div>\n",
+            "<table border=\"1\" class=\"dataframe\">\n",
+            "  <thead>\n",
+            "    <tr style=\"text-align: right;\">\n",
+            "      <th></th>\n",
+            "      <th>0</th>\n",
+            "      <th>1</th>\n",
+            "      <th>2</th>\n",
+            "      <th>3</th>\n",
+            "      <th>4</th>\n",
+            "    </tr>\n",
+            "  </thead>\n",
+            "  <tbody>\n",
+            "    <tr>\n",
+            "      <th>hops</th>\n",
+            "      <td>1</td>\n",
+            "      <td>2</td>\n",
+            "      <td>3</td>\n",
+            "      <td>4</td>\n",
+            "      <td>5</td>\n",
+            "    </tr>\n",
+            "    <tr>\n",
+            "      <th>CPU hop chain time (s)</th>\n",
+            "      <td>19.6594</td>\n",
+            "      <td>33.2538</td>\n",
+            "      <td>64.8384</td>\n",
+            "      <td>98.9693</td>\n",
+            "      <td>147.4526</td>\n",
+            "    </tr>\n",
+            "    <tr>\n",
+            "      <th>GPU hop chain time (s)</th>\n",
+            "      <td>0.116</td>\n",
+            "      <td>0.2583</td>\n",
+            "      <td>0.8252</td>\n",
+            "      <td>1.3544</td>\n",
+            "      <td>1.9375</td>\n",
+            "    </tr>\n",
+            "    <tr>\n",
+            "      <th>n_notation speedup</th>\n",
+            "      <td>169.4189</td>\n",
+            "      <td>128.7532</td>\n",
+            "      <td>78.5772</td>\n",
+            "      <td>73.071</td>\n",
+            "      <td>76.103</td>\n",
+            "    </tr>\n",
+            "  </tbody>\n",
+            "</table>\n",
+            "</div>"
+          ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"(results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.116,\n \"max\": 169.4189,\n \"num_unique_values\": 4,\n \"samples\": [\n 19.6594,\n 169.4189,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.2583,\n \"max\": 128.7532,\n \"num_unique_values\": 4,\n \"samples\": [\n 33.2538,\n 128.7532,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 2,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.8252,\n \"max\": 78.5772,\n \"num_unique_values\": 4,\n \"samples\": [\n 64.8384,\n 78.5772,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 3,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.3544,\n \"max\": 98.9693,\n \"num_unique_values\": 4,\n \"samples\": [\n 98.9693,\n 73.071,\n 4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 4,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.9375,\n \"max\": 147.4526,\n \"num_unique_values\": 4,\n \"samples\": [\n 147.4526,\n 76.103,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 29
}
]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "gpuType": "T4",
+ "provenance": []
},
- {
- "cell_type": "code",
- "source": [
- "%%time\n",
- "start_nodes = cudf.DataFrame({'id': [1]})\n",
- "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
- "! nvidia-smi\n",
- "for i in range(10):\n",
- " g2 = co_gdf.hop(\n",
- " nodes=start_nodes,\n",
- " direction='forward',\n",
- " hops=6)\n",
- "! nvidia-smi\n",
- "print(g2._nodes.shape, g2._edges.shape)\n",
- "del start_nodes\n",
- "del co_gdf\n",
- "del g2"
- ],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "PTBkoIVHnfzK",
- "outputId": "5615ecd7-47ea-46ab-fd36-13bce4b3c787"
- },
- "execution_count": 21,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Tue Dec 26 00:57:10 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 59C P0 38W / 70W | 1925MiB / 15360MiB | 44% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "Tue Dec 26 00:57:38 2023 \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
- "|-----------------------------------------+----------------------+----------------------+\n",
- "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
- "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
- "| | | MIG M. |\n",
- "|=========================================+======================+======================|\n",
- "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
- "| N/A 68C P0 55W / 70W | 6445MiB / 15360MiB | 95% Default |\n",
- "| | | N/A |\n",
- "+-----------------------------------------+----------------------+----------------------+\n",
- " \n",
- "+---------------------------------------------------------------------------------------+\n",
- "| Processes: |\n",
- "| GPU GI CI PID Type Process name GPU Memory |\n",
- "| ID ID Usage |\n",
- "|=======================================================================================|\n",
- "+---------------------------------------------------------------------------------------+\n",
- "(3071927, 1) (117032738, 2)\n",
- "CPU times: user 23.5 s, sys: 2.68 s, total: 26.2 s\n",
- "Wall time: 28.2 s\n"
- ]
- }
- ]
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
},
- {
- "cell_type": "code",
- "source": [],
- "metadata": {
- "id": "Ygc2nrkznlCu"
- },
- "execution_count": null,
- "outputs": []
+ "language_info": {
+ "name": "python"
}
- ]
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
}
\ No newline at end of file
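
For readers skimming the diff: the `hop` calls being benchmarked perform a multi-hop forward traversal over edge dataframes. A minimal pandas-only sketch of that idea follows; the column names `s`/`d` and the helper `forward_hop` are illustrative stand-ins, not PyGraphistry's actual API, and the real implementation stays vectorized on cuDF for the GPU path.

```python
import pandas as pd

# Tiny edge list: 0 -> 1 -> {2, 3}, 2 -> 4 (column names are assumptions)
edges = pd.DataFrame({'s': [0, 1, 1, 2], 'd': [1, 2, 3, 4]})

def forward_hop(edges, start_ids, hops):
    """Return node ids reachable from start_ids within `hops` forward steps."""
    seen = set(start_ids)
    frontier = set(start_ids)
    for _ in range(hops):
        # Join the current frontier against edge sources to find next nodes
        step = edges[edges['s'].isin(frontier)]
        frontier = set(step['d']) - seen
        seen |= frontier
        if not frontier:  # traversal saturated early
            break
    return sorted(seen)

print(forward_hop(edges, [0], 2))  # -> [0, 1, 2, 3]
```

Swapping the pandas frame for a cuDF frame is what the benchmark cells above do via `.nodes(...)`/`.edges(...)`: the traversal logic is unchanged, only the dataframe engine differs.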