diff --git a/README.md b/README.md index a50e260f76..afc7b6c832 100644 --- a/README.md +++ b/README.md @@ -147,7 +147,7 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit g2.plot() ``` -* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](https://github.com/graphistry/pygraphistry/blob/master/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](https://github.com/graphistry/pygraphistry/blob/master/demos/gfql/benchmark_hops_cpu_gpu.ipynb)) +* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb)) Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL: @@ -1250,7 +1250,7 @@ assert 'pagerank' in g2._nodes.columns PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java -See also [graph pattern matching tutorial](https://github.com/graphistry/pygraphistry/tree/master/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](https://github.com/graphistry/pygraphistry/tree/master/demos/gfql/benchmark_hops_cpu_gpu.ipynb) +See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb) Traverse within a graph, or expand one graph against another diff --git a/demos/gfql/benchmark_hops_cpu_gpu.ipynb b/demos/gfql/benchmark_hops_cpu_gpu.ipynb index bf17b630e7..869344af4c 100644 --- a/demos/gfql/benchmark_hops_cpu_gpu.ipynb +++ b/demos/gfql/benchmark_hops_cpu_gpu.ipynb @@ -1,27 +1,14 @@ { - 
"nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "gpuType": "T4" - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - }, - "accelerator": "GPU" - }, "cells": [ { "cell_type": "markdown", + "metadata": { + "id": "GZxoiU8sQDk_" + }, "source": [ "# GFQL CPU, GPU Benchmark\n", "\n", - "This notebook examines GFQL progerty graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs. The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.\n", + "This notebook examines GFQL property graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs. The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.\n", "\n", "The benchmark does not examine bigger-than-memory and distributed scenarios. 
The provided results here are from running on a free Google Colab T4 runtime, with a 2.2GHz Intel CPU (12 GB CPU RAM) and T4 Nvidia GPU (16 GB GPU RAM).\n", "\n", @@ -30,10 +17,10 @@ "\n", "| Network | Nodes | Edges |\n", "|-------------|-----------|--------------|\n", - "| **Facebook**| 4,039 | 88,234 |\n", - "| **Twitter** | 81,306 | 2,420,766 |\n", - "| **GPlus** | 107,614 | 30,494,866 |\n", - "| **Orkut** | 3,072,441 | 117,185,082 |\n", + "| [**Facebook**](#fb)| 4,039 | 88,234 |\n", + "| [**Twitter**](#tw) | 81,306 | 2,420,766 |\n", + "| [**GPlus**](#gpl) | 107,614 | 30,494,866 |\n", + "| [**Orkut**](#ork) | 3,072,441 | 117,185,082 |\n", "\n", "## Results\n", "\n", @@ -52,10 +39,10 @@ "\n", "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", "|-------------|--------------|-------------|-------------|----------------------------|--------------------------------|\n", - "| **Facebook**| 1.1X | 0.66 | 0.61 | 65.7 | 10.4 |\n", - "| **Twitter** | 17.4X | 0.17 | 2.81 | 16.7 | 48.1 |\n", - "| **GPlus** | 43.8X | 0.09 | 2.87 | 8.5 | 49.2 |\n", - "| **Orkut** | N/A | N/A | 12.15 | N/A | 208.3 |\n", + "| [**Facebook**](#fb)| 1.1X | 0.66 | 0.61 | 65.7 | 10.4 |\n", + "| [**Twitter**](#tw) | 17.4X | 0.17 | 2.81 | 16.7 | 48.1 |\n", + "| [**GPlus**](#gpl) | 43.8X | 0.09 | 2.87 | 8.5 | 49.2 |\n", + "| [**Orkut**](#ork) | N/A | N/A | 12.15 | N/A | 208.3 |\n", "| **AVG** | 20.7X | 0.30 | 4.61 | 30.3 | 79.0\n", "| **MAX** | 43.8X | 0.66 | 12.15 | 65.7 | 208.3\n", "\n", @@ -67,53 +54,46 @@ "\n", "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", "|-------------|-------------|-----------|-----------|--------------------|--------------------------------|\n", - "| **Facebook**| 3X | 0.47 | 1.47 | 47.0 | 25.2 |\n", - "| **Twitter** | 42X | 0.50 | 10.51 | 50.2 | 180.2 |\n", - "| **GPlus** | 21X | 0.26 | 4.11 | 26.2 | 70.4 |\n", - "| **Orkut** | N/A | 
N/A | 41.50 | N/A | 711.4 |\n", + "| [**Facebook**](#fb)| 3X | 0.47 | 1.47 | 47.0 | 25.2 |\n", + "| [**Twitter**](#tw) | 42X | 0.50 | 10.51 | 50.2 | 180.2 |\n", + "| [**GPlus**](#gpl) | 21X | 0.26 | 4.11 | 26.2 | 70.4 |\n", + "| [**Orkut**](#ork) | N/A | N/A | 41.50 | N/A | 711.4 |\n", "| **AVG** | 22X | 0.41 | 14.4 | 41.1 | 246.8\n", "| **MAX** | 42X | 0.50 | 41.50 | 50.2 | 711.4\n" - ], - "metadata": { - "id": "GZxoiU8sQDk_" - } + ] }, { "cell_type": "markdown", - "source": [ - "## Optional: GPU setup - Google Colab" - ], "metadata": { "id": "SAj8lhREEOwS" - } + }, + "source": [ + "## Optional: GPU setup - Google Colab" + ] }, { "cell_type": "markdown", - "source": [], "metadata": { "id": "4hrEEAAm7DTO" - } + }, + "source": [] }, { "cell_type": "code", - "source": [ - "# Report GPU used when GPU benchmarking\n", - "! nvidia-smi" - ], + "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "W2MF6ZsjDv3B", - "outputId": "46088cbc-2db9-4529-f724-dc57ed85dfb7" + "outputId": "ad2ab798-617d-49db-e379-5670debe4951" }, - "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Tue Dec 26 00:50:30 2023 \n", + "Tue Jul 9 13:29:05 2024 \n", "+---------------------------------------------------------------------------------------+\n", "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", "|-----------------------------------------+----------------------+----------------------+\n", @@ -122,7 +102,7 @@ "| | | MIG M. 
|\n", "|=========================================+======================+======================|\n", "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 54C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |\n", + "| N/A 41C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |\n", "| | | N/A |\n", "+-----------------------------------------+----------------------+----------------------+\n", " \n", @@ -135,198 +115,176 @@ "+---------------------------------------------------------------------------------------+\n" ] } - ] - }, - { - "cell_type": "code", - "source": [ - "# if in google colab\n", - "!git clone https://github.com/rapidsai/rapidsai-csp-utils.git\n", - "!python rapidsai-csp-utils/colab/pip-install.py" ], - "metadata": { - "id": "Aikh0x4ID_wK" - }, - "execution_count": 8, - "outputs": [] + "source": [ + "# Report GPU used when GPU benchmarking\n", + "! nvidia-smi" + ] }, { "cell_type": "code", - "source": [ - "import cudf\n", - "cudf.__version__" - ], + "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "Lwekdei1dH3N", - "outputId": "71f5b01d-7917-4283-8338-969167d6e1e8" + "outputId": "51562461-432e-4b8d-f697-0a6b559ac8b0" }, - "execution_count": 3, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "'23.12.01'" + "'24.04.01'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, - "execution_count": 3 + "execution_count": 2 } + ], + "source": [ + "import cudf\n", + "cudf.__version__" ] }, { "cell_type": "markdown", - "source": [ - "# 1. Install & configure" - ], "metadata": { "id": "QQpsrtwBT7sa" - } + }, + "source": [ + "# 1. Install & configure" + ] }, { "cell_type": "code", - "source": [ - "#! 
pip install graphistry[igraph]\n", - "\n", - "!pip install -q igraph\n", - "#!pip install -q git+https://github.com/graphistry/pygraphistry.git@dev/cugfql\n", - "!pip install -q graphistry\n" - ], + "execution_count": 3, "metadata": { - "id": "cYjRbgkU9Sx8", "colab": { "base_uri": "https://localhost:8080/" }, - "outputId": "2cf25531-9b8b-4715-ccc7-e79094d84ebd" + "id": "cYjRbgkU9Sx8", + "outputId": "c8e454a2-e537-467e-afc6-830c51ad869c" }, - "execution_count": 2, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n" + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m13.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m250.5/250.5 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m332.3/332.3 kB\u001b[0m \u001b[31m9.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h" ] } + ], + "source": [ + "#! 
pip install graphistry[igraph]\n", + "\n", + "!pip install -q igraph\n", + "!pip install -q graphistry\n" ] }, { "cell_type": "markdown", - "source": [ - "## Imports" - ], "metadata": { "id": "Ff6Tt9DhkePl" - } + }, + "source": [ + "## Imports" + ] }, { "cell_type": "code", - "source": [ - "import pandas as pd\n", - "\n", - "import graphistry\n", - "\n", - "from graphistry import (\n", - "\n", - " # graph operators\n", - " n, e_undirected, e_forward, e_reverse,\n", - "\n", - " # attribute predicates\n", - " is_in, ge, startswith, contains, match as match_re\n", - ")\n", - "graphistry.__version__" - ], + "execution_count": 4, "metadata": { - "id": "S5_y0CbLkjft", "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, - "outputId": "a68a9c4b-c9c5-4b8b-ea4f-7bf1e4ddf315" + "id": "S5_y0CbLkjft", + "outputId": "c8afe192-51c8-45d2-a79e-c1902200e6a3" }, - "execution_count": 3, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "'0.32.0+12.g72e778c'" + "'0.33.9'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, - "execution_count": 3 + "execution_count": 4 } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import graphistry, time\n", + "\n", + "from graphistry import (\n", + "\n", + " # graph operators\n", + " n, e_undirected, e_forward, e_reverse,\n", + "\n", + " # attribute predicates\n", + " is_in, ge, startswith, contains, match as match_re\n", + ")\n", + "graphistry.__version__" ] }, { "cell_type": "code", - "source": [ - "import cudf" - ], + "execution_count": 5, "metadata": { - "id": "I7Fg75jsG4co" + "id": "uLZKph2-a5M4" }, - "execution_count": 6, - "outputs": [] - }, - { - "cell_type": "code", + "outputs": [], "source": [ "#work around google colab shell encoding bugs\n", "\n", "import locale\n", "locale.getpreferredencoding = lambda: \"UTF-8\"" - ], - "metadata": { - "id": "uLZKph2-a5M4" - }, - "execution_count": 7, - "outputs": [] + ] }, { "cell_type": 
"markdown", - "source": [ - "# 2. Perf benchmarks" - ], "metadata": { "id": "eU9SyauNUHtR" - } + }, + "source": [ + "# 2. Perf benchmarks" + ] }, { "cell_type": "markdown", - "source": [ - "### Facebook: 88K edges" - ], "metadata": { "id": "NA0Ym11fkB8j" - } + }, + "source": [ + "\n", + "### Facebook: 88K edges" + ] }, { "cell_type": "code", - "source": [ - "df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/facebook_combined.txt', sep=' ', names=['s', 'd'])\n", - "print(df.shape)\n", - "df.head(5)" - ], + "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 224 }, "id": "vXuQogHekClJ", - "outputId": "64db92c0-2704-438b-d0e4-25865acbb5e9" + "outputId": "e984cfbd-ad39-4902-918d-598f342e6f06" }, - "execution_count": 10, "outputs": [ { "output_type": "stream", @@ -348,7 +306,7 @@ ], "text/html": [ "\n", - "
\n", + "
\n", "
\n", "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
sd
0116374117927631468606101765416973555767821
1112188647432305746617107727150903234299458
2116719211656774388392100432456209427807893
3117421021456205115327101096322838605097368
4116407635616074189669113556266482860931616
\n", - "
\n", - "
\n", - "\n", - "
\n", - " \n", - "\n", - " \n", - "\n", - " \n", - "
\n", - "\n", - "\n", - "
\n", - " \n", - "\n", - "\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n" - ] - }, - "metadata": {}, - "execution_count": 6 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "gg = graphistry.edges(ge_df, 's', 'd').materialize_nodes()\n", - "gg = graphistry.edges(ge_df, 's', 'd').nodes(gg._nodes, 'id')\n", - "print(gg._edges.shape, gg._nodes.shape)\n", - "gg._nodes.head(5)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 258 - }, - "id": "w5YkN-nLK6UV", - "outputId": "dc98380d-54c2-4b36-c56e-5e8401c4ffa4" - }, - "execution_count": 7, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(30494866, 2) (107614, 1)\n", - "CPU times: user 4.49 s, sys: 1.25 s, total: 5.74 s\n", - "Wall time: 5.97 s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " id\n", - "0 116374117927631468606\n", - "1 112188647432305746617\n", - "2 116719211656774388392\n", - "3 117421021456205115327\n", - "4 116407635616074189669" + " 0 1\n", + "hops 2 5\n", + "CPU n_notation time (s) 11.8076 25.4098\n", + "GPU n_notation time (s) 10.3238 14.4829\n", + "n_notation speedup 1.1437 1.7545\n", + "CPU source_node_match time (s) 12.0969 10.2662\n", + "GPU source_node_match time (s) 11.2681 11.199\n", + "source_node_match speedup 1.0736 0.9167" ], "text/html": [ "\n", - "
\n", + "
\n", "
\n", "\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n" - ] - }, - "metadata": {}, - "execution_count": 7 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "gg.chain([ n({'id': '116374117927631468606'})])._nodes" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 115 - }, - "id": "NKtz54uELX-8", - "outputId": "5d8f3eef-893d-47cc-e7a9-c5cbfec8270c" - }, - "execution_count": 49, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU times: user 534 ms, sys: 598 ms, total: 1.13 s\n", - "Wall time: 1.65 s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " id\n", - "0 116374117927631468606" - ], - "text/html": [ - "\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
id
0116374117927631468606
\n", - "
\n", - "
\n", - "\n", - "
\n", - " \n", + " animation:\n", + " spin 1s steps(1) infinite;\n", + " }\n", "\n", - " \n", - "\n", - " \n", - "
\n", + " quickchartButtonEl.classList.remove('colab-df-spinner');\n", + " quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n", + " }\n", + " (() => {\n", + " let quickchartButtonEl =\n", + " document.querySelector('#df-0a712080-5f34-4df9-a79a-4e3af31230b0 button');\n", + " quickchartButtonEl.style.display =\n", + " google.colab.kernel.accessAllowed ? 'block' : 'none';\n", + " })();\n", + " \n", + "
\n", "\n", "
\n", "
\n" - ] - }, - "metadata": {}, - "execution_count": 49 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=1)])._nodes\n", - "out.shape" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "iNWdi00VLmZG", - "outputId": "ecfb56a6-c564-4bf6-f43f-2c95a103f4be" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU times: user 27.5 s, sys: 11.1 s, total: 38.5 s\n", - "Wall time: 39.5 s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "(1473, 1)" - ] - }, - "metadata": {}, - "execution_count": 75 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=1)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Q6p3h6uCOABh", - "outputId": "817fc80f-ef5d-4070-eb48-a12344be709c" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(1473, 1) (13375, 2)\n", - "CPU times: user 4.57 s, sys: 2.11 s, total: 6.68 s\n", - "Wall time: 7.63 s\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=2)])._nodes\n", - "out.shape" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "6UdCcMdqLw-P", - "outputId": "70742c79-b22b-4db2-c548-cb1e25d572eb" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU times: user 45.8 s, sys: 17 s, total: 1min 2s\n", - "Wall time: 1min 5s\n" - ] - }, - { - "output_type": "execute_result", - 
"data": { - "text/plain": [ - "(44073, 1)" - ] + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"results_df\",\n \"rows\": 7,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.0736,\n \"max\": 12.0969,\n \"num_unique_values\": 7,\n \"samples\": [\n 2,\n 11.8076,\n 11.2681\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.9167,\n \"max\": 25.4098,\n \"num_unique_values\": 7,\n \"samples\": [\n 5,\n 25.4098,\n 11.199\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } }, "metadata": {}, - "execution_count": 77 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=2)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "QElqatDyNYCS", - "outputId": "0e15bd3e-d2d9-4965-df7d-c8856d036680" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(44073, 1) (2069325, 2)\n", - "CPU times: user 4.97 s, sys: 2.36 s, total: 7.34 s\n", - "Wall time: 10.6 s\n" - ] + "execution_count": 10 } ] }, { - "cell_type": "code", + "cell_type": "markdown", "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=3)])._nodes\n", - "out.shape" + "and with simple 2 and 5 hop `hop` comparison we see a 2x speedup enabled by setting g. 
to `cudf`" ], "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "3HJOItZ4MQMG", - "outputId": "f5be7bb4-7f09-4f80-c549-e703e99f5067" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU times: user 3min 45s, sys: 1min 5s, total: 4min 50s\n", - "Wall time: 4min 52s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "(102414, 1)" - ] - }, - "metadata": {}, - "execution_count": 79 - } - ] + "id": "5-7M9sPEAf5Z" + } }, { "cell_type": "code", "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=3)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" + "results_df = pd.DataFrame(columns=['hops', 'CPU hop time (s)', 'GPU hop time (s)', 'n_notation speedup'])\n", + "\n", + "\n", + "\n", + "for n_hop in [2,5]:\n", + " start_nodes = pd.DataFrame({fg._node: [0]})\n", + " start0 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " start_nodes = cudf.DataFrame({fg._node: [0]})\n", + " fg_gdf = fg.nodes(cudf.from_pandas(fg._nodes)).edges(cudf.from_pandas(fg._edges))\n", + " start1 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end1 = time.time()\n", + "\n", + " del fg_gdf\n", + " del fg2\n", + " T1 = end1-start1\n", + "\n", + " new_row = pd.DataFrame({\n", + " 'hops': [n_hop],\n", + " 'CPU hop time (s)': [np.round(T0, 4)],\n", + " 'GPU hop time (s)': [np.round(T1, 4)],\n", + " 'n_notation speedup': [np.round(T0 / T1, 4)]\n", + " })\n", + "\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", + "\n", + 
"# print(results_df)" ], "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "G32t_xthOUle", - "outputId": "7721741f-9c86-41aa-eb0b-2c8f0db2ed54" + "id": "Tki_0-_j3XKG" }, "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(102414, 1) (24851333, 2)\n", - "CPU times: user 6.95 s, sys: 2.63 s, total: 9.57 s\n", - "Wall time: 9.84 s\n" - ] - } - ] + "outputs": [] }, { "cell_type": "code", "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=4)])\n", - "print(out._nodes.shape, out._edges.shape)" + "results_df.T" ], "metadata": { "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 175 }, - "id": "bXy2yyJsMsEG", - "outputId": "911f2680-067c-44f2-9ba2-7f27d3c9bc6b" + "id": "J_shIUugtU4D", + "outputId": "0877c04b-e1fc-4cdc-c928-54058ae184c8" }, - "execution_count": 8, + "execution_count": 13, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105479, 1) (30450354, 2)\n", - "CPU times: user 4min 36s, sys: 1min 25s, total: 6min 2s\n", - "Wall time: 6min 4s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + " 0 1\n", + "hops 2 5\n", + "CPU hop time (s) 5.8614 10.1756\n", + "GPU hop time (s) 2.3729 5.4458\n", + "n_notation speedup 2.4701 1.8685" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01
hops25
CPU hop time (s)5.861410.1756
GPU hop time (s)2.37295.4458
n_notation speedup2.47011.8685
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 2,\n \"max\": 5.8614,\n \"num_unique_values\": 4,\n \"samples\": [\n 5.8614,\n 2.4701,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.8685,\n \"max\": 10.1756,\n \"num_unique_values\": 4,\n \"samples\": [\n 10.1756,\n 1.8685,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 13 } ] }, { - "cell_type": "code", - "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=4)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" - ], + "cell_type": "markdown", "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Vt8hhjWDP_W_", - "outputId": "824ae644-e1cf-4239-bda9-84aecde52ad8" + "id": "KrJKjXy2KLos" }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105479, 1) (30450354, 2)\n", - "CPU times: user 7.44 s, sys: 2.45 s, total: 9.88 s\n", - "Wall time: 9.9 s\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=5)])\n", - "print(out._nodes.shape, out._edges.shape)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "_z4KpNZaOH8t", - "outputId": "2417f78b-e1b7-452d-8e26-7df259620c88" - }, - "execution_count": 9, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105604, 1) (30468335, 2)\n", - "CPU times: user 5min 36s, sys: 1min 39s, total: 7min 16s\n", 
- "Wall time: 7min 15s\n" - ] - } + "\n", + "## Twitter\n", + "\n", + "- edges: 2420766\n", + "- nodes: 81306" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=5)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" - ], + "execution_count": 15, "metadata": { - "id": "spUBH9EHSz2O", "colab": { "base_uri": "https://localhost:8080/" }, - "outputId": "22340ce3-e8d4-4a72-b485-9839c667b965" + "id": "fO2qasGqpubr", + "outputId": "63c76f29-28ef-4e6d-ff83-13365d680632" }, - "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "(105604, 1) (30468335, 2)\n", - "CPU times: user 8.82 s, sys: 2.71 s, total: 11.5 s\n", - "Wall time: 11.9 s\n" + "--2024-07-09 13:36:53-- https://snap.stanford.edu/data/twitter_combined.txt.gz\n", + "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", + "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 10621918 (10M) [application/x-gzip]\n", + "Saving to: ‘twitter_combined.txt.gz’\n", + "\n", + "twitter_combined.tx 100%[===================>] 10.13M 19.6MB/s in 0.5s \n", + "\n", + "2024-07-09 13:36:54 (19.6 MB/s) - ‘twitter_combined.txt.gz’ saved [10621918/10621918]\n", + "\n" ] } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=1)\n", - "print(g2._nodes.shape, g2._edges.shape)" ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "vCsdmc62A7OM", - "outputId": "adc05d29-c628-49ed-cd6d-8921c6dcd206" - }, - "execution_count": 50, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(1473, 1) (13375, 2)\n", - "CPU times: user 19.9 s, sys: 9.36 s, total: 29.2 s\n", - "Wall time: 41.8 s\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=1)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "J3kV8NBYBQdW", - "outputId": "76073248-43e1-4c3c-c004-67324cc1d312" - }, - "execution_count": 52, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(1473, 1) (13375, 2)\n", - "CPU times: user 3.71 s, sys: 2.09 s, total: 5.8 s\n", - "Wall time: 6.05 s\n" - ] - } + "! wget 'https://snap.stanford.edu/data/twitter_combined.txt.gz'\n", + "#! 
curl -L 'https://snap.stanford.edu/data/twitter_combined.txt.gz' -o twitter_combined.txt.gz" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=2)\n", - "print(g2._nodes.shape, g2._edges.shape)" - ], + "execution_count": 16, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "ONv1RQeWBeeK", - "outputId": "58d57fa4-be72-45bc-abfa-5de9d1102f55" + "id": "fn7zeA3SGlEo" }, - "execution_count": 53, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(44073, 1) (2069325, 2)\n", - "CPU times: user 27.8 s, sys: 13.2 s, total: 41 s\n", - "Wall time: 43.9 s\n" - ] - } - ] - }, - { - "cell_type": "code", + "outputs": [], "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=2)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "ke5SZZ01BgqR", - "outputId": "4173fd28-a11b-4300-d28b-6fdb87e8e9f3" - }, - "execution_count": 54, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(44073, 1) (2069325, 2)\n", - "CPU times: user 4.26 s, sys: 2.37 s, total: 6.63 s\n", - "Wall time: 7.91 s\n" - ] - } + "! 
gunzip twitter_combined.txt.gz" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=3)\n", - "print(g2._nodes.shape, g2._edges.shape)" - ], + "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, - "id": "U795pIBUBiZV", - "outputId": "d499433c-cc0c-4bbf-c69f-36b5d55402d9" + "id": "68TAZkhLGz9g", + "outputId": "156f3da8-50c9-4e30-d1e5-9dba63e7f93d" }, - "execution_count": 55, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "(102414, 1) (24851333, 2)\n", - "CPU times: user 1min 3s, sys: 22.7 s, total: 1min 26s\n", - "Wall time: 1min 35s\n" + "214328887 34428380\n", + "17116707 28465635\n", + "380580781 18996905\n", + "221036078 153460275\n", + "107830991 17868918\n" ] } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=3)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "kIZYwSe1Bj2e", - "outputId": "b7e1ed9f-47d1-412e-9593-ecc436ac1486" - }, - "execution_count": 56, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(102414, 1) (24851333, 2)\n", - "CPU times: user 3.96 s, sys: 2.11 s, total: 6.07 s\n", - "Wall time: 6.05 s\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " 
hops=4)\n", - "print(g2._nodes.shape, g2._edges.shape)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "YTI5sD6YBpYL", - "outputId": "b37bf2df-07dc-404c-8a83-a83f28e38bf6" - }, - "execution_count": 57, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105479, 1) (30450354, 2)\n", - "CPU times: user 1min 34s, sys: 30.6 s, total: 2min 5s\n", - "Wall time: 2min 5s\n" - ] - } + "! head -n 5 twitter_combined.txt" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=4)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" - ], + "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, - "id": "d5WBazICBrSz", - "outputId": "ef95e893-3a0f-4d47-ede4-bd8a6faebf98" + "id": "QU2wNeGXG2GC", + "outputId": "86439d2a-e85a-4e84-ee53-dfa5e0e89ce2" }, - "execution_count": 58, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105479, 1) (30450354, 2)\n", - "CPU times: user 5.25 s, sys: 2.41 s, total: 7.67 s\n", - "Wall time: 7.69 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + "(2420766, 2)" + ] + }, + "metadata": {}, + "execution_count": 18 } + ], + "source": [ + "te_df = pd.read_csv('twitter_combined.txt', sep=' ', names=['s', 'd'])\n", + "te_df.shape" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=5)\n", - "print(g2._nodes.shape, g2._edges.shape)" - ], + "execution_count": 19, "metadata": { - "colab": { - 
"base_uri": "https://localhost:8080/" - }, - "id": "ozQlRPaFBtPD", - "outputId": "4f1655c4-38fd-47f9-942d-836585e0d866" + "id": "EK5gQH2iG5UU" }, - "execution_count": 59, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105604, 1) (30468335, 2)\n", - "CPU times: user 2min 16s, sys: 39.1 s, total: 2min 55s\n", - "Wall time: 2min 58s\n" - ] - } + "outputs": [], + "source": [ + "import graphistry" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=5)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" - ], + "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, - "id": "-ACkMG20B6HM", - "outputId": "f26c03a9-9f25-4f93-c7d3-0e8676694040" + "id": "ZtIW-eFGG_R4", + "outputId": "4082a078-3af9-4c4e-dc98-ddeee50a2489" }, - "execution_count": 60, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105604, 1) (30468335, 2)\n", - "CPU times: user 5.79 s, sys: 2.51 s, total: 8.3 s\n", - "Wall time: 8.29 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + "(81306, 1)" + ] + }, + "metadata": {}, + "execution_count": 20 } + ], + "source": [ + "g = graphistry.edges(te_df, 's', 'd').materialize_nodes()\n", + "g._nodes.shape" ] }, { "cell_type": "markdown", "source": [ - "### Orkut\n", - "- 117M edges\n", - "- 3M nodes" + "on the twitter data, simpler `chain` operations over several different hops -- **10-20x** *italicized text* speed increases" ], "metadata": { - "id": "R03M_swxarKC" + "id": "yR9Qr8tGww3b" } }, { "cell_type": "code", "source": [ - "! 
wget https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz" + "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n", + "\n", + "\n", + "for n_hop in [1,2,8]:\n", + " start_nodes = pd.DataFrame({fg._node: [0]})\n", + " start0 = time.time()\n", + " for i in range(10):\n", + " g2 = g.chain([n({'id': 17116707}), e_forward(hops=n_hop)])\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + " start1 = time.time()\n", + " for i in range(10):\n", + " out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=n_hop)])._nodes\n", + " end1 = time.time()\n", + "\n", + " del g_gdf\n", + " del out\n", + " T1 = end1-start1\n", + "\n", + " new_row = pd.DataFrame({\n", + " 'hops': [n_hop],\n", + " 'CPU hop chain time (s)': [np.round(T0, 4)],\n", + " 'GPU hop chain time (s)': [np.round(T1, 4)],\n", + " 'n_notation speedup': [np.round(T0 / T1, 4)]\n", + " })\n", + "\n", + "\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", + "\n", + "results_df.T" ], "metadata": { "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "QoabYR2maxPo", - "outputId": "2bb6275d-46bb-42da-ec05-d0e5a58b1f77" - }, - "execution_count": 8, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "--2023-12-26 00:55:52-- https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz\n", - "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", - "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n", - "Length: 447251958 (427M) [application/x-gzip]\n", - "Saving to: ‘com-orkut.ungraph.txt.gz’\n", - "\n", - "com-orkut.ungraph.t 100%[===================>] 426.53M 45.1MB/s in 9.7s \n", - "\n", - "2023-12-26 00:56:02 (44.0 MB/s) - ‘com-orkut.ungraph.txt.gz’ saved [447251958/447251958]\n", - "\n" - ] + "base_uri": "https://localhost:8080/", + "height": 175 + }, + "id": "rCsvQJa-6U0x", + "outputId": "a8d52e0a-cd32-436d-a889-c997c6289055" + }, + "execution_count": 21, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " 0 1 2\n", + "hops 1 2 8\n", + "CPU hop chain time (s) 19.3802 17.21 84.5977\n", + "GPU hop chain time (s) 0.7395 1.5332 4.4011\n", + "n_notation speedup 26.2058 11.2246 19.2218" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
012
hops128
CPU hop chain time (s)19.380217.2184.5977
GPU hop chain time (s)0.73951.53324.4011
n_notation speedup26.205811.224619.2218
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.7395,\n \"max\": 26.2058,\n \"num_unique_values\": 4,\n \"samples\": [\n 19.3802,\n 26.2058,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.5332,\n \"max\": 17.21,\n \"num_unique_values\": 4,\n \"samples\": [\n 17.21,\n 11.2246,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 2,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 4.4011,\n \"max\": 84.5977,\n \"num_unique_values\": 4,\n \"samples\": [\n 84.5977,\n 19.2218,\n 8\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 21 } ] }, { - "cell_type": "code", + "cell_type": "markdown", "source": [ - "! gunzip com-orkut.ungraph.txt.gz" + "and similarly for these `hop` operations -- **10-30x** speed increases" ], "metadata": { - "id": "BvvfFPKWbAVJ" - }, - "execution_count": 9, - "outputs": [] + "id": "gHHhyYlzArjw" + } }, { - "cell_type": "code", - "source": [ - "! 
head -n 7 com-orkut.ungraph.txt" - ], + "cell_type": "markdown", "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "YsWwRoPqbPIb", - "outputId": "2eb4f862-b4e1-42bf-ff5d-eec10b27cedc" + "id": "9dZzAAVONCD2" }, - "execution_count": 10, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "# Undirected graph: ../../data/output/orkut.txt\n", - "# Orkut\n", - "# Nodes: 3072441 Edges: 117185083\n", - "# FromNodeId\tToNodeId\n", - "1\t2\n", - "1\t3\n", - "1\t4\n" - ] - } + "source": [ + "\n", + "## GPlus\n", + "\n", + "- edges: 30494866\n", + "- nodes: 107614" ] }, { "cell_type": "code", "source": [ - "import pandas as pd\n", - "\n", - "import graphistry\n", + "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n", "\n", - "from graphistry import (\n", "\n", - " # graph operators\n", - " n, e_undirected, e_forward, e_reverse,\n", + "for n_hop in [1,2,8]:\n", + " start_nodes = pd.DataFrame({g._node: [17116707]})\n", + " start0 = time.time()\n", + " for i in range(10):\n", + " g2 = g.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " start_nodes = cudf.DataFrame({g._node: [17116707]})\n", + " g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))\n", + " start1 = time.time()\n", + " for i in range(10):\n", + " g2 = g_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=5)\n", + " end1 = time.time()\n", "\n", - " # attribute predicates\n", - " is_in, ge, startswith, contains, match as match_re\n", - ")\n", + " del start_nodes\n", + " del g_gdf\n", + " del g2\n", + " T1 = end1-start1\n", "\n", - "import cudf\n", + " new_row = pd.DataFrame({\n", + " 'hops': [n_hop],\n", + " 'CPU hop chain time (s)': [np.round(T0, 4)],\n", + " 'GPU hop chain time (s)': [np.round(T1, 4)],\n", + " 'n_notation speedup': [np.round(T0 / T1, 
4)]\n", + " })\n", "\n", - "#work around google colab shell encoding bugs\n", - "import locale\n", - "locale.getpreferredencoding = lambda: \"UTF-8\"\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", "\n", - "cudf.__version__, graphistry.__version__" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "cbMC8r2ldjbW", - "outputId": "82688d53-7d56-4563-d65e-7c5cd32ac14e" - }, - "execution_count": 11, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "('23.12.01', '0.32.0+12.g72e778c')" - ] - }, - "metadata": {}, - "execution_count": 11 - } - ] - }, - { - "cell_type": "code", - "source": [ - "! nvidia-smi" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "TopFxAvnh_Cv", - "outputId": "cc9d9dc9-e594-4190-fe84-3f1b6dce8a1a" - }, - "execution_count": 12, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:27 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 47C P0 27W / 70W | 103MiB / 15360MiB | 0% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "co_df = cudf.read_csv('com-orkut.ungraph.txt', sep='\\t', names=['s', 'd'], skiprows=5).to_pandas()\n", - "print(co_df.shape)\n", - "print(co_df.head(5))\n", - "print(co_df.dtypes)\n", - "#del co_df" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Oczs87ITbJgw", - "outputId": "ac203ddd-e684-4eb9-a586-f6a49fd1625d" - }, - "execution_count": 13, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(117185082, 2)\n", - " s d\n", - "0 1 3\n", - "1 1 4\n", - "2 1 5\n", - "3 1 6\n", - "4 1 7\n", - "s int64\n", - "d int64\n", - "dtype: object\n", - "CPU times: user 2.56 s, sys: 4.2 s, total: 6.76 s\n", - "Wall time: 6.76 s\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "co_g = graphistry.edges(cudf.DataFrame(co_df), 's', 'd').materialize_nodes(engine='cudf')\n", - "co_g = co_g.nodes(lambda g: g._nodes.to_pandas()).edges(lambda g: g._edges.to_pandas())\n", - "print(co_g._nodes.shape, co_g._edges.shape)\n", - "co_g._nodes.head(5)" + "(results_df.T)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 258 + "height": 175 }, - "id": "gGSDjTtveFAT", - 
"outputId": "e7b38f4f-dc07-4f35-9bab-9c80a80bbf0b" + "id": "cnILbPnG7tf4", + "outputId": "3d2e0ca7-8b07-45ab-e020-222197639dc6" }, - "execution_count": 14, + "execution_count": 22, "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(3072441, 1) (117185082, 2)\n", - "CPU times: user 1.96 s, sys: 2.95 s, total: 4.91 s\n", - "Wall time: 4.92 s\n" - ] - }, { "output_type": "execute_result", "data": { "text/plain": [ - " id\n", - "0 1\n", - "1 2\n", - "2 3\n", - "3 4\n", - "4 5" + " 0 1 2\n", + "hops 1 2 8\n", + "CPU hop chain time (s) 18.8525 12.5991 43.39\n", + "GPU hop chain time (s) 1.0538 1.0413 1.4334\n", + "n_notation speedup 17.8901 12.0998 30.2698" ], "text/html": [ "\n", - "
\n", + "
\n", "
\n", "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sd
0116374117927631468606101765416973555767821
1112188647432305746617107727150903234299458
2116719211656774388392100432456209427807893
3117421021456205115327101096322838605097368
4116407635616074189669113556266482860931616
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"s\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"d\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"107727150903234299458\",\n \"113556266482860931616\",\n \"100432456209427807893\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 25 } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=4)])\n", - "! nvidia-smi\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del co_gdf\n", - "del out" ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "buutj-ZjhrEe", - "outputId": "ae11addd-6bea-44e9-81c0-b431e1db8089" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Mon Dec 25 06:26:04 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 61C P0 29W / 70W | 1927MiB / 15360MiB | 36% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Mon Dec 25 06:26:13 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 65C P0 71W / 70W | 2931MiB / 15360MiB | 90% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(718640, 1) (2210961, 2)\n", - "CPU times: user 9.01 s, sys: 1.03 s, total: 10 s\n", - "Wall time: 9.84 s\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ "%%time\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=5)])\n", - "! nvidia-smi\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del co_gdf\n", - "del out" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "bK4C9Ly0hso-", - "outputId": "8a9a32ab-03e2-42b4-8b71-2bcf797b31b1" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Mon Dec 25 06:27:18 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 60C P0 29W / 70W | 1927MiB / 15360MiB | 28% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Mon Dec 25 06:27:57 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 72C P0 43W / 70W | 4351MiB / 15360MiB | 100% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(3041556, 1) (47622917, 2)\n", - "CPU times: user 34.9 s, sys: 4.76 s, total: 39.6 s\n", - "Wall time: 39.2 s\n" - ] - } + "ge_df = pd.read_csv('gplus_combined.txt', sep=' ', names=['s', 'd'])\n", + "print(ge_df.shape)\n", + "ge_df.head(5)" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = co_gdf.chain([ n({'id': 1}), e_forward(hops=6)])._nodes\n", - "print(out.shape)\n", - "del co_gdf\n", - "del out" - ], - "metadata": { - "id": "qrga-la0hwhh" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "!lscpu\n" - ], + "execution_count": 26, "metadata": { + "id": "w5YkN-nLK6UV", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 260 }, - "id": "eiXFImxF-rzw", - "outputId": "b807cc3d-ed1a-4bef-c6e0-bfc2df7356ff" + "outputId": "89c4b0a5-a355-4558-8b3c-187b0efe471a" }, - "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Architecture: x86_64\n", - " CPU op-mode(s): 32-bit, 64-bit\n", - " Address sizes: 46 bits physical, 48 bits virtual\n", - " Byte Order: Little Endian\n", - "CPU(s): 2\n", 
- " On-line CPU(s) list: 0,1\n", - "Vendor ID: GenuineIntel\n", - " Model name: Intel(R) Xeon(R) CPU @ 2.20GHz\n", - " CPU family: 6\n", - " Model: 79\n", - " Thread(s) per core: 2\n", - " Core(s) per socket: 1\n", - " Socket(s): 1\n", - " Stepping: 0\n", - " BogoMIPS: 4399.99\n", - " Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clf\n", - " lush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_\n", - " good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fm\n", - " a cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hyp\n", - " ervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsb\n", - " ase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsa\n", - " veopt arat md_clear arch_capabilities\n", - "Virtualization features: \n", - " Hypervisor vendor: KVM\n", - " Virtualization type: full\n", - "Caches (sum of all): \n", - " L1d: 32 KiB (1 instance)\n", - " L1i: 32 KiB (1 instance)\n", - " L2: 256 KiB (1 instance)\n", - " L3: 55 MiB (1 instance)\n", - "NUMA: \n", - " NUMA node(s): 1\n", - " NUMA node0 CPU(s): 0,1\n", - "Vulnerabilities: \n", - " Gather data sampling: Not affected\n", - " Itlb multihit: Not affected\n", - " L1tf: Mitigation; PTE Inversion\n", - " Mds: Vulnerable; SMT Host state unknown\n", - " Meltdown: Vulnerable\n", - " Mmio stale data: Vulnerable\n", - " Retbleed: Vulnerable\n", - " Spec rstack overflow: Not affected\n", - " Spec store bypass: Vulnerable\n", - " Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swap\n", - " gs barriers\n", - " Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected\n", - " Srbds: Not affected\n", - " Tsx async abort: Vulnerable\n" + "(30494866, 2) (107614, 1)\n", + "CPU times: user 5.14 s, sys: 1.08 s, total: 6.22 s\n", + "Wall time: 6.27 s\n" ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " 
id\n", + "0 116374117927631468606\n", + "1 112188647432305746617\n", + "2 116719211656774388392\n", + "3 117421021456205115327\n", + "4 116407635616074189669" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
id
0116374117927631468606
1112188647432305746617
2116719211656774388392
3117421021456205115327
4116407635616074189669
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 26 } - ] - }, - { - "cell_type": "code", - "source": [ - "!free -h\n" ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "wJohLi58-sN5", - "outputId": "c3e144f6-c19a-4c68-e867-f5e7fa2e9df4" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - " total used free shared buff/cache available\n", - "Mem: 12Gi 717Mi 8.0Gi 1.0Mi 3.9Gi 11Gi\n", - "Swap: 0B 0B 0B\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ "%%time\n", - "start_nodes = pd.DataFrame({'id': [1]})\n", - "! nvidia-smi\n", - "for i in range(1):\n", - " g2 = co_g.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=1)\n", - "! nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "#del start_nodes\n", - "#del co_gdf\n", - "#del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "zak4Inhco5il", - "outputId": "30bcf2bc-853e-4e5e-8c57-ba0cd9429554" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 01:01:43 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 64C P0 30W / 70W | 2821MiB / 15360MiB | 0% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n" - ] - } + "gg = graphistry.edges(ge_df, 's', 'd').materialize_nodes()\n", + "gg = graphistry.edges(ge_df, 's', 'd').nodes(gg._nodes, 'id')\n", + "print(gg._edges.shape, gg._nodes.shape)\n", + "gg._nodes.head(5)" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=1)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" - ], + "execution_count": 27, "metadata": { - "id": "-SmFlCBS_Bgx", + "id": "NKtz54uELX-8", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 116 }, - "outputId": "d2326cf7-3ea6-4f99-9548-f2e98ece59a4" + "outputId": "f4b28841-62bc-42cd-e771-127400a2689e" }, - "execution_count": 16, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Tue Dec 26 00:56:45 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 49C P0 28W / 70W | 1923MiB / 15360MiB | 37% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:56:47 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - 
"|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 52C P0 70W / 70W | 2819MiB / 15360MiB | 79% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(12, 1) (11, 2)\n", - "CPU times: user 1.6 s, sys: 37.3 ms, total: 1.64 s\n", - "Wall time: 1.84 s\n" + "CPU times: user 676 ms, sys: 400 ms, total: 1.08 s\n", + "Wall time: 1.11 s\n" ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=2)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" }, - "id": "fjjt3YnYnabv", - "outputId": "05762f50-bfe1-4d23-9153-31431418c8e5" - }, - "execution_count": 17, - "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:47 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 51C P0 35W / 70W | 1923MiB / 15360MiB | 59% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:56:49 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 53C P0 59W / 70W | 2821MiB / 15360MiB | 86% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(391, 1) (461, 2)\n", - "CPU times: user 2.32 s, sys: 58.5 ms, total: 2.38 s\n", - "Wall time: 2.51 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 116374117927631468606" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
                      id
0  116374117927631468606
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + "
 \n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"116374117927631468606\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 27 } + ], + "source": [ + "%%time\n", + "gg.chain([ n({'id': '116374117927631468606'})])._nodes" ] }, + { + "cell_type": "markdown", + "source": [ + "On the GPlus data, simple `chain` operations over 1 to 5 hops run roughly **70-110x** faster on the GPU than on the CPU" + ], + "metadata": { + "id": "e4ZchWvrBKdY" + } + }, { "cell_type": "code", "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=3)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" + "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n", + "\n", + "\n", + "for n_hop in [1,2,3,4,5]:\n", + " start_nodes = pd.DataFrame({fg._node: [0]})\n", + " start0 = time.time()\n", + " out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])._nodes\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + " start1 = time.time()\n", + " out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])\n", + " end1 = time.time()\n", + "\n", + " del gg_gdf\n", + " del out\n", + " T1 = end1-start1\n", + " # print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))\n", + "\n", + " new_row = pd.DataFrame({\n", + " 'hops': [n_hop],\n", + " 'CPU hop chain time (s)': [np.round(T0, 4)],\n", + " 'GPU hop chain time (s)': [np.round(T1, 4)],\n", + " 'n_notation speedup': [np.round(T0 / T1, 4)]\n", + " })\n", + "\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", + "\n", + "(results_df.T)" ], "metadata": { "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 175 }, - "id": "oIouuORgnbcY", - "outputId": "f07abe4c-5137-4ee3-935a-afbb2c5eaa1e" + "id": "fTnU8MLr8tV5", + "outputId": "203eb5bf-9d95-4557-f35e-7ef2274424c5" }, - "execution_count": 18, + "execution_count": 28, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:50 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - 
"|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 52C P0 36W / 70W | 1925MiB / 15360MiB | 55% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:56:53 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 54C P0 75W / 70W | 2825MiB / 15360MiB | 74% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(21767, 1) (28480, 2)\n", - "CPU times: user 3.04 s, sys: 63.6 ms, total: 3.1 s\n", - "Wall time: 3.25 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + " 0 1 2 3 4\n", + "hops 1 2 3 4 5\n", + "CPU hop chain time (s) 33.7597 50.877 228.473 291.1332 327.8891\n", + "GPU hop chain time (s) 0.3082 0.6515 2.9645 4.1146 4.7598\n", + "n_notation speedup 109.5356 78.0912 77.0694 70.7561 68.8877" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
                               0        1        2         3         4
hops                           1        2        3         4         5
CPU hop chain time (s)   33.7597   50.877  228.473  291.1332  327.8891
GPU hop chain time (s)    0.3082   0.6515   2.9645    4.1146    4.7598
n_notation speedup      109.5356  78.0912  77.0694   70.7561   68.8877
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"(results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.3082,\n \"max\": 109.5356,\n \"num_unique_values\": 4,\n \"samples\": [\n 33.7597,\n 109.5356,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.6515,\n \"max\": 78.0912,\n \"num_unique_values\": 4,\n \"samples\": [\n 50.877,\n 78.0912,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 2,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 2.9645,\n \"max\": 228.473,\n \"num_unique_values\": 4,\n \"samples\": [\n 228.473,\n 77.0694,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 3,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 4,\n \"max\": 291.1332,\n \"num_unique_values\": 4,\n \"samples\": [\n 291.1332,\n 70.7561,\n 4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 4,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 4.7598,\n \"max\": 327.8891,\n \"num_unique_values\": 4,\n \"samples\": [\n 327.8891,\n 68.8877,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 28 } ] }, { - "cell_type": "code", + "cell_type": "markdown", "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=4)\n", - "! 
 nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" + "Similarly, these `hop` operations run roughly **70-170x** faster on the GPU than on the CPU" ], "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "oNLZGjwInc85", - "outputId": "534097cf-4022-48cc-9419-a00c135f69e1" - }, - "execution_count": 19, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:53 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 54C P0 36W / 70W | 1927MiB / 15360MiB | 54% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:56:58 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id 
Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 56C P0 38W / 70W | 2907MiB / 15360MiB | 89% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(718640, 1) (2210961, 2)\n", - "CPU times: user 4.58 s, sys: 309 ms, total: 4.89 s\n", - "Wall time: 5.02 s\n" - ] - } - ] + "id": "80bs6Y5pBWb2" + } }, { "cell_type": "code", "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=5)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" + "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n", + "\n", + "\n", + "for n_hop in [1,2,3,4,5]:\n", + " start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", + " start0 = time.time()\n", + " for i in range(1):\n", + " g2 = gg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", + " gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", + " start1 = time.time()\n", + " for i in range(1):\n", + " g2 = gg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end1 = time.time()\n", + "\n", + " del start_nodes\n", + " del gg_gdf\n", + " del g2\n", + " T1 = end1-start1\n", + "\n", + " new_row = pd.DataFrame({\n", + " 'hops': [n_hop],\n", + " 'CPU hop chain time (s)': [np.round(T0, 4)],\n", + " 'GPU hop chain time (s)': [np.round(T1, 4)],\n", + " 'n_notation speedup': [np.round(T0 / T1, 4)]\n", + " })\n", + "\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", + "\n", + "(results_df.T)" ], "metadata": { "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 175 }, - "id": "ePqaeujMneX8", - "outputId": "ffd88fff-016e-4ac0-ecb9-fa06baca60f8" + "id": "N2-gDFod9vc3", + "outputId": "907da762-fae2-4caa-cdd2-e78e13b2f635" }, - "execution_count": 20, + "execution_count": 29, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:58 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - 
"|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 55C P0 37W / 70W | 1925MiB / 15360MiB | 59% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:57:10 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 60C P0 48W / 70W | 4325MiB / 15360MiB | 99% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(3041556, 1) (47622917, 2)\n", - "CPU times: user 10.8 s, sys: 1.29 s, total: 12.1 s\n", - "Wall time: 12 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + " 0 1 2 3 4\n", + "hops 1 2 3 4 5\n", + "CPU hop chain time (s) 19.6594 33.2538 64.8384 98.9693 147.4526\n", + "GPU hop chain time (s) 0.116 0.2583 0.8252 1.3544 1.9375\n", + "n_notation speedup 169.4189 128.7532 78.5772 73.071 76.103" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
                               0         1        2        3         4
hops                           1         2        3        4         5
CPU hop chain time (s)   19.6594   33.2538  64.8384  98.9693  147.4526
GPU hop chain time (s)     0.116    0.2583   0.8252   1.3544    1.9375
n_notation speedup      169.4189  128.7532  78.5772   73.071    76.103
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"(results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.116,\n \"max\": 169.4189,\n \"num_unique_values\": 4,\n \"samples\": [\n 19.6594,\n 169.4189,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.2583,\n \"max\": 128.7532,\n \"num_unique_values\": 4,\n \"samples\": [\n 33.2538,\n 128.7532,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 2,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.8252,\n \"max\": 78.5772,\n \"num_unique_values\": 4,\n \"samples\": [\n 64.8384,\n 78.5772,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 3,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.3544,\n \"max\": 98.9693,\n \"num_unique_values\": 4,\n \"samples\": [\n 98.9693,\n 73.071,\n 4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 4,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.9375,\n \"max\": 147.4526,\n \"num_unique_values\": 4,\n \"samples\": [\n 147.4526,\n 76.103,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 29 } ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=6)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "PTBkoIVHnfzK", - "outputId": "5615ecd7-47ea-46ab-fd36-13bce4b3c787" - }, - "execution_count": 21, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:57:10 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 59C P0 38W / 70W | 1925MiB / 15360MiB | 44% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:57:38 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 68C P0 55W / 70W | 6445MiB / 15360MiB | 95% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(3071927, 1) (117032738, 2)\n", - "CPU times: user 23.5 s, sys: 2.68 s, total: 26.2 s\n", - "Wall time: 28.2 s\n" - ] - } - ] + "kernelspec": { + "display_name": "Python 3", + "name": "python3" }, - { - "cell_type": "code", - "source": [], - "metadata": { - "id": "Ygc2nrkznlCu" - }, - "execution_count": null, - "outputs": [] + "language_info": { + "name": "python" } - ] + }, + "nbformat": 4, + "nbformat_minor": 0 } \ No newline at end of file
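
Note on the benchmark cells above: both new cells repeat the same pattern (time a CPU run, time a GPU run, record the ratio per hop count). A minimal sketch of that pattern as a reusable helper is shown below. It is not part of this change; `time_call`, `benchmark_hops`, `run_cpu`, and `run_gpu` are hypothetical names, and the callables stand in for whatever CPU and GPU graph calls are being compared.

```python
import time
from typing import Callable, Dict, Iterable, List


def time_call(fn: Callable[[], object], repeats: int = 1) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark_hops(
    run_cpu: Callable[[int], object],
    run_gpu: Callable[[int], object],
    hops: Iterable[int] = (1, 2, 3, 4, 5),
    repeats: int = 1,
) -> List[Dict[str, float]]:
    """Time run_cpu(n) and run_gpu(n) for each hop count and record the speedup.

    run_cpu and run_gpu are placeholders (assumptions), e.g. wrappers around
    a pandas-backed and a cudf-backed traversal respectively.
    """
    rows: List[Dict[str, float]] = []
    for n_hop in hops:
        t_cpu = time_call(lambda: run_cpu(n_hop), repeats)
        t_gpu = time_call(lambda: run_gpu(n_hop), repeats)
        rows.append({
            "hops": n_hop,
            "cpu_s": round(t_cpu, 4),
            "gpu_s": round(t_gpu, 4),
            # Guard against a zero GPU time on very fast calls.
            "speedup": round(t_cpu / t_gpu, 4) if t_gpu > 0 else float("inf"),
        })
    return rows
```

Collecting rows in a list and building one table at the end avoids calling `pd.concat` inside the loop. Any host-to-device copy should happen before `run_gpu` is invoked, so that only the traversal is timed, as the cells above do.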