Releases: neo4j/graph-data-science
Graph Data Science 2.1.7
GDS 2.1.7 is compatible with Neo4j 4.3 versions ≥ 4.3.15 and 4.4 versions ≥ 4.4.9.
For GDS compatibility with previous releases of 4.3 and 4.4, please see GDS 2.1.6. The 2.1 series is also incompatible with Neo4j 3.5.x, 4.0, 4.1, and 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8.
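The compatibility rules above can be read as a small lookup. The following is a minimal sketch, not an official tool: the function name `recommended_gds` is hypothetical, and it assumes the Neo4j version is a plain dotted string such as "4.4.9" (a missing patch number is treated as 0).

```python
def recommended_gds(neo4j_version: str) -> str:
    """Return the GDS release these notes point to for a Neo4j version."""
    parts = tuple(int(p) for p in neo4j_version.split("."))
    major_minor = parts[:2]
    patch = parts[2] if len(parts) > 2 else 0

    if major_minor == (3, 5):
        return "1.1.7"
    if major_minor == (4, 0):
        return "1.6.5"
    if major_minor in ((4, 1), (4, 2)):
        return "1.8.8"

    # Minimum patch level that GDS 2.1.7 requires on each supported minor.
    min_patch_for_217 = {(4, 3): 15, (4, 4): 9}
    if major_minor in min_patch_for_217:
        if patch >= min_patch_for_217[major_minor]:
            return "2.1.7"
        return "2.1.6"

    raise ValueError(f"No GDS release listed here for Neo4j {neo4j_version}")
```

For example, `recommended_gds("4.4.8")` falls back to 2.1.6 because 4.4.8 is below the 4.4.9 minimum stated for 2.1.7.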
Breaking Changes
- Link prediction pipeline training no longer accepts directed graphs. This is because the algorithm & ML techniques used by link prediction pipelines are only defined for undirected graphs.
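To make the "undirected" requirement concrete, here is a minimal sketch of what it means for a relationship list to be undirected: every (source, target) pair also appears reversed. This is an illustration only and does not reflect how GDS validates graphs internally; the helper names `is_undirected` and `symmetrize` are hypothetical.

```python
def is_undirected(relationships):
    """True if every (source, target) pair also appears as (target, source)."""
    pairs = set(relationships)
    return all((target, source) in pairs for source, target in pairs)


def symmetrize(relationships):
    """Add the missing reverse pairs so the relationship set becomes symmetric."""
    pairs = set(relationships)
    return pairs | {(target, source) for source, target in pairs}
```

A directed list such as `[(0, 1), (1, 2)]` fails the check, while its symmetrized form passes.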
Bug Fixes
- Fixed a bug in modularityOptimization that could incorrectly update modularity values.
- Fixed a bug where gds.restore did not correctly read values wrapped in quotes.
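For readers unfamiliar with the modularity value mentioned in the first fix, the sketch below computes the standard modularity score of a partition for an undirected, unweighted graph. It is a textbook formula written out for illustration, not the GDS implementation; the function name and the edge-list input format are assumptions.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        # Each edge contributes one degree to each of its endpoints.
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )
```

Two disconnected triangles, each assigned to its own community, give Q = 0.5; putting all six nodes into one community gives Q = 0.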
2.1.6
GDS 2.1.6 is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8
Bug Fixes
- Fixed a bug where relationship types or node labels were not handled correctly when importing previously exported data via Apache Arrow.
- Fixed a bug where gds.graphSage.[stream|write|mutate] did not use the correct relationship weights when run with concurrency > 1.
Graph Data Science 2.1.5
GDS 2.1.5 is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8
Improvements
- Better error handling for K-means
Graph Data Science 2.1.4
GDS 2.1.4 is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8
Bug fixes
- Fixed a bug where Neo4j users with admin role could not see all graphs in the catalog on GDS enterprise.
- Fixed a bug in random graph generation where the resulting graph could end up with an incorrect relationship schema.
- Fixed a bug where gds.graph.list and gds.graph.drop threw an NPE if the catalog contained graphs that had been created with Cypher Aggregation.
Graph Data Science 1.8.8
GDS 1.8.8 is compatible with Neo4j 4.1, 4.2, 4.3 and 4.4 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5.
Breaking Changes
- The procedures gds.features.useKernelTracker and gds.features.useKernelTracker.reset have been removed.
Bug fixes
- Fixed a bug in gds.beta.randomWalk.stream where configuring start nodes could lead to ArrayIndexOutOfBounds (AIOOB) exceptions.
- Fixed a bug in gds.graph.export where the configured database directory would not be respected.
- Fixed a bug with running Triangle Count on filtered graphs.
- Fixed a bug in gds.beta.graphSage when using activationFunction: 'RELU', where the training did not always compute the correct gradient.
- Fixed a bug in gds.louvain.stream that occurred when the consecutiveIds parameter was enabled.
- Fixed a bug where Neo4j users with admin role could not see all graphs in the catalog on GDS enterprise.
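As background for the RELU gradient fix, the sketch below shows the ReLU activation, its derivative, and a finite-difference check of the kind often used to catch gradient mistakes. It is a generic illustration with hypothetical function names and has no connection to the actual GraphSAGE code in GDS.

```python
def relu(x: float) -> float:
    """ReLU activation: max(0, x)."""
    return x if x > 0.0 else 0.0


def relu_grad(x: float) -> float:
    """Derivative of ReLU; the value at x == 0 is a convention (0 here)."""
    return 1.0 if x > 0.0 else 0.0


def numeric_grad(f, x: float, eps: float = 1e-6) -> float:
    """Central finite difference, used only to sanity-check relu_grad."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)
```

Away from the kink at zero, `relu_grad(x)` and `numeric_grad(relu, x)` should agree to within floating-point error.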
Other Changes
- Updated version of 'com.google.protobuf' to 3.9.12. This fixes a potential Denial of Service issue (GHSA-wrvw-hg22-4m67).
Graph Data Science 2.1.2
GDS 2.1.2 is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.7
Bug fixes
- Fixed a bug where checking for business rules around running on a Neo4j cluster could cause the cluster to fail to start.
Graph Data Science 2.1.1
GDS 2.1.1 is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.7
Other Updates
- Fixed issue with publishing compatibility artifacts
Graph Data Science 2.0.5
GDS 2.0.5 is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.7
Bug Fixes
- Fixed a bug in CollapsePath where a provided nodeFilter would be ignored. GH 194
- Fixed a bug in RandomWalk where unconsumed stream results could leave GDS in a state where no further operations were possible.
- Fixed a bug in gds.louvain.stream which could arise when the consecutiveIds parameter was enabled.
Improvements
- gds.beta.graph.project.subgraph when comparing expressions with incompatible types and one of them is a literal expression.
- centralityDistribution. Now we only skip the computation of the distribution but the centrality result is accessible.
Graph Data Science 2.1.0
GDS 2.1.0 is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.7
Breaking Changes
- Removed the redundant information of parameter space and split config from the info of the models trained by gds.beta.pipeline.[nodeClassification|linkPrediction].train. The information is now accessible only via the Pipeline Catalog.
- Removed the label parameter from gds.graph.removeNodeProperties.
  - Supported config parameters are timeoutInSeconds and concurrency
New Features
- (Enterprise Only) Apache Arrow and Flight RPC can now be used to improve certain import and export tasks:
  - Project a new in-memory graph or Neo4j database via Arrow Flight RPC, for example by using gds.alpha.graph.construct from the GDS Python client
  - Export node, relationship, and graph properties directly via Arrow Flight RPC, for example by using the existing stream*Properties functionality from the GDS Python client
  - Flight RPC is secured with the same authorization and encryption that the Neo4j database is using
- New Algorithm: K-Means Clustering. Added the following procedures:
gds.alpha.kmeans.mutate
gds.alpha.kmeans.stats
gds.alpha.kmeans.stream
gds.alpha.kmeans.write
- New Algorithm: Leiden. Added the following procedures:
gds.alpha.leiden.mutate
gds.alpha.leiden.stats
gds.alpha.leiden.stream
- Added new similarity variant, Filtered Node Similarity, to alpha tier, accepting source and target node filters
gds.alpha.nodeSimilarity.filtered.mutate
gds.alpha.nodeSimilarity.filtered.stream
gds.alpha.nodeSimilarity.filtered.write
gds.alpha.nodeSimilarity.filtered.stats
- Added new similarity variant Filtered KNN to alpha tier, accepting source and target node filters
gds.alpha.knn.filtered.mutate
gds.alpha.knn.filtered.stream
gds.alpha.knn.filtered.write
gds.alpha.knn.filtered.stats
- Added new procedures for delta stepping:
gds.allShortestPaths.delta.stats
gds.allShortestPaths.delta.stats.estimate
- Added new procedures for BFS:
gds.bfs.stats
gds.bfs.stats.estimate
- Added Node Regression Pipelines with the following procedures
gds.alpha.pipeline.nodeRegression.create
gds.alpha.pipeline.nodeRegression.configureAutoTuning
gds.alpha.pipeline.nodeRegression.configureSplit
gds.alpha.pipeline.nodeRegression.addLinearRegression
gds.alpha.pipeline.nodeRegression.addRandomForest
gds.alpha.pipeline.nodeRegression.addNodeProperty
gds.alpha.pipeline.nodeRegression.selectFeatures
gds.alpha.pipeline.nodeRegression.train
gds.alpha.pipeline.nodeRegression.predict.stream
gds.alpha.pipeline.nodeRegression.predict.mutate
- Autotuning Support for Machine Learning Pipelines:
  - Added new procedures gds.alpha.pipeline.[nodeClassification|nodeRegression|linkPrediction].configureAutoTuning.
  - Added syntax to specify ranges for parameters in gds.alpha.pipeline.[linkPrediction|nodeClassification|nodeRegression].addRandomForest, gds.beta.pipeline.[linkPrediction|nodeClassification].addLogisticRegression, and gds.alpha.pipeline.nodeRegression.addLinearRegression
- Additional Machine Learning Pipeline Functionality:
  - Exposed learningRate for the LogisticRegression models, which can be added using gds.beta.pipeline.[nodeClassification|linkPrediction].addLogisticRegression
  - Exposed minLeafSize for RandomForest models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest
  - Exposed criterion for RandomForestClassification models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest. Also added support for the ENTROPY impurity criterion.
  - Updated structure of modelSelectionStats yield in gds.beta.pipeline.[linkPrediction, nodeClassification].train.
  - Added support for the OUT_OF_BAG_ERROR metric in gds.beta.pipeline.[linkPrediction, nodeClassification].train, which applies only to RandomForest models.
  - Exposed batchesPerIteration in gds.beta.graphSage.train to configure the number of batches considered per iteration.
- Cypher Aggregation now accepts any INTEGER value for source and target nodes
- Added ShardedIdMap, which adds support for external node ids ranging from 0 to Long.MAX_VALUE.
  - The id map is disabled by default and can be enabled via feature toggle USE_SHARDED_ID_MAP.
- Added procedures for exporting graph properties to the alpha tier
gds.alpha.graph.streamGraphProperty
gds.alpha.graph.removeGraphProperty
- Exposed a new string config parameter jobId for graph projection and algorithm procedures, which allows for easier tracking of a job via e.g. gds.beta.listProgress.
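The ENTROPY impurity criterion listed among the machine learning pipeline features above is a standard way to score how mixed the class labels in a decision-tree node are. The sketch below computes it next to Gini impurity, another common criterion, for comparison. It is a generic illustration: the function names are hypothetical, and the base-2 logarithm is an assumption rather than something these notes specify.

```python
import math


def gini(class_counts):
    """Gini impurity: 1 - sum(p_i ** 2)."""
    total = sum(class_counts)
    return 1.0 - sum((n / total) ** 2 for n in class_counts)


def entropy(class_counts):
    """Entropy in bits: -sum(p_i * log2(p_i)), skipping empty classes."""
    total = sum(class_counts)
    return -sum(
        (n / total) * math.log2(n / total) for n in class_counts if n > 0
    )
```

Both measures are zero for a pure node such as `[10, 0]` and largest for an evenly split node such as `[5, 5]`.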
Bug fixes
- Fixed a bug in gds.beta.pipeline.[nodeClassification|linkPrediction].addNodeProperty where gds.beta.graphSage.mutate could not be added.
- Fixed a bug where the procedures gds.beta.pipeline.linkPrediction.predict.[mutate|stream] threw an error when given the argument initialSampler.
- Fixed a bug with running Triangle Count on filtered graphs that could cause an ArrayIndexOutOfBounds Error.
- Fixed a bug where graphSage.train incorrectly reported didConverge as false.
- Fixed a bug in CollapsePath where a provided nodeFilter would be ignored (GH 194)
- Fixed a bug in gds.louvain.stream when the consecutiveIds parameter was enabled.
- Fixed a bug in RandomWalk where not consuming all stream results could lead to a state where GDS would become unable to run further procedures
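The consecutiveIds parameter mentioned in the Louvain fix concerns mapping community identifiers into a consecutive id space. A minimal sketch of such a relabeling follows; it assumes ids are assigned in order of first appearance, which is an illustrative choice and not necessarily what GDS does, and the function name is hypothetical.

```python
def to_consecutive_ids(community_ids):
    """Relabel community ids as 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    relabeled = []
    for community in community_ids:
        if community not in mapping:
            # The next free consecutive id is the number of ids seen so far.
            mapping[community] = len(mapping)
        relabeled.append(mapping[community])
    return relabeled
```

Nodes that shared a community before relabeling still share one afterwards; only the labels change.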
Improvements
- When the memory guard fails a query, the information is now logged as well as sent to the user in the raised exception.
- Added new methods to Pregel contexts which allow translating between internal and original node id space.
- Machine learning pipelines
  - gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate now takes memory usage of random forest training into account when applicable.
  - gds.beta.pipeline.[nodeClassification|linkPrediction].predict.[mutate,stream,write].estimate now take random forest prediction memory overhead into account.
  - Improved early validation of graph and prediction pipeline in gds.beta.pipeline.[nodeClassification|linkPrediction].predict.
  - Improved memory estimation for gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate.
  - Improved memory estimation in gds.beta.pipeline.linkPrediction.train.estimate.
  - Add training method specific debug le...
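Several entries in this release refer to translating between original (external) node ids and internal ids, for example the new Pregel context methods and ShardedIdMap. The sketch below shows the basic idea with a plain dictionary-backed map: sparse original ids, up to Long.MAX_VALUE, are assigned dense internal ids starting at 0. The class `IdMap` and its methods are hypothetical and far simpler than any real GDS id map; in particular, nothing here is sharded.

```python
class IdMap:
    """Maps arbitrary non-negative original ids to dense internal ids."""

    def __init__(self):
        self._to_internal = {}   # original id -> internal id
        self._to_original = []   # internal id -> original id

    def add(self, original_id: int) -> int:
        """Register an original id and return its internal id."""
        if original_id < 0:
            raise ValueError("original ids must be non-negative")
        internal = self._to_internal.get(original_id)
        if internal is None:
            internal = len(self._to_original)
            self._to_internal[original_id] = internal
            self._to_original.append(original_id)
        return internal

    def to_internal(self, original_id: int) -> int:
        return self._to_internal[original_id]

    def to_original(self, internal_id: int) -> int:
        return self._to_original[internal_id]
```

Algorithms can then work on the compact internal range while results are reported back in the original id space.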
2.1.0-Preview
GDS 2.1.0-preview is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.7
Breaking Changes
- Removed the redundant information of parameter space and split config from the info of the models trained by gds.beta.pipeline.[nodeClassification|linkPrediction].train. The information is now accessible only via the Pipeline Catalog.
- Removed the label parameter from gds.graph.removeNodeProperties.
  - Supported config parameters are timeoutInSeconds and concurrency
New Features
- (Enterprise Only) Apache Arrow and Flight RPC can now be used to improve certain import and export tasks:
  - Project a new in-memory graph or Neo4j database via Arrow Flight RPC, for example by using gds.alpha.graph.construct from the GDS Python client
  - Export node, relationship, and graph properties directly via Arrow Flight RPC, for example by using the existing stream*Properties functionality from the GDS Python client
  - Flight RPC is secured with the same authorization and encryption that the Neo4j database is using
- New Algorithm: K-Means Clustering. Added the following procedures:
gds.alpha.kmeans.mutate
gds.alpha.kmeans.stats
gds.alpha.kmeans.stream
- New Algorithm: Leiden. Added the following procedures:
gds.alpha.leiden.mutate
gds.alpha.leiden.stats
gds.alpha.leiden.stream
- Added new similarity variant Filtered Node Similarity to alpha tier, accepting source and target node filters
gds.alpha.nodeSimilarity.filtered.mutate
gds.alpha.nodeSimilarity.filtered.stream
gds.alpha.nodeSimilarity.filtered.write
- Added new similarity variant Filtered KNN to alpha tier, accepting source and target node filters
gds.alpha.knn.filtered.mutate
gds.alpha.knn.filtered.stream
- Added new procedures for delta stepping:
gds.allShortestPaths.delta.stats
gds.allShortestPaths.delta.stats.estimate
- Added new procedures for BFS:
gds.bfs.stats
gds.bfs.stats.estimate
- Added Node Regression Pipelines with the following procedures
gds.alpha.pipeline.nodeRegression.create
gds.alpha.pipeline.nodeRegression.configureAutoTuning
gds.alpha.pipeline.nodeRegression.configureSplit
gds.alpha.pipeline.nodeRegression.addLinearRegression
gds.alpha.pipeline.nodeRegression.addRandomForest
gds.alpha.pipeline.nodeRegression.addNodeProperty
gds.alpha.pipeline.nodeRegression.selectFeatures
gds.alpha.pipeline.nodeRegression.train
gds.alpha.pipeline.nodeRegression.predict.stream
gds.alpha.pipeline.nodeRegression.predict.mutate
- Autotuning Support for Machine Learning Pipelines:
  - Added new procedures gds.alpha.pipeline.[nodeClassification|nodeRegression|linkPrediction].configureAutoTuning.
  - Added syntax to specify ranges for parameters in gds.alpha.pipeline.[linkPrediction|nodeClassification|nodeRegression].addRandomForest, gds.beta.pipeline.[linkPrediction|nodeClassification].addLogisticRegression, and gds.alpha.pipeline.nodeRegression.addLinearRegression
- Additional Machine Learning Pipeline Functionality:
  - Exposed learningRate for the LogisticRegression models, which can be added using gds.beta.pipeline.[nodeClassification|linkPrediction].addLogisticRegression
  - Exposed minLeafSize for RandomForest models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest
  - Exposed criterion for RandomForestClassification models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest. Also added support for the ENTROPY impurity criterion.
  - Updated structure of modelSelectionStats yield in gds.beta.pipeline.[linkPrediction, nodeClassification].train.
  - Added support for the OUT_OF_BAG_ERROR metric in gds.beta.pipeline.[linkPrediction, nodeClassification].train, which applies only to RandomForest models.
  - Exposed batchesPerIteration in gds.beta.graphSage.train to configure the number of batches considered per iteration.
- Cypher Aggregation now accepts any INTEGER value for source and target nodes
- Added ShardedIdMap, which adds support for external node ids ranging from 0 to Long.MAX_VALUE.
  - The id map is disabled by default and can be enabled via feature toggle USE_SHARDED_ID_MAP.
- Added procedures for exporting graph properties to the alpha tier
gds.alpha.graph.streamGraphProperty
gds.alpha.graph.removeGraphProperty
- Exposed a new string config parameter jobId for graph projection and algorithm procedures, which allows for easier tracking of a job via e.g. gds.beta.listProgress.
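For readers new to the K-Means Clustering algorithm introduced in this release, the sketch below shows the classic Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is a plain-Python illustration and not the GDS implementation: the function name, the explicit initial centroids, and the tuple-of-floats input stand in for whatever numeric feature vectors are being clustered.

```python
def kmeans(points, centroids, max_iterations=10):
    """Lloyd's algorithm on tuples of floats, starting from given centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                updated.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, assignment
```

Passing the initial centroids explicitly keeps the sketch deterministic; real implementations usually sample them.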
Bug fixes
- Fixed a bug in gds.beta.pipeline.[nodeClassification|linkPrediction].addNodeProperty where gds.beta.graphSage.mutate could not be added.
- Fixed a bug where the procedures gds.beta.pipeline.linkPrediction.predict.[mutate|stream] threw an error when given the argument initialSampler.
- Fixed a bug with running Triangle Count on filtered graphs that could cause an ArrayIndexOutOfBounds Error.
- Fixed a bug where graphSage.train incorrectly reported didConverge as false.
- Fixed a bug in CollapsePath where a provided nodeFilter would be ignored.
- Fixed a bug in gds.louvain.stream when the consecutiveIds parameter was enabled.
- Fixed a bug in RandomWalk where not consuming all stream results could lead to a state where GDS would become unable to run further procedures
Improvements
- When the memory guard fails a query, the information is now logged as well as sent to the user in the raised exception.
- Machine learning pipelines
  - gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate now takes memory usage of random forest training into account when applicable.
  - gds.beta.pipeline.[nodeClassification|linkPrediction].predict.[mutate,stream,write].estimate now take random forest prediction memory overhead into account.
  - Improved early validation of graph and prediction pipeline in gds.beta.pipeline.[nodeClassification|linkPrediction].predict.
  - Improved memory estimation for gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate.
  - Improved memory estimation in gds.beta.pipeline.linkPrediction.train.estimate.
  - Added training method specific debug level logging during the model selection phase of gds.beta.pipeline.linkPrediction.train, gds.beta.pipeline.nodeClassification.train and gds.alpha.pipeline.nodeRegression.train.
- Improved logging in Link Prediction and Node Classification training.
- Reduced computational complexity and constant overhead of random forest training, added via gds.alpha.pipeline[linkPrediction|nodeClassification].addRandomFor...