GDS 1.4 Preview
Pre-release
Breaking changes
- Removed `sparsity` parameter from `gds.alpha.randomProjection.*`.
- Renamed `gds.alpha.randomProjection` to `gds.fastRP` due to productization.
- Renamed `embeddingSize` parameter to `embeddingDimension` for FastRP, GraphSAGE and Node2Vec.
- Default parameters for `gds.fastRP` have changed for the following configuration parameters:
  - `iterationWeights` now has default `[0.0, 1.0, 1.0]`.
  - `normalizeL2` has been removed and its effect is always applied.
- Removed alpha procedures for GraphSage (replaced with `beta` tier, see New features section):
  - `gds.alpha.graphSage.stream`
  - `gds.alpha.graphSage.write`
- GraphSage no longer calculates embeddings directly. Instead, it has been split into `train` (to generate a named model) and `write`, `mutate`, and `stream` (to apply the model predictions to your data).
- Due to the creation of a `train` mode for GraphSage, the following configuration parameters were moved to `gds.beta.graphSage.train`: `embeddingSize`, `aggregator`, `activationFunction`, `sampleSizes`, `nodePropertyNames`, `tolerance`, `learningRate`, `epochs`, `maxIterations`, `searchDepth`, `negativeSampleWeight`, and `degreeAsProperty`.
- The `gds.beta.graphSage.stream` procedure now requires the `modelName` configuration parameter.
- The `gds.beta.graphSage.write` procedure now requires the `modelName` configuration parameter.
- Removed `startLoss` and `epochLosses` from the result columns of `gds.beta.graphSage.write`.
- Added the graph create config as a return field of the train procedure, `gds.beta.graphSage.train`.
- Renamed the GraphSAGE result column `embeddings` to `embedding`, to align with the other embeddings.
- Removed configuration parameter `maxCost` from `gds.alpha.bfs/dfs`.
- Unlocking the Enterprise Edition of the Graph Data Science library requires a license key. The previous config setting has been removed.
- Removed `degreeDistribution` from the `gds.graph.drop` return columns.
- `gds.pageRank` now respects the concurrency setting. It will not run if there is insufficient memory for the given concurrency setting.
- Alpha similarity algorithms no longer accept a graph name as a parameter. The algorithms never used the named graph, so the option to specify one has been removed.
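The FastRP changes above (the new `iterationWeights` default and the removal of `normalizeL2` in favour of always normalizing) can be illustrated with a small pure-Python sketch. It assumes a simplified formulation: random Gaussian initial vectors stand in for the sparse random projection, each iteration sums neighbour vectors and L2-normalizes the result, and the final embedding is the weighted sum of the per-iteration vectors. The function `fast_rp` and its parameters are hypothetical and do not reflect the GDS code or its procedure signature; `embedding_dimension` merely mirrors the renamed `embeddingDimension` parameter.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit L2 length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fast_rp(adjacency, embedding_dimension, iteration_weights=(0.0, 1.0, 1.0), seed=42):
    """Simplified FastRP-style embedding (hypothetical sketch, not the GDS code).

    adjacency: dict mapping every node -> list of neighbour nodes.
    iteration_weights: one weight per iteration; weight i scales only the
    intermediate embedding produced by iteration i (weights are not chained).
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    # Random initial vectors stand in for the sparse random projection.
    current = {n: [rng.gauss(0.0, 1.0) for _ in range(embedding_dimension)] for n in nodes}
    final = {n: [0.0] * embedding_dimension for n in nodes}

    for weight in iteration_weights:
        propagated = {}
        for n in nodes:
            # Sum the neighbours' vectors; the scale is irrelevant because
            # the result is L2-normalized right afterwards.
            summed = [0.0] * embedding_dimension
            for m in adjacency[n]:
                for i, x in enumerate(current[m]):
                    summed[i] += x
            # Normalization is always applied (there is no normalizeL2 switch).
            propagated[n] = l2_normalize(summed)
        current = propagated
        # Add this iteration's embedding, scaled by its own weight only.
        final = {n: [f + weight * x for f, x in zip(final[n], current[n])] for n in nodes}
    return final


graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
embeddings = fast_rp(graph, embedding_dimension=8)
print(embeddings["a"])
```

With the default weights, the first intermediate embedding of this sketch contributes nothing and the next two contribute equally. Because each weight scales only its own iteration, the sketch also matches the corrected behaviour described in the RandomProjection fix under Bug fixes.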
New features
- Promoted GraphSage to the `beta` tier and added support for inductive models with the `train` mode.
  - This adds the procedures:
    - `gds.beta.graphSage.mutate`
    - `gds.beta.graphSage.mutate.estimate`
    - `gds.beta.graphSage.stream`
    - `gds.beta.graphSage.stream.estimate`
    - `gds.beta.graphSage.train`
    - `gds.beta.graphSage.train.estimate`
    - `gds.beta.graphSage.write`
    - `gds.beta.graphSage.write.estimate`
  - And removes the alpha procedures:
    - `gds.alpha.graphSage.stream`
    - `gds.alpha.graphSage.write`
- GraphSage supports relationship weights, driven by `relationshipWeightProperty`.
- GraphSage supports node labels via `projectedFeatureSize`.
- Introduced the model catalog to manage trained models, including:
  - `gds.beta.model.exists` - a procedure to check if a model exists in the catalog
  - `gds.beta.model.list` - lists all available models
  - `gds.beta.model.drop` - removes a model from the catalog
- The Random Projection algorithm has been promoted to the product tier and we have added:
  - `gds.fastRP.stats`
  - `gds.fastRP.mutate`
  - `gds.fastRP.estimate`
  - Procedures for `stats` and `mutate` mode, as well as estimates for all modes.
- FastRP has been extended to support relationship weights and directions
- FastRP supports integer configuration for iteration weights.
- We’ve added support for node property features for FastRP in the beta namespace with FastRPExtended:
  - `gds.beta.fastRPExtended.mutate`
  - `gds.beta.fastRPExtended.stream`
  - `gds.beta.fastRPExtended.stats`
  - `gds.beta.fastRPExtended.write`
  - `gds.beta.fastRPExtended.mutate.estimate`
  - `gds.beta.fastRPExtended.stream.estimate`
  - `gds.beta.fastRPExtended.stats.estimate`
  - `gds.beta.fastRPExtended.write.estimate`
- We’ve added the K-Nearest Neighbors (KNN) algorithm to the beta tier:
  - `gds.beta.knn.mutate` and `gds.beta.knn.mutate.estimate`
  - `gds.beta.knn.stats` and `gds.beta.knn.stats.estimate`
  - `gds.beta.knn.stream` and `gds.beta.knn.stream.estimate`
  - `gds.beta.knn.write` and `gds.beta.knn.write.estimate`
- The in-memory graph now supports list properties, enabling embedding results to be stored in memory, or embeddings to be loaded from nodes for KNN or similarity calculations.
- Pregel framework:
  - Added a Pregel annotation processor to generate GDS procedures for custom Pregel algorithms.
  - Pregel now supports long and double array node values.
  - Added support for composite node state to allow complex data types on nodes.
  - Reduced memory consumption.
  - Improved memory estimation.
  - Simplified message iteration in `compute` methods.
  - Split the context into `InitContext` and `ComputeContext` and simplified the API.
  - Removed the `K1ColoringExample` standalone project.
  - Added the `pregel-bootstrap` standalone project.
  - Added the `pregel-examples` module.
- Licensing: GDS Enterprise Edition now requires license keys issued by Neo4j to unlock enterprise features.
- Added a `density` property to the graph output of `gds.graph.list`.
- Added a `failIfMissing` flag to `gds.graph.drop`.
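To illustrate what the new KNN procedures compute, and how list properties such as stored embeddings can feed into them, here is an exact brute-force k-nearest-neighbours sketch over per-node float-list properties. Two things are assumptions rather than statements from these notes: the similarity measure is taken to be cosine similarity, and every pair of nodes is compared, which is not necessarily how the beta procedure is implemented. The functions `cosine_similarity` and `knn` are hypothetical helpers.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(properties, top_k):
    """Exact brute-force KNN (hypothetical sketch, not the GDS implementation).

    properties: dict mapping node id -> list of floats (e.g. a stored embedding).
    Returns a dict mapping node id -> list of (neighbour, similarity), best first.
    """
    result = {}
    for node, vec in properties.items():
        # Score every other node; ties are broken by comparing node ids.
        scored = (
            (cosine_similarity(vec, other_vec), other)
            for other, other_vec in properties.items()
            if other != node
        )
        best = heapq.nlargest(top_k, scored)
        result[node] = [(other, sim) for sim, other in best]
    return result


embeddings = {"alice": [1.0, 0.0], "bob": [0.9, 0.1], "carol": [0.0, 1.0]}
print(knn(embeddings, top_k=1))
```

Here `alice` and `bob` pick each other as nearest neighbour, and `carol` picks `bob`. Brute force costs O(n²) similarity computations, which is why it only serves as a reference for small inputs.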
Bug fixes
- Pregel:
  - Fixed a bug in Pregel that could lead to incorrect results when running in parallel.
  - Fixed a cast exception when returning array node properties in generated Pregel procedures.
- Fixed a bug in a multi-source BFS traversal strategy that could affect the following procedures:
  - `gds.alpha.closeness`
  - `gds.alpha.closeness.harmonic`
  - `gds.alpha.allShortestPaths`
- Weakly connected components:
  - Fixed a bug in WCC where `componentCount` would be negative when the graph is empty.
  - Fixed a regression where WCC could run more slowly with increased concurrency.
- Fixed bugs in Louvain:
  - `communityCount` is no longer negative when the graph is empty.
  - Changes to `maxIterations` are no longer ignored.
- Fixed a bug in LabelPropagation where `communityCount` would be negative when the graph is empty.
- Fixed a bug in `gds.graph.export` where at most one relationship property per relationship type would be exported.
- Graph loading:
  - Fixed a bug where using node label projections including properties on large graphs with high concurrency could lead to the loss of some properties.
  - Fixed a bug in graph creation that could cause an `ArrayIndexOutOfBoundsException` during node loading.
  - The `readConcurrency` config parameter can no longer be overwritten by the `concurrency` parameter when it is explicitly set in an implicit graph creation config.
  - Fixed a bug in memory estimation of large anonymous fictitious graphs.
- Fixed a bug in `gds.alpha.dfs/bfs` where the algorithm did not terminate for graphs containing loops.
- Fixed a bug in Node2Vec where many disconnected nodes would cause a `StackOverflowError`.
- Fixed a bug in RandomProjection where each iteration weight was multiplied by all previous iteration weights.
- Similarity algorithms:
  - Fixed a bug where alpha similarity algorithms would load a graph even though it was not needed.
  - Fixed a bug where similarity algorithms would not remove the placeholder graph if config validation failed on invalid user input.
- Fixed a bug where community statistic computation could overflow for large community ids.
- Fixed a bug where DegreeCentrality returned incorrect values when concurrency > 1.
- Fixed a bug where ClosenessCentrality used a slightly incorrect formula for the Wasserman-Faust variant.
- Fixed a bug affecting `gds.triangleCount()` and `gds.alpha.triangles()` where not all triangles would be counted under certain conditions.
- Parallel edges in a graph no longer lead to incorrect Local Clustering Coefficient and Triangle Count results.
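For reference, the Wasserman-Faust variant of closeness centrality mentioned in the ClosenessCentrality fix is commonly defined (for example by the `wf_improved` option of NetworkX's `closeness_centrality`) as `C_WF(u) = ((n_u - 1) / (N - 1)) * ((n_u - 1) / S_u)`, where `N` is the number of nodes in the graph, `n_u` the number of nodes reachable from `u` (including `u`), and `S_u` the sum of shortest-path distances from `u` to them. The sketch below computes it with one unweighted BFS per node; it is a textbook formulation, not the GDS source, and `wasserman_faust_closeness` is a hypothetical name.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def wasserman_faust_closeness(adjacency):
    """Wasserman-Faust closeness for every node (textbook sketch, not GDS code).

    adjacency: dict mapping every node -> list of neighbour nodes.
    """
    total = len(adjacency)  # N
    scores = {}
    for node in adjacency:
        dist = bfs_distances(adjacency, node)
        reachable = len(dist) - 1          # n_u - 1: other nodes reachable from node
        distance_sum = sum(dist.values())  # S_u
        if reachable == 0:
            scores[node] = 0.0             # isolated node: no reachable nodes
        else:
            # Fraction of the graph that is reachable, times classic closeness.
            scores[node] = (reachable / (total - 1)) * (reachable / distance_sum)
    return scores


path_plus_isolated = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
print(wasserman_faust_closeness(path_plus_isolated))
```

The first factor penalizes nodes that can reach only a small part of the graph, so in this example `b` scores 2/3 rather than the 1.0 that classic within-component closeness would give it.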
Improvements
- `gds.fastRP` now accepts integer `iterationWeights`.
- If `graphSage.train` is run on a graph without relationships, GDS now fails gracefully with an appropriate error message.
- Added validation that properties used by GraphSage exist on the graph.
- Added validation that `embeddingSize` >= 1.
- Added a `failIfExists` flag to graph creation, enabling a user to specify that an existing graph should be overwritten without failing.
- Progress logging:
  - We now log progress in equally spaced percentages from 0 to 100%, either in steps of 1% or in larger steps if there are fewer than 100 batches. For example, with 50 batches, completing one batch means 2% progress, so progress is logged in steps of 2%.
  - Decreased the logging frequency when running with high concurrency.
- Added `postProcessingMillis` to `gds.localClusteringCoefficient` and `gds.triangleCount` for modes `mutate`, `write`, and `stats`.
  - It is always zero for now, but this is a standard result column for these modes.
- Parallelized computation of result statistics for the following community detection procedures:
  - `gds.wcc.write`, `gds.wcc.mutate` and `gds.wcc.stats`
  - `gds.louvain.write`, `gds.louvain.mutate` and `gds.louvain.stats`
  - `gds.labelPropagation.write`, `gds.labelPropagation.mutate` and `gds.labelPropagation.stats`
  - `gds.beta.modularityOptimization.write` and `gds.beta.modularityOptimization.mutate`
  - `gds.alpha.scc.write`
- Added the graph schema to the result columns of `gds.model.list` and `gds.model.drop`.
- Added validation of property existence (e.g. `seedProperty`) when running algorithms on Cypher projections.
- Improved memory estimation for `*` node projections.
- In-memory graphs in a multi-database environment:
  - When in-memory graphs are created, they are now associated with the database in use at creation time, to prevent errors when running in a multi-database environment.
  - `gds.graph.info()` returns the name of the database the graph was created on.
  - Named graphs can only be used on the database they were created on.
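The progress-logging rule described in the Improvements list can be made concrete with a short sketch that lists which percentages get logged for a given number of batches. It assumes a step size of `100 // batches` percent (at least 1), rounds progress down to a multiple of that step, and always reports 100% at completion; `progress_log_points` is a hypothetical helper, not part of GDS, and the real logger may round differently.

```python
def progress_log_points(total_batches):
    """Return the percentages that would be logged as batches complete.

    Hypothetical sketch: the step is 1% unless there are fewer than 100
    batches, in which case it grows to 100 // total_batches percent.
    """
    if total_batches <= 0:
        return []
    step = max(1, 100 // total_batches)
    logged = []
    last = 0
    for done in range(1, total_batches + 1):
        percent = done * 100 // total_batches
        # Round down to the nearest multiple of the step size.
        aligned = percent - percent % step
        if done == total_batches:
            aligned = 100  # always report completion
        if aligned > last:
            logged.append(aligned)
            last = aligned
    return logged


print(progress_log_points(50))
```

For 50 batches this yields 2, 4, ..., 100, matching the example in the notes; for 1000 batches it yields every whole percentage from 1 to 100, so the number of log lines stays bounded regardless of batch count.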