@MattScicluna
Contributor

Changed default knn search algo from "ball_tree" to "auto".

When we pass "auto", sklearn selects brute force if the number of input dimensions is >15 (among other conditions).

We pass 50- or 100-dimensional PCA output into PHATE.

It turns out that "ball_tree" and "kd_tree" become slow when the number of features is >20:
https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms
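
To check which backend "auto" resolves to, here is a minimal sketch using sklearn directly rather than graphtools. The helper name `resolved_algorithm` is hypothetical, and `_fit_method` is a private sklearn attribute that may change between releases, so treat this as a diagnostic only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def resolved_algorithm(n_features, n_samples=1000, k=5, seed=42):
    """Fit NearestNeighbors with algorithm="auto" and report the backend it picked.

    Hypothetical helper for illustration; not part of PHATE or graphtools.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    nn = NearestNeighbors(n_neighbors=k, algorithm="auto", metric="euclidean").fit(X)
    # _fit_method is a private attribute (assumption: present in recent
    # scikit-learn releases); it stores the algorithm chosen during fit.
    return nn._fit_method


# Low-dimensional input keeps a tree; PCA-sized input falls back to brute force.
for d in (3, 50, 100):
    print(d, resolved_algorithm(d))
```

With dense Euclidean input, the low-dimensional case should report a tree-based method and the 50D and 100D cases should report brute force.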

Here are the results of running PHATE on an AMD EPYC 7543 32-Core Processor with 256GB RAM requested.
PHATE was fitted on 100K samples from sklearn.datasets.make_blobs.
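
For a quick sanity check without running the full PHATE pipeline, the KNN step alone can be timed with sklearn. This is a rough sketch, not the script used for the logs below: the helper `time_knn`, the number of blob centers, and the reduced sample count are assumptions chosen so it finishes quickly. Raise `n_samples` to 100000 to approximate the setting above.

```python
import time

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors


def time_knn(X, algorithm, k=5, n_jobs=-1):
    """Time a fit plus self-query KNN search; returns (seconds, distances).

    Hypothetical helper for illustration; not part of PHATE or graphtools.
    """
    start = time.perf_counter()
    nn = NearestNeighbors(n_neighbors=k, algorithm=algorithm, n_jobs=n_jobs).fit(X)
    distances, _ = nn.kneighbors(X)
    return time.perf_counter() - start, distances


# Assumed settings: 5000 samples and 10 centers keep the sketch fast.
X, _ = make_blobs(n_samples=5000, n_features=50, centers=10, random_state=42)

results = {}
for algorithm in ("ball_tree", "auto"):
    seconds, distances = time_knn(X, algorithm)
    results[algorithm] = distances
    print(f"{algorithm}: {seconds:.2f} s")

# Both backends are exact searches, so the neighbor distances should agree
# up to floating-point error.
print("same distances:", np.allclose(results["ball_tree"], results["auto"], atol=1e-4))
```

Timings will vary with hardware, thread count, and sklearn version, so only the relative difference between the two runs is meaningful.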

Before:

Set graphtools logging to DEBUG
Running PHATE on 100000 observations and 50 variables.
Calculating graph and diffusion operator...
  Building knn graph with landmarks
  Initializing [<class 'graphtools.graphs.kNNGraph'>, <class 'graphtools.graphs.LandmarkGraph'>] with arguments bandwidth_scale='1.0', knn_max='None', kernel_symm='+', n_pca='None', verbose='2', random_state='42', knn='5', bandwidth='None', thresh='0.0001', distance='euclidean', rank_threshold='None', initialize='True', anisotropy='0', n_jobs='-1', theta='None', decay='40', n_landmark='2000', n_svd='100'
  Initializing kernel...
  Calculating KNN search...
  Calculated KNN search in 63.50 seconds.
  Calculating affinities...
    search_knn = 36; 19599 remaining
    search_knn = 216; 0 remaining
  Calculated affinities in 14.89 seconds.
  Using addition symmetrization.
Calculated graph and diffusion operator in 78.51 seconds.
Calculating landmark operator...
  Calculating SVD...
  Calculated SVD in 12.77 seconds.
  Calculating KMeans...
  Calculated KMeans in 3.41 seconds.
Calculated landmark operator in 18.29 seconds.

With change:

Set graphtools logging to DEBUG
Running PHATE on 100000 observations and 50 variables.
Calculating graph and diffusion operator...
  Building knn graph with landmarks
  Initializing [<class 'graphtools.graphs.kNNGraph'>, <class 'graphtools.graphs.LandmarkGraph'>] with arguments random_state='42', rank_threshold='None', distance='euclidean', decay='40', n_jobs='-1', knn_max='None', bandwidth='None', kernel_symm='+', initialize='True', bandwidth_scale='1.0', knn='5', thresh='0.0001', anisotropy='0', theta='None', verbose='2', n_pca='None', n_svd='100', n_landmark='2000'
  Initializing kernel...
  Calculating KNN search...
  Calculated KNN search in 6.19 seconds.
  Calculating affinities...
    search_knn = 36; 19599 remaining
    search_knn = 216; 0 remaining
  Calculated affinities in 2.20 seconds.
  Using addition symmetrization.
Calculated graph and diffusion operator in 8.51 seconds.
Calculating landmark operator...
  Calculating SVD...
  Calculated SVD in 13.16 seconds.
  Calculating KMeans...
  Calculated KMeans in 3.32 seconds.
Calculated landmark operator in 18.46 seconds.

@siddharthviswanath siddharthviswanath merged commit ec5e9b1 into KrishnaswamyLab:master Jun 21, 2025
0 of 5 checks passed