Commit aaf4136
committed
Fix recall when all vectors score the same
It is possible to insert non-equal vectors
which score the same by the provided similarity measure.
E.g. vectors (0.1, 0.1), (0.2, 0.2), (0.3, 0.3) all
are really the same point under cosine metric
and any pair of those would score 1.0 similarity.
This edge case caused some serious issues with graph
connectivity and the queries returned at most 33 nodes,
even if for large graphs.
This PR fixes it by improving fairness of node selection
when nodes score the same. This is achieved by a
small modification to how node ids are encoded
in the NodeQueue. When nodes score the same, they were compared
by node ids, which always preferred the nodes added earlier.
That created a huge bias. If we reverse the bits
of node ids, now this shuffles their order and breaks the
systematic bias towards the older nodes.
Another part of the fix is making sure nodes with the same
score don't block backlinks to be formed. By placing new neighbours
before the neighbours with the same score, we're giving
them a chance to be linked to. It will drop some other neighbor
from the list, but considering it was present in the graph for some
time already, it is much more likely to be already well connected.1 parent 6517870 commit aaf4136
File tree
4 files changed
+63
-20
lines changed- jvector-base/src/main/java/io/github/jbellis/jvector/graph
- jvector-tests/src/test/java/io/github/jbellis/jvector/graph
4 files changed
+63
-20
lines changedLines changed: 4 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
173 | 173 | | |
174 | 174 | | |
175 | 175 | | |
176 | | - | |
| 176 | + | |
| 177 | + | |
177 | 178 | | |
178 | 179 | | |
179 | 180 | | |
| |||
282 | 283 | | |
283 | 284 | | |
284 | 285 | | |
285 | | - | |
| 286 | + | |
286 | 287 | | |
287 | 288 | | |
288 | 289 | | |
289 | 290 | | |
290 | | - | |
| 291 | + | |
291 | 292 | | |
292 | 293 | | |
293 | 294 | | |
| |||
Lines changed: 6 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
100 | 100 | | |
101 | 101 | | |
102 | 102 | | |
103 | | - | |
104 | | - | |
| 103 | + | |
105 | 104 | | |
106 | 105 | | |
107 | 106 | | |
108 | 107 | | |
109 | 108 | | |
110 | | - | |
111 | | - | |
| 109 | + | |
| 110 | + | |
112 | 111 | | |
113 | 112 | | |
114 | 113 | | |
115 | 114 | | |
116 | 115 | | |
117 | 116 | | |
118 | | - | |
| 117 | + | |
119 | 118 | | |
120 | 119 | | |
121 | 120 | | |
| |||
124 | 123 | | |
125 | 124 | | |
126 | 125 | | |
127 | | - | |
| 126 | + | |
128 | 127 | | |
129 | 128 | | |
130 | 129 | | |
131 | 130 | | |
132 | 131 | | |
133 | 132 | | |
134 | 133 | | |
135 | | - | |
| 134 | + | |
136 | 135 | | |
137 | 136 | | |
138 | 137 | | |
| |||
Lines changed: 12 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
60 | 60 | | |
61 | 61 | | |
62 | 62 | | |
63 | | - | |
| 63 | + | |
64 | 64 | | |
65 | 65 | | |
66 | 66 | | |
67 | | - | |
| 67 | + | |
68 | 68 | | |
69 | 69 | | |
70 | 70 | | |
71 | | - | |
| 71 | + | |
72 | 72 | | |
73 | 73 | | |
74 | 74 | | |
75 | | - | |
| 75 | + | |
76 | 76 | | |
77 | 77 | | |
78 | 78 | | |
79 | | - | |
| 79 | + | |
80 | 80 | | |
81 | 81 | | |
82 | 82 | | |
83 | | - | |
| 83 | + | |
84 | 84 | | |
85 | 85 | | |
86 | 86 | | |
87 | | - | |
| 87 | + | |
88 | 88 | | |
89 | 89 | | |
90 | 90 | | |
91 | | - | |
| 91 | + | |
92 | 92 | | |
93 | 93 | | |
94 | 94 | | |
95 | | - | |
| 95 | + | |
96 | 96 | | |
97 | 97 | | |
98 | 98 | | |
| 99 | + | |
99 | 100 | | |
100 | 101 | | |
101 | 102 | | |
102 | 103 | | |
103 | 104 | | |
104 | 105 | | |
| 106 | + | |
105 | 107 | | |
106 | 108 | | |
107 | 109 | | |
| |||
181 | 183 | | |
182 | 184 | | |
183 | 185 | | |
184 | | - | |
| 186 | + | |
185 | 187 | | |
186 | 188 | | |
187 | 189 | | |
| |||
Lines changed: 41 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
700 | 700 | | |
701 | 701 | | |
702 | 702 | | |
| 703 | + | |
| 704 | + | |
| 705 | + | |
| 706 | + | |
| 707 | + | |
| 708 | + | |
| 709 | + | |
| 710 | + | |
| 711 | + | |
| 712 | + | |
| 713 | + | |
| 714 | + | |
| 715 | + | |
| 716 | + | |
| 717 | + | |
| 718 | + | |
| 719 | + | |
| 720 | + | |
| 721 | + | |
| 722 | + | |
| 723 | + | |
| 724 | + | |
| 725 | + | |
| 726 | + | |
| 727 | + | |
| 728 | + | |
| 729 | + | |
| 730 | + | |
| 731 | + | |
| 732 | + | |
| 733 | + | |
| 734 | + | |
| 735 | + | |
| 736 | + | |
| 737 | + | |
| 738 | + | |
| 739 | + | |
| 740 | + | |
| 741 | + | |
| 742 | + | |
| 743 | + | |
703 | 744 | | |
704 | 745 | | |
705 | 746 | | |
| |||
0 commit comments