
Conversation

@rustyrussell
Contributor

This is the algorithm recommended by @Lagrang3. Much of it, though, is neatening and tweaking of existing code.

  1. Removal of unnecessary caching.
  2. Simplify sorting code (sort array of flows directly, don't use index arrays)
  3. make increase_flows() more general, so it can be used for > 2% increase.
  4. Use increase_flows() to top up the largest-capacity flows if we hit maxparts.

I added amount_msat_accumulate for the "a+=b" case, but I was struggling
with a name for the subtractive equivalent.  After some prompting, ChatGPT
suggested deduct.

Signed-off-by: Rusty Russell <[email protected]>
This is not worth optimizing, as far as I can see.  Using a non-debug build I get
the following times for tests/test_askrene.py::test_real_data

Before:
	143 seconds

After:
	141 seconds

Signed-off-by: Rusty Russell <[email protected]>
We've already freed the working_ctx, and the fail path does that again.

Signed-off-by: Rusty Russell <[email protected]>
It can only fail on overflow, but if it did, the fail path frees working_ctx
and returns "error_message".

Signed-off-by: Rusty Russell <[email protected]>
Make it calculate on demand.  This will be useful when we call it from elsewhere.

Signed-off-by: Rusty Russell <[email protected]>
We don't need the indexes array, we can use this directly.

We still set up the indexes array (for now) after we call this.

Signed-off-by: Rusty Russell <[email protected]>
This removes the index array from the code after increase_flows(), so we use the flows
array directly.

The next step will be to make increase_flows() use the flows array, and remove the
index array indirection entirely.

Signed-off-by: Rusty Russell <[email protected]>
We don't need to convert to strings, we can compare directly.  This removes the final
use of the index arrays.

This of course changes the order of returned routes, which alters test_real_biases, since
that biases against the final channel in the *first* route.

Took me far too long to diagnose that!

Signed-off-by: Rusty Russell <[email protected]>
We added _noidx versions of the sort functions, but now that they're the only ones, we can
rename them back to the old names.

Signed-off-by: Rusty Russell <[email protected]>
Rewrite it, so it properly takes into account interactions between flows
by using reservations.

Signed-off-by: Rusty Russell <[email protected]>
For 1, we use single-path; for 0, we reject.

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell added this to the v25.12 milestone Nov 13, 2025
@Lagrang3 (Collaborator) left a comment


1. Removal of unnecessary caching.

Yes. Caching at this phase was overzealous, when in fact we spend most of the time doing MCF.

2. Simplify sorting code (sort array of flows directly, don't use index arrays)

Indexing was a necessity after using cached channel information.

3. make increase_flows() more general, so it can be used for > 2% increase.

I like it. It is preferred to the previous approach in my opinion.

Looping and removing 1/n of the shortage at every iteration worries me a little, because
the shortage amount decays exponentially but the decay factor weakens the more flows
there are, and we won't stop until 10 sats.
For example, it takes 7 iterations to reduce some shortage to 1% of its initial value if
there are 2 paths, 44 iterations for 10 paths, and 459 iterations for 100 paths.
Also, the complexity is the number of iterations times the number of paths, hence
2 * 7 = 14, 10 * 44 = 440, and 100 * 459 = 45900, respectively.
In practice it will be OK, since we don't expect too many paths
or the shortage to be too large.

4. Use increase_flows() to top up the largest-capacity flows if we hit maxparts.

I strongly believe that reduce_num_flows should sort flows based on delivery amount
and keep the largest delivery amount first.

@rustyrussell
Copy link
Contributor Author

rustyrussell commented Nov 16, 2025

Here's the diff for reduce_num_flows:

diff --git a/plugins/askrene/refine.c b/plugins/askrene/refine.c
index 2ee3855e2f..118460eb9d 100644
--- a/plugins/askrene/refine.c
+++ b/plugins/askrene/refine.c
@@ -588,36 +598,37 @@ double flows_probability(const tal_t *ctx, struct route_query *rq,
 	return probability;
 }
 
-/* Compare flows by remaining capacity (note: they are not reserved, so this
- * capacity includes current capacity. */
-static int reverse_cmp_capacity(struct flow *const *fa, struct flow *const *fb,
-				struct route_query *rq)
-{
-	struct amount_msat amount_a, amount_b;
-
-	amount_a = flow_max_deliverable(rq, *fa);
-	amount_b = flow_max_deliverable(rq, *fb);
-	if (!amount_msat_deduct(&amount_a, (*fa)->delivers))
-		abort();
-	if (!amount_msat_deduct(&amount_b, (*fb)->delivers))
-		abort();
-
-	if (amount_msat_eq(amount_a, amount_b))
-		return 0;
-	if (amount_msat_greater(amount_a, amount_b))
-		return -1;
-	return 1;
-}
-
 const char *reduce_num_flows(const tal_t *ctx,
 			     const struct route_query *rq,
 			     struct flow ***flows,
 			     struct amount_msat deliver,
 			     size_t num_parts)
 {
-	/* Keep the largest-capacity flows */
+	/* Keep the largest flows (not as I originally implemented, the largest
+	 * capacity flows).  Here's Lagrang3's analysis:
+	 *
+	 * I think it is better to keep the largest-deliver flows.  If we only
+	 * go for the highest capacity we may throw away the low cost benefits
+	 * of the MCF.
+
+	 * Hypothetical scenario: MCF finds 3 flows but maxparts=2,
+	 * flow 1: deliver=10, cost=0, capacity=0
+	 * flow 2: deliver=7, cost=1, capacity=5
+	 * flow 3: deliver=1, cost=10, capacity=100
+	 *
+	 * It is better to keep flows 1 and 2 by accomodating 1 more unit of
+	 * flow in flow2 at 1 value expense (per flow), than to keep flows 2 and
+	 * 3 by accomodating 5 more units of flow in flow2 at cost 1 and 5 in
+	 * flow3 at cost 100.
+	 *
+	 * The trade-off is: if we prioritize the delivery value already
+	 * computed by MCF then we find better solutions, but we might fail to
+	 * find feasible solutions sometimes. If we prioritize capacity then we
+	 * generally find bad solutions though we find feasibility more often
+	 * than the alternative.
+	 */
 	size_t orig_num_flows = tal_count(*flows);
-	asort(*flows, orig_num_flows, reverse_cmp_capacity, cast_const(struct route_query *, rq));
+	asort(*flows, orig_num_flows, revcmp_flows, NULL);
 	tal_resize(flows, num_parts);
 
 	if (!increase_flows(rq, *flows, deliver, -1.0))

Now we simply call it at the end.  We need to check it hasn't violated fee maxima, but
otherwise it's simple.

Signed-off-by: Rusty Russell <[email protected]>
Changelog-Fixed: Plugins: `askrene` now handles limits on number of htlcs much more gracefully.
@rustyrussell rustyrussell force-pushed the guilt/askrene-infinite-loop branch from b9c65ba to 60dc918 Compare November 16, 2025 05:10
@Lagrang3 Lagrang3 self-requested a review November 16, 2025 11:02
Pointed out by @Lagrang3; he's right: while it's only a temporary leak the
way we use flows, it's still a trap.

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell force-pushed the guilt/askrene-infinite-loop branch from 60dc918 to bcdb398 Compare November 16, 2025 22:48
@rustyrussell rustyrussell merged commit 6e4bc1d into ElementsProject:master Nov 17, 2025
37 of 39 checks passed