
Conversation

AStepanov25 (Contributor) commented May 26, 2025

This PR is a continuation of this conversation: https://forum.dfinity.org/t/motoko-base-library-changes/39766/43

AStepanov25 requested a review from a team as a code owner May 26, 2025 16:05
cla-idx-bot (bot) commented May 26, 2025

Dear @AStepanov25,

In order to potentially merge your code in this open-source repository and therefore proceed with your contribution, we need to have your approval on DFINITY's CLA.

If you decide to agree with it, please visit this issue and read the instructions there. Once you have signed it, re-trigger the workflow on this PR to see if your code can be merged.

— The DFINITY Foundation

Kamirus (Member) commented May 27, 2025

Hey, thanks for your contribution! I'd like to propose a simpler API that might be worth considering:

  • Add List.range(List<T>, Nat, Nat) : Iter<T> (similar to Array.range) returning a slice
  • These slices could be combined into a target vector by Iter.flatten |> List.fromIter or even Iter.concat if the number of vectors is known.

IMO concatSlices<T>(slices : [(List<T>, fromInclusive : Nat, toExclusive : Nat)]) : List<T> is too complex -- it does both slicing and concatenating.
Also, I'd use Iter instead of Array to avoid the unnecessary materialization into an Array.
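
Concretely, concatenating two slices under this proposal could look like the following sketch. It assumes the proposed List.range signature above, plus Iter.flatten and List.fromIter as mentioned; the import path just uses the new-base package name from the README.

```motoko
import List "mo:new-base/List";
import Iter "mo:new-base/Iter";

// Concatenate the slices a[aFrom, aTo) and b[bFrom, bTo) into a fresh list,
// assuming the proposed List.range(list, from, to) : Iter<T>.
func concatTwoSlices<T>(
  a : List.List<T>, aFrom : Nat, aTo : Nat,
  b : List.List<T>, bFrom : Nat, bTo : Nat
) : List.List<T> {
  let slices = [List.range(a, aFrom, aTo), List.range(b, bFrom, bTo)];
  // Flatten the slice iterators and materialize the result once.
  List.fromIter(Iter.flatten(slices.vals()))
};
```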
Also for List.concat:

  • I'd change it to List.join(Iter<List<T>>) : List<T> (similarly to Array.join)
  • And maybe also have List.flatten(List<List<T>>) : List<T> (similarly to Array.flatten)

If the performance of using Iter is comparable to your functions I'd prefer this simpler and more flexible solution.
What do you think?

AStepanov25 (Contributor, Author) commented May 27, 2025

If the performance of using Iter is comparable to your functions...

We thought about iterators. The thing is that the performance is not comparable, because Iter generates a linear amount of garbage (?T seems to be boxed). This is the reason values are implemented with the unsafe_next_i function.

I'd change it to List.join(Iter<List<T>>) : List<T> (similarly to Array.join).

Yes, we can change it to accept iters, as there are not expected to be many.

And maybe also have List.flatten(List<List<T>>) : List<T> (similarly to Array.flatten)

Do we actually need this function? Isn't having both redundant?

timohanke commented May 28, 2025

We can certainly create an analogy to Array with join, flatten, and concat (the latter for joining two lists), with flatten being just a wrapper around join.

The performance of all of these hinges on the fact that the lengths of the input lists are known in advance (not necessarily the number of lists, as in join). That's why, if we want "slice" versions of join or concat, the argument has to be as it is in this PR. We cannot replace the "slice" argument (a triple) with an Iter, because with an Iter we lose the length information. So you can do that, but you won't get the same performance.

This PR was meant to provide the best performing function. We utilize fast internal position iteration both for reading from the input lists and for writing into the new list. Plus we use more efficient pre-allocation because we know the number of new elements coming.

Here is another proposal to avoid the complex concatSlices. If we are looking for a function instead that is more of a "primitive" but that can achieve the same at the same performance then maybe we can expand on the add* family of functions. Currently we have add (single element), addMany (Iter), addRepeat (same element multiple times). We can introduce addFromArray, addFromList, addFromSlice, etc. to this family and those functions can internally use length information on how many elements are coming. Then addFromList(l1, l2) would be faster than addMany(l1, values(l2)).

If people really have to concatenate many slices (probably a rare application) then they can still do that with a few lines utilizing addFromSlice.
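
Concatenating an array of slices would then be a few lines like this sketch (addFromSlice is the proposed function, not an existing one; its signature is assumed here):

```motoko
import List "mo:new-base/List";

// Hypothetical: concatenate slices with the proposed addFromSlice primitive,
// which can pre-allocate because each slice's length (to - from) is known.
func concatSlicesVia<T>(slices : [(List.List<T>, Nat, Nat)]) : List.List<T> {
  let result = List.empty<T>();
  for ((list, from, to) in slices.vals()) {
    List.addFromSlice(result, list, from, to); // proposed, not yet in the library
  };
  result
};
```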

Kamirus (Member) commented May 28, 2025

We thought about iterators. The thing is that the performance is not comparable, because Iter generates a linear amount of garbage (?T seems to be boxed). This is the reason values are implemented with the unsafe_next_i function.

@timohanke @AStepanov25 I wanted to compare the performance of using Iter vs concatSlices and your other primitives. Here are the results. Click on the ConcatSlices.bench.mo to expand the results.
Indeed Iter is twice as slow, but look at the last two rows. Note that in this scenario I just concat two slices, not an array of them. Check out the source of the benchmark bench/List/ConcatSlices.bench.mo.
A combination of Iter + count performs really well.
Also, unsafe_next_i is just marginally better than vanilla next.

Kamirus (Member) commented May 28, 2025

If we are looking for a function instead that is more of a "primitive" but that can achieve the same at the same performance then maybe we can expand on the add* family of functions.

This is a good idea! But maybe we don't need the whole family of them.
Now we have add and addAll; we could add a variation that takes an Iter and a count : Nat and adds count elements from the Iter. It should trap if there are not enough elements. I'd call it addCount, but I'm not happy with this name.

We should check how much faster the List, Array, VarArray, etc. variants actually are. I'd avoid them, because each would probably need a range/slice variant. I'd just rely on:

  1. valuesFrom to get the start of the range
  2. addCount to get the end of the range (see the sketch below)
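
For illustration, roughly like this sketch (valuesFrom and addCount are the proposals above, not existing functions; their signatures are assumed):

```motoko
import List "mo:new-base/List";

// Append the slice src[from, to) onto the end of dst using the two proposed primitives.
func appendSlice<T>(dst : List.List<T>, src : List.List<T>, from : Nat, to : Nat) {
  let iter = List.valuesFrom(src, from); // iterator over the postfix starting at `from`
  List.addCount(dst, iter, to - from);   // adds exactly `to - from` elements, traps if the iterator runs out early
};
```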

I was experimenting with starting a List with a 'capacity' to avoid the resizing and then filling the necessary elements. But this was comparable to the basic Iter case...

I've noticed the next_set : T -> () in the private iterators added in this PR. I was curious: maybe we should expose that, maybe the savings on the index calculation make a difference, etc.
So I've compared next_set to List.put and it was 30% faster.
But then I've compared vanilla next to List.get (which is manually inlined) and the difference was 0.3%...
The benchmark is called List/Iteration.bench.mo. So maybe there is more value in pushing for more aggressive moc optimizations, or maybe just in enabling them; mops bench skips them by default.

timohanke commented May 28, 2025

Interesting benchmarks. Thanks!

I'd just rely on:

valuesFrom to get the start of the range
addCount to get the end of the range

I would like that kind of interface. I agree it's sufficient in practice and I hope/wish that it can be made performant.

Your implementation of addCount may not be allowed as it is. We have to check the details. @AStepanov25 will know. You are doing

```motoko
list.blocks[list.blockIndex] := VarArray.tabulate<?T>(db_size, func _ = iter.next());
```

which is a cool trick, because if iter.next() is null you just overwrite null with null, and you avoid having to switch over iter.next(). However, doesn't this mean that if I provide a count value that is too high, I am allocating a certain amount of null space at the end of the List? You wrote

It should trap if there are not enough elements.

above, so maybe I am not reading the code correctly.

My point was that if it does continue to allocate datablocks filled with null at the end then that may cause some problems. I think the List implementation relies on the fact that there is at most one null datablock at the end. The grow and shrink logic is quite sensitive.

Having an UnsafeIter that traps is better, because then we don't have to switch over next(): we want to trap anyway if we go too far in the iter. But unfortunately UnsafeIter is not an established thing. So we can't really use it in the exposed interface, can we?

So I've compared next_set to List.put and it was 30% faster.
But then I've compared vanilla next to List.get (which is manually inlined) and the difference was 0.3%...

That's probably because List.put has a more expensive boundary check, which next_set can avoid. List.get doesn't need the same boundary check because it will automatically trap at an out-of-bounds index. Hence unsafe_next has less of an advantage here.

In practice, the advantage of next_set over List.put and of unsafe_next over List.get can be higher than what your benchmark shows. That's because in practice you can sometimes avoid the loop index (the i += 1): sometimes there is already some other loop index coming from the natural context. Then with next_set/unsafe_next you are fine, you don't need another position index. But with List.put/get you need to increment a second position index. Your benchmark does not show that difference. Example: copy a chunk from one List into another List. With List.get + List.put you need to track and increment two position indices. With unsafe_next + next_set, only one counter.
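
To make that example concrete, here is a sketch of the two-index version using only the public API (List.get and List.put signatures assumed as (list, index) and (list, index, value)); the one-counter version needs the private unsafe_next/next_set iterators and cannot be written from outside the module:

```motoko
import List "mo:new-base/List";

// Copy n elements from src (starting at srcFrom) into dst (starting at dstFrom).
// Both the read position and the write position must be recomputed from the
// counter on every step.
func copyChunk<T>(dst : List.List<T>, dstFrom : Nat, src : List.List<T>, srcFrom : Nat, n : Nat) {
  var i = 0;
  while (i < n) {
    List.put(dst, dstFrom + i, List.get(src, srcFrom + i));
    i += 1;
  };
};
```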

AStepanov25 (Contributor, Author) commented May 29, 2025

What about an interface like this:

concatIters<T>(iters : Iter<Iter<T>>, lengths : ?Iter<Nat>) : List<T>

This method would leverage lengths for speed if they are available. Or there could be two methods, one with lengths and another without.

Edit: actually we only need to know the total (sum) length.

AStepanov25 (Contributor, Author) commented May 29, 2025

Also, we could break the existing invariant that at the end there are no more than two totally empty data blocks and provide a reserve function; then everything could be done as fast as the current implementation with values_from, reserve, and addCount. The question is: do we need all these functions on their own?

Comment on lines 11 to 15:

```markdown
* Update code examples in doc comments (#224, #282, #303, #315).

## 0.5.0

* Add `concat` of slices function.
```
Collaborator:

Combining with the other unreleased changes:

Suggested change:

```diff
-* Update code examples in doc comments (#224, #282, #303, #315).
-## 0.5.0
-* Add `concat` of slices function.
+* Update code examples in doc comments (#224, #282, #303, #315).
+* Add `concat` of slices function (#317).
```

README.md Outdated
```diff
 base = "0.14.4"
-new-base = "0.4.0"
+new-base = "0.5.0"
```
Collaborator:

(We will do this in a separate PR)

Suggested change:

```diff
-new-base = "0.5.0"
+new-base = "0.4.0"
```

```diff
 [package]
 name = "new-base"
-version = "0.4.0"
+version = "0.5.0"
```
Collaborator:

Suggested change:

```diff
-version = "0.5.0"
+version = "0.4.0"
```

```motoko
/// Runtime: `O(sum_size)` where `sum_size` is the sum of the sizes of all slices.
///
/// Space: `O(sum_size)`
public func concatSlices<T>(slices : [(List<T>, fromInclusive : Nat, toExclusive : Nat)]) : List<T> {
```
Collaborator:

Maybe we could refactor to use a new type NatSlice<T> = (T, fromInclusive : Nat, toExclusive : Nat) (defined in Types.mo) for reusability in other data structures.

Contributor Author (AStepanov25):

Did you mean type NatSlice<T> = (List<T>, fromInclusive : Nat, toExclusive : Nat)?

We can do that "lazily", i.e. whenever new code that uses it arrives refactor it.

Collaborator:

I was specifically thinking (T, ...) so we could use other data structures (e.g. [T]) in place of List<T>. Agreed that we can do this later if needed.

Kamirus (Member) commented May 30, 2025

Your implementation of addCount may not be allowed as it is. We have to check the details....
above, so maybe I am not reading the code correctly.

@timohanke Sorry about the confusion, my code is just an experiment, a quick prototype without a proper check. Also, the code duplication between addCount and repeatInternal is better avoided.
We cannot silently accept nulls as values; that's why I propose we trap when there are not enough elements in the Iter.

Having an UnsafeIter that traps is better, because then we don't have to switch over next(): we want to trap anyway if we go too far in the iter. But unfortunately UnsafeIter is not an established thing. So we can't really use it in the exposed interface, can we?

Probably not, but hopefully the difference between the safe and unsafe Iter is small enough (the difference between the last two rows in my benchmark seems to suggest that).

AStepanov25 (Contributor, Author) commented Jun 1, 2025

I've updated issue #325.

Regarding concatSlices, we might not need it at all, while we need to support the atomic methods suggested by @Kamirus either way. I think we got caught in the "premature optimization" trap with concatSlices.

In the issue I suggested adding more of the usual dynamic array methods.

Kamirus (Member) commented Jun 2, 2025

Also, we could break the existing invariant that at the end there are no more than two totally empty data blocks and provide a reserve function; then everything could be done as fast as the current implementation with values_from, reserve, and addCount. The question is: do we need all these functions on their own?

@AStepanov25 The reserve function would expand the capacity of the list, adding nulls that don't count towards the size? If so, then to get the addCount functionality we would need to call add (or set?) in a loop. That would probably be slower than addCount.
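
For reference, a sketch of that reserve-based variant (reserve is hypothetical, and Runtime.trap is assumed to be the trapping helper in this library):

```motoko
import List "mo:new-base/List";
import Iter "mo:new-base/Iter";
import Runtime "mo:new-base/Runtime";

// Hypothetical reserve-based addCount: expand capacity once, then append
// `count` elements one by one. Likely slower than a dedicated addCount.
func addCountViaReserve<T>(dst : List.List<T>, iter : Iter.Iter<T>, count : Nat) {
  List.reserve(dst, List.size(dst) + count); // hypothetical capacity reservation
  var added = 0;
  while (added < count) {
    switch (iter.next()) {
      case (?x) { List.add(dst, x); added += 1 };
      case null { Runtime.trap("addCountViaReserve: not enough elements") };
    };
  };
};
```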

valuesFrom is a simple way to get an iterator over a postfix of values, faster than List.values |> Iter.drop. It would have various uses (like in Map.valuesFrom) and synergize well with addCount.
