initial work on flattening all immutable tuples into sequence containers #223
More progress: the quickhull correctness issue seems to have been a red herring. Previously, I was compiling MaPLe with … (the correctness issue seemed to be due to problems with …). As of now, flattening seems to be working correctly. The results are really promising. Lots more testing will be needed, and the issue with CAS is going to be difficult (and interesting!) to solve.
This is a substantial change which adds new types to the compiler:
array_flat and vector_flat. These are identical to their basis
counterparts (array and vector), but with a different memory layout:
elements stored into an array_flat have their outermost immutable
tuples flattened.
These come with associated structures that are nearly identical to
the array and vector basis library structures, only with the types
'a array replaced with 'a array_flat, etc.
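To make the layout difference concrete, here is a small Python model. It is an illustration only, not how MPL represents arrays internally. A list of separately built tuples stands in for the boxed `'a array` layout, and one `bytearray` packed with `struct` stands in for `'a array_flat`. The element type `Int64.int * Real64.real` and the helper names `make_boxed`, `make_flat`, and `flat_sub` are hypothetical.

```python
import struct

# Hypothetical element type Int64.int * Real64.real.
# "<qd": little-endian int64 then float64, no padding -> 16 bytes per element.
ELEM = struct.Struct("<qd")


def make_boxed(pairs):
    """Boxed model of 'a array: each slot refers to its own tuple object,
    so reading a field means following a reference first."""
    return [tuple(p) for p in pairs]


def make_flat(pairs):
    """Flat model of 'a array_flat: the tuple fields are written inline into
    one contiguous buffer; no per-element tuple object is kept."""
    buf = bytearray(ELEM.size * len(pairs))
    for i, p in enumerate(pairs):
        ELEM.pack_into(buf, i * ELEM.size, *p)
    return buf


def flat_sub(buf, i):
    """Read element i of the flat model: it lives at the fixed byte offset
    i * ELEM.size, and a tuple is built only when the element is read."""
    return ELEM.unpack_from(buf, i * ELEM.size)


pairs = [(k, k / 2.0) for k in range(4)]
boxed = make_boxed(pairs)
flat = make_flat(pairs)

print(ELEM.size)           # 16 bytes per flattened element
print(len(flat))           # 64: all four elements in one buffer
print(flat_sub(flat, 3))   # (3, 1.5)
```

In this model the boxed version pays one object and one indirection per element, while the flat version keeps every field in a single buffer and reassembles a tuple only when a whole element is read.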
The new structures and types are available at the source level by including:
$(SML_LIB)/basis/mpl.mlb
which provides:
structure MPL.ArrayFlat: ARRAY_FLAT
structure MPL.ArrayFlatSlice: ARRAY_FLAT_SLICE
structure MPL.VectorFlat: VECTOR_FLAT
structure MPL.VectorFlatSlice: VECTOR_FLAT_SLICE
These structures give the programmer more control over memory
representation: flattening eliminates intermediate tuple allocations and
the pointer indirections needed to reach them.
For example, the source-level type
(Int64.int * (Real64.real * string) * Int64.int) MPL.ArrayFlat.array
is represented as an array where each element is 32 bytes, inlined:
           8 bytes      8 bytes      8 bytes      8 bytes
        +-----------+-------------+-----------+-------------+
    ... | Int64.int | Real64.real | Int64.int | (pointer *) | ...
        +-----------+-------------+-----------+----------|--+
        ^                                                |
        |                                                v
    element at index i starts at offset 32*i      heap-allocated string
(Note that the string is still heap-allocated: flattening only flattens
outermost tuples. Also, due to MLton's GC model, all pointer data within
an element is moved to the end.)
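The diagram can be checked against a C-compatible struct built with Python's `ctypes`. This sketch assumes a 64-bit target (8-byte pointers) and uses hypothetical field names; it mirrors the layout drawn above (scalars in source order, string pointer moved last) rather than reproducing MPL's internal object format.

```python
import ctypes

# Assumes a 64-bit target, so c_char_p (a pointer) is 8 bytes.


class FlatElem(ctypes.Structure):
    """One flattened element of
    (Int64.int * (Real64.real * string) * Int64.int) MPL.ArrayFlat.array,
    laid out as in the diagram. Field names are hypothetical."""
    _fields_ = [
        ("i0", ctypes.c_int64),   # first Int64.int
        ("r", ctypes.c_double),   # Real64.real from the nested tuple
        ("i1", ctypes.c_int64),   # last Int64.int
        ("s", ctypes.c_char_p),   # pointer to the heap-allocated string, moved last
    ]


arr = (FlatElem * 4)()            # four elements stored contiguously
base = ctypes.addressof(arr)

offsets = [getattr(FlatElem, name).offset for name, _ in FlatElem._fields_]
starts = [ctypes.addressof(arr[i]) - base for i in range(len(arr))]

print(ctypes.sizeof(FlatElem))    # 32
print(offsets)                    # [0, 8, 16, 24]
print(starts)                     # [0, 32, 64, 96]: element i starts at 32*i
```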
The performance advantages are significant. See MaPLe PR #223 for more
info.
The example `nn-flat` is nearly identical to `nn`, except that the input sequence of 2D points is now represented as an `MPL.ArrayFlat.array` instead of a regular `array`. The diff between the two benchmarks is very small: only a few lines change, to pick the new array type at the source level. The performance results speak for themselves:

    $ cd examples
    $ make nn nn-flat
    $ bin/nn @mpl procs 8 -- -N 10000000
    N 10000000
    generated input in 0.1316s
    built quadtree in 0.8173s
    found all neighbors in 1.4025s
    ...
    $ bin/nn-flat @mpl procs 8 -- -N 10000000
    N 10000000
    generated input in 0.0277s       # 5x improvement (!)
    built quadtree in 0.5001s        # 60% improvement
    found all neighbors in 1.0248s   # 40% improvement
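The inline annotations can be reproduced from the printed timings. The short script below assumes they are speedup ratios (time for `nn` divided by time for `nn-flat`) and, since a phrase like "60% improvement" can be read more than one way, also reports the fraction of wall-clock time saved.

```python
# Wall-clock times in seconds, copied from the nn / nn-flat runs above.
timings = {
    "generated input": (0.1316, 0.0277),
    "built quadtree": (0.8173, 0.5001),
    "found all neighbors": (1.4025, 1.0248),
}

speedups = {}
for phase, (t_nn, t_flat) in timings.items():
    speedups[phase] = t_nn / t_flat   # how many times faster nn-flat runs
    saved = 1 - t_flat / t_nn         # fraction of nn's time that is removed
    print(f"{phase}: {speedups[phase]:.2f}x faster, {saved:.0%} less time")
```

This prints speedups of 4.75x, 1.63x, and 1.37x, consistent with the "5x", "60%", and "40%" annotations read as speedups; measured as time saved, the same phases take roughly 79%, 39%, and 27% less time.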
These newest commits take a slightly different approach: rather than flattening all sequences (which is unsound in general, e.g., for CAS operations on tuple elements), we instead introduce new types for flattened arrays and vectors. Notably, the new flattened types do not support CAS.
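One way to see why CAS is hard to support on flattened elements: a hardware CAS updates a single machine word. A boxed element is exactly one word (a pointer), while a flattened element spans several words. The toy model below is a deterministic, single-threaded Python simulation, not real concurrency, and it assumes that CAS on a boxed element compares the slot's pointer; `cas_ref`, `boxed_cas_demo`, and `torn_write_demo` are hypothetical names. It shows that two interleaved word-by-word stores to a flat element can leave a mix of both values, which no single-word CAS can rule out.

```python
def cas_ref(slot, idx, expected, new):
    """Model of a one-word CAS on a boxed slot: succeeds only if the slot
    still holds the very same object (pointer identity)."""
    if slot[idx] is expected:
        slot[idx] = new
        return True
    return False


def boxed_cas_demo():
    old = (0, 0)
    slot = [old]                               # boxed: one pointer-sized slot
    a_ok = cas_ref(slot, 0, old, (1, 1))       # thread A swaps in its tuple
    b_ok = cas_ref(slot, 0, old, (2, 2))       # thread B's stale CAS fails
    return a_ok, b_ok, slot[0]


def torn_write_demo():
    mem = [0, 0]                               # flat: one element, two words
    # Threads A and B each store a whole tuple one word at a time.
    # This schedule interleaves them: A.w0, B.w0, B.w1, A.w1.
    schedule = [("A", 0, 1), ("B", 0, 2), ("B", 1, 2), ("A", 1, 1)]
    for _thread, word, value in schedule:
        mem[word] = value
    return tuple(mem)


print(boxed_cas_demo())    # (True, False, (1, 1))
print(torn_write_demo())   # (2, 1): a value neither thread stored
```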
Inside the compiler proper, the new approach is to parameterize the primitive array and vector types by a "layout", which currently is only …
Introduces an SSA2 pass called `flattenIntoSequences`, implemented in `mlton/ssa/flatten-into-sequences.fun`.

The idea of the `flattenIntoSequences` pass is to force flattening of immutable tuples into sequence containers. For example, the SSA2 type `((real64, word32) tuple, word8) tuple mut sequence` would be rewritten to `(real64 mut, word32 mut, word8 mut) sequence`, with the appropriate compensation code generated at sequence loads and stores. When it works, the performance benefits are significant.
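The type rewrite and its compensation code can be sketched in Python. This is a hypothetical model, not the SML code in `mlton/ssa/flatten-into-sequences.fun`: a type is either a base-type name or a Python tuple of component types (standing in for an immutable SSA2 tuple), and a sequence is a flat Python list with one slot per leaf field. The names `leaves`, `flatten_element`, `show_flat`, `leaf_values`, `rebuild`, `store`, and `load` are invented for illustration.

```python
def leaves(ty):
    """Leaf types of a (possibly nested) immutable tuple type, in order."""
    if isinstance(ty, tuple):
        return [leaf for t in ty for leaf in leaves(t)]
    return [ty]


def flatten_element(ty):
    """Element `ty mut` becomes one mutable field per leaf type."""
    return [(leaf, "mut") for leaf in leaves(ty)]


def show_flat(ty):
    """Render the rewritten sequence type in SSA2-like notation."""
    fields = ", ".join(f"{leaf} {m}" for leaf, m in flatten_element(ty))
    return f"({fields}) sequence"


def leaf_values(ty, value):
    """Decompose a nested tuple value following the shape of its type."""
    if isinstance(ty, tuple):
        return [v for t, x in zip(ty, value) for v in leaf_values(t, x)]
    return [value]


def rebuild(ty, it):
    """Reassemble a nested tuple value from an iterator over leaf values."""
    if isinstance(ty, tuple):
        return tuple(rebuild(t, it) for t in ty)
    return next(it)


def store(seq, i, ty, value):
    """Compensation code at a store: write each leaf into its own field."""
    n = len(leaves(ty))
    seq[i * n:(i + 1) * n] = leaf_values(ty, value)


def load(seq, i, ty):
    """Compensation code at a load: read the fields, rebuild the tuple."""
    n = len(leaves(ty))
    return rebuild(ty, iter(seq[i * n:(i + 1) * n]))


# ((real64, word32) tuple, word8) tuple
elem_ty = (("real64", "word32"), "word8")
print(show_flat(elem_ty))    # (real64 mut, word32 mut, word8 mut) sequence

seq = [None] * (2 * len(leaves(elem_ty)))   # room for two elements
store(seq, 1, elem_ty, ((1.5, 7), 255))
print(seq)                   # [None, None, None, 1.5, 7, 255]
print(load(seq, 1, elem_ty)) # ((1.5, 7), 255)
```

In this model the sequence itself holds no per-element tuple objects; the cost is the decompose/rebuild work at each store and load.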
This pass is an attempt to address a problem with the `deepFlatten` pass, which does not always succeed at flattening. The specifics are a bit mysterious; we will need to investigate more closely where (and why) `deepFlatten` chooses not to flatten into sequences.

One issue with `flattenIntoSequences` at the moment is that it flattens blindly, which may not be correct in all cases, e.g., for primitive CAS operations on tuples.

Current status
I've found at least one example (a quickhull benchmark) that is not compiling correctly. More investigation needed...

That being said, the `nn` example seems to be working correctly, with significant performance improvements (measurements taken on my Mac M2, 2022):