@shwestrick
Collaborator

Introduces an SSA2 pass called flattenIntoSequences, implemented within mlton/ssa/flatten-into-sequences.fun.

The idea in the flattenIntoSequences pass is to force flattening of immutable tuples into sequence containers. For example the SSA2 type ((real64, word32) tuple, word8) tuple mut sequence would be rewritten to (real64 mut, word32 mut, word8 mut) sequence, generating the appropriate compensation code at sequence loads and stores.
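
To make the load/store compensation concrete, here is a rough C analogy of the rewrite above. The pass itself works on SSA2, not C, and the names `Inner`, `Outer`, `FlatSlot`, `flat_update`, `flat_sub`, and `flat_sub_r` are hypothetical. In this sketch a store takes the nested tuple apart and writes its leaves in place, while a load either rebuilds the tuples or, if only one component is used, reads that field directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Boxed layout of ((real64, word32) tuple, word8) tuple: each sequence
   slot holds a pointer to an outer tuple, which points to an inner tuple. */
typedef struct { double r; uint32_t w; } Inner;
typedef struct { Inner *inner; uint8_t b; } Outer;

/* Flattened layout, mirroring (real64 mut, word32 mut, word8 mut) sequence:
   the three leaf fields are stored inline in each slot. */
typedef struct { double r; uint32_t w; uint8_t b; } FlatSlot;

/* Store compensation: take the tuple apart and write its leaves in place. */
static void flat_update(FlatSlot *seq, size_t n, size_t i, const Outer *v) {
    assert(i < n);
    seq[i].r = v->inner->r;
    seq[i].w = v->inner->w;
    seq[i].b = v->b;
}

/* Load compensation when the whole value is needed: rebuild both tuples.
   In this sketch the caller owns (and frees) the returned allocations. */
static Outer *flat_sub(const FlatSlot *seq, size_t n, size_t i) {
    assert(i < n);
    Inner *in = malloc(sizeof *in);
    Outer *out = malloc(sizeof *out);
    if (in == NULL || out == NULL) { free(in); free(out); return NULL; }
    in->r = seq[i].r;
    in->w = seq[i].w;
    out->inner = in;
    out->b = seq[i].b;
    return out;
}

/* Loading a single component needs no allocation at all. */
static double flat_sub_r(const FlatSlot *seq, size_t n, size_t i) {
    assert(i < n);
    return seq[i].r;
}
```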

When it works, the performance benefits are significant.

This is an attempt to address a problem with the deepFlatten pass, which does not always succeed at flattening. The specifics are a bit mysterious; we will need to investigate more closely where (and why) deepFlatten chooses not to flatten into sequences.

One issue with flattenIntoSequences at the moment is that it flattens blindly, which may not be correct in all cases, e.g., for primitive CAS operations on sequence elements that are tuples.
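
Here is a sketch of why CAS is a problem, again as a C analogy. It assumes that a CAS on a boxed element compares the pointer stored in the slot. The names `Tuple`, `BoxedSlot`, `boxed_cas`, and `FlatSlot` are hypothetical. Once the tuple is flattened, the slot no longer holds a single word to compare and swap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { double r; uint32_t w; uint8_t b; } Tuple;

/* Boxed sequence: each slot is one pointer-sized word, so CAS on a slot is
   a single hardware compare-and-swap that compares pointers (identity). */
typedef _Atomic(Tuple *) BoxedSlot;

static bool boxed_cas(BoxedSlot *seq, size_t i, Tuple *expected, Tuple *desired) {
    return atomic_compare_exchange_strong(&seq[i], &expected, desired);
}

/* Flattened sequence: the element's fields live inline in the slot. There
   is no pointer left to compare, and the slot spans more than one word. */
typedef Tuple FlatSlot;
static_assert(sizeof(FlatSlot) > sizeof(void *),
              "a flattened element does not fit in one pointer-sized word");
```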

Current status

I've found at least one example (a quickhull benchmark) that is not being compiled correctly. More investigation is needed...

That being said, the nn example seems to be working correctly, with significant performance improvements (measurements taken on my Mac M2, 2022):

$ bin/nn @mpl procs 4 -- -N 10000000   # with new pass
N 10000000
generated input in 0.0303s
built quadtree in 0.8498s
found all neighbors in 1.2261s
...

$ bin/nn.sysmpl @mpl procs 4 -- -N 10000000   # without new pass
N 10000000
generated input in 0.1686s
built quadtree in 0.9750s
found all neighbors in 1.6588s
...

@shwestrick
Collaborator Author

More progress: the quickhull correctness issue seems to have been a red herring. Previously, I was compiling MaPLe with `make smlnj-mlton`, but when I switched to the standard `make`, the correctness issue went away.

(The correctness issue seemed to be due to problems with real arithmetic. We may need to investigate this separately; perhaps there is some deeper issue with the `make smlnj-mlton` build target.)

As of now, flattening seems to be working correctly. The results are really promising.

Lots more testing will be needed, and the issue with CAS is going to be difficult (and interesting!) to solve.

This is a substantial change which adds new types to the compiler:
array_flat and vector_flat. These are identical to their basis
counterparts (array and vector), but with a different memory layout:
elements stored into an array_flat have their outermost immutable
tuples flattened.

These come with associated structures that are nearly identical to
the array and vector basis library structures, only with the types
'a array replaced with 'a array_flat, etc.

The new structures and types are available at the source level by including:

  $(SML_LIB)/basis/mpl.mlb

which provides:

  structure MPL.ArrayFlat: ARRAY_FLAT
  structure MPL.ArrayFlatSlice: ARRAY_FLAT_SLICE
  structure MPL.VectorFlat: VECTOR_FLAT
  structure MPL.VectorFlatSlice: VECTOR_FLAT_SLICE

These structures provide the programmer with more control over memory
representations by eliminating intermediate allocations and indirections.
For example, the source-level type
  (Int64.int * (Real64.real * string) * Int64.int) MPL.ArrayFlat.array
is represented as an array where each element is 32 bytes, inlined:

         8 bytes      8 bytes      8 bytes      8 bytes
      +-----------+-------------+-----------+-------------+
  ... | Int64.int | Real64.real | Int64.int | (pointer *) | ...
      +-----------+-------------+-----------+----------|--+
      ^                                                |
      |                                                v
    element at index i starts at offset 32*i      heap-allocated string

(Note that the string is still heap-allocated: flattening only flattens
outermost tuples. Also, due to MLton's GC model, all pointer fields are
moved to the end of each element.)
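
The diagram can be checked against a C struct with the same field order, assuming a 64-bit target where pointers are 8 bytes. `FlatElem` and `elem_offset` are hypothetical names. The `static_assert`s encode the 32-byte element size and the pointer-at-the-end placement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C picture of one element of
     (Int64.int * (Real64.real * string) * Int64.int) MPL.ArrayFlat.array
   on a 64-bit target. Non-pointer fields come first; the pointer to the
   heap-allocated string (from the inner tuple) is moved to the end. */
typedef struct {
    int64_t first;    /* Int64.int (first component)          */
    double  real;     /* Real64.real (inner tuple, flattened) */
    int64_t last;     /* Int64.int (third component)          */
    void   *str;      /* pointer to the heap-allocated string */
} FlatElem;

static_assert(sizeof(FlatElem) == 32, "each element is 32 bytes, inline");
static_assert(offsetof(FlatElem, str) == 24, "pointer data at the end");

/* Element i starts at byte offset 32*i from the start of the payload. */
static size_t elem_offset(size_t i) {
    return sizeof(FlatElem) * i;
}
```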

The performance advantages are significant. See MaPLe PR #223 for more info.
The example `nn-flat` is nearly identical to `nn`, except that the
input sequence of 2D points is now represented as `MPL.ArrayFlat.array`
instead of a regular `array`. The diff between the two benchmarks is
very small: only a few lines change, to select the new array type at
the source level.

The performance results speak for themselves:

  $ cd examples
  $ make nn nn-flat
  $ bin/nn @mpl procs 8 -- -N 10000000
  N 10000000
  generated input in 0.1316s
  built quadtree in 0.8173s
  found all neighbors in 1.4025s
  ...
  $ bin/nn-flat @mpl procs 8 -- -N 10000000
  N 10000000
  generated input in 0.0277s       # 5x improvement (!)
  built quadtree in 0.5001s        # 60% improvement
  found all neighbors in 1.0248s   # 40% improvement
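
If each "improvement" comment is read as the ratio of old time to new time, the arithmetic can be made explicit with a small C helper. `Timing`, `timings`, `speedup`, and `print_speedups` are illustrative names. The numbers are the `nn` vs `nn-flat` timings reported above.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup = baseline time / new time, using the nn vs nn-flat timings
   reported above (procs 8, N = 10000000). */
typedef struct { const char *phase; double nn; double nn_flat; } Timing;

static const Timing timings[] = {
    {"generated input",     0.1316, 0.0277},
    {"built quadtree",      0.8173, 0.5001},
    {"found all neighbors", 1.4025, 1.0248},
};

static double speedup(const Timing *t) {
    assert(t->nn_flat > 0.0);  /* guard against division by zero */
    return t->nn / t->nn_flat;
}

static void print_speedups(void) {
    for (size_t i = 0; i < sizeof timings / sizeof timings[0]; i++)
        printf("%-20s %.2fx\n", timings[i].phase, speedup(&timings[i]));
}
```

Calling `print_speedups()` reports roughly 4.75x, 1.63x, and 1.37x, consistent with the rounded 5x, 60%, and 40% in the comments.
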
@shwestrick
Collaborator Author

These newest commits take a slightly different approach: rather than flattening all sequences (which is unsound in general), we instead introduce new types for flattened arrays and vectors. Notably, the new flattened types do not support CAS.

@shwestrick
Collaborator Author

Inside the compiler proper, the new approach is to parameterize the primitive array and vector types by a "layout", which currently is only Default or Flattened. In the FlattenIntoSequences compiler pass, flattening is performed only for arrays/vectors that are marked with the Flattened layout. Arrays and vectors with the Default layout still undergo the existing flattening optimizations (e.g., DeepFlatten).
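
Here is a minimal C model of the layout parameter and the pass's decision, for illustration only. The compiler itself is written in SML. `Layout`, `SeqType`, and `should_flatten` are hypothetical names, not its actual datatypes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a layout-parameterized sequence type. */
typedef enum { LAYOUT_DEFAULT, LAYOUT_FLATTENED } Layout;

typedef struct {
    Layout layout;
    bool   elem_is_immutable_tuple;  /* element is an immutable tuple */
} SeqType;

/* FlattenIntoSequences-style decision: only sequences explicitly marked
   Flattened are rewritten; Default sequences are left for the existing
   optimizations (e.g., DeepFlatten) to handle. */
static bool should_flatten(const SeqType *t) {
    assert(t != NULL);
    return t->layout == LAYOUT_FLATTENED && t->elem_is_immutable_tuple;
}
```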
