Commit f69d518

fitzgen authored and bongjunj committed
Stratification of call graphs for parallel bottom-up inlining (bytecodealliance#11269)
* Stratification of call graphs for parallel bottom-up inlining

This commit takes a call graph and constructs a strata, which is essentially a parallel execution plan. A strata consists of an ordered sequence of layers, and a layer of an unordered set of functions. The `i`th layer must be processed before the `i + 1`th layer, but functions within the same layer may be processed in any order (and in parallel).

For example, when given the following tree-like call graph:

    +---+   +---+   +---+
    | a |-->| b |-->| c |
    +---+   +---+   +---+
      |       |
      |       |     +---+
      |       '---->| d |
      |             +---+
      |
      |     +---+   +---+
      '---->| e |-->| f |
            +---+   +---+
              |
              |     +---+
              '---->| g |
                    +---+

then stratification will produce these layers:

    [
        {c, d, f, g},
        {b, e},
        {a},
    ]

Our goal in constructing the layers is to maximize potential parallelism at each layer. Logically, we do this by finding the strongly-connected components of the input call graph and peeling off all of the leaves of the SCCs' condensation (i.e. the DAG that the SCCs form; see the documentation for the `StronglyConnectedComponents::evaporation` method for details). These leaves become the strata's first layer. The layer's components are removed from the condensation graph, and we repeat the process, so that the condensation's new leaves become the strata's second layer, and so on, until the condensation graph is empty and all components have been processed.

In practice we don't actually mutate the condensation graph or remove its nodes. Instead, we count how many unprocessed dependencies each component has, and a component is ready for inclusion in a layer once its unprocessed-dependencies count reaches zero.

This commit also renames the entity type for strongly-connected components from `Component` to `Scc`, as I felt the former was a bit ambiguous given Wasm components.

The next PR will extend Wasmtime's compilation driver code to actually make use of this new infrastructure.

* Address review feedback
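The dependency-counting scheme described in the commit message can be sketched roughly as follows. This is a simplified illustration and not the code added by this commit: it assumes the call graph is already acyclic (so every function is its own SCC and no condensation step is needed), it uses plain `usize` indices rather than entity types, and the names `stratify`, `calls`, `pending`, and `callers` are hypothetical.

```rust
use std::collections::BTreeSet;

/// Split an acyclic call graph into layers, leaves first.
///
/// `calls[f]` lists the callees of function `f`. Assumption: the graph has no
/// cycles; the real implementation first groups functions into SCCs.
fn stratify(calls: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = calls.len();

    // Number of not-yet-processed callees for each function.
    let mut pending: Vec<usize> = calls.iter().map(|c| c.len()).collect();

    // Reverse edges: `callers[g]` holds every function that calls `g`.
    let mut callers = vec![Vec::new(); n];
    for (f, callees) in calls.iter().enumerate() {
        for &g in callees {
            callers[g].push(f);
        }
    }

    // The first layer is every function with no callees at all.
    let mut layer: BTreeSet<usize> = (0..n).filter(|&f| pending[f] == 0).collect();
    let mut layers = Vec::new();

    while !layer.is_empty() {
        let mut next = BTreeSet::new();
        for &g in &layer {
            for &f in &callers[g] {
                // One more dependency of `f` has been processed.
                pending[f] -= 1;
                if pending[f] == 0 {
                    next.insert(f);
                }
            }
        }
        layers.push(std::mem::replace(&mut layer, next));
    }

    layers
}

fn main() {
    // The example graph from the commit message, with a=0, b=1, c=2, d=3,
    // e=4, f=5, g=6.
    let calls: Vec<Vec<usize>> = vec![
        vec![1, 4], // a calls b and e
        vec![2, 3], // b calls c and d
        vec![],     // c
        vec![],     // d
        vec![5, 6], // e calls f and g
        vec![],     // f
        vec![],     // g
    ];
    // Prints: [{2, 3, 5, 6}, {1, 4}, {0}]
    println!("{:?}", stratify(&calls));
}
```

With that numbering, the printed layers correspond to `{c, d, f, g}`, `{b, e}`, `{a}` from the example above. Functions caught in a cycle would never reach a zero count in this sketch, which is why the commit works on SCCs instead of individual functions.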
1 parent 8dc83a8 commit f69d518

5 files changed: +783 -27 lines changed


cranelift/entity/src/map.rs

Lines changed: 17 additions & 0 deletions
@@ -149,6 +149,23 @@ where
     }
 }
 
+impl<K, V> FromIterator<(K, V)> for SecondaryMap<K, V>
+where
+    K: EntityRef,
+    V: Clone + Default,
+{
+    fn from_iter<T: IntoIterator<Item = (K, V)>>(iter: T) -> Self {
+        let iter = iter.into_iter();
+        let (min, max) = iter.size_hint();
+        let cap = max.unwrap_or_else(|| 2 * min);
+        let mut map = Self::with_capacity(cap);
+        for (k, v) in iter {
+            map[k] = v;
+        }
+        map
+    }
+}
+
 /// Immutable indexing into an `SecondaryMap`.
 ///
 /// All keys are permitted. Untouched entries have the default value.
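The capacity heuristic used by `from_iter` in this hunk can be tried in isolation with standard-library iterators. The sketch below only mirrors the `size_hint` arithmetic; `capacity_hint` is a hypothetical helper name, and nothing here depends on `SecondaryMap` itself.

```rust
/// Mirror of the capacity computation in the hunk: prefer the iterator's
/// upper bound, and fall back to twice the lower bound when there is none.
/// (`capacity_hint` is a hypothetical name used only for this illustration.)
fn capacity_hint<I: Iterator>(iter: &I) -> usize {
    let (min, max) = iter.size_hint();
    max.unwrap_or_else(|| 2 * min)
}

fn main() {
    // A range knows its exact length: size_hint is (5, Some(5)).
    println!("{}", capacity_hint(&(0..5)));

    // `filter` cannot know how many items survive: size_hint is (0, Some(5)),
    // so the upper bound is used.
    println!("{}", capacity_hint(&(0..5).filter(|x| x % 2 == 0)));

    // `from_fn` reports (0, None), so the fallback yields 0.
    println!("{}", capacity_hint(&std::iter::from_fn(|| None::<u32>)));

    // Chaining a known-length range with an unbounded-hint iterator gives
    // (3, None), so the fallback yields 2 * 3.
    let chained = (0..3).chain(std::iter::from_fn(|| None::<i32>));
    println!("{}", capacity_hint(&chained));
}
```

The value is only an allocation hint: the insertion loop in the hunk indexes the map for each key regardless of how accurate the estimate was.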

crates/wasmtime/src/compile.rs

Lines changed: 17 additions & 0 deletions
@@ -31,6 +31,7 @@ use std::{
     borrow::Cow,
     collections::{BTreeMap, BTreeSet, btree_map},
     mem,
+    ops::Range,
 };
 
 use wasmtime_environ::CompiledFunctionBody;
@@ -43,7 +44,9 @@ use wasmtime_environ::{
     StaticModuleIndex,
 };
 
+mod call_graph;
 mod scc;
+mod stratify;
 
 mod code_builder;
 pub use self::code_builder::{CodeBuilder, CodeHint, HashedEngineCompileEnv};
@@ -1017,3 +1020,17 @@ impl Artifacts {
         self.modules.into_iter().next().unwrap().1
     }
 }
+
+/// Extend `dest` with `items` and return the range of indices in `dest` where
+/// they ended up.
+fn extend_with_range<T>(dest: &mut Vec<T>, items: impl IntoIterator<Item = T>) -> Range<u32> {
+    let start = dest.len();
+    let start = u32::try_from(start).unwrap();
+
+    dest.extend(items);
+
+    let end = dest.len();
+    let end = u32::try_from(end).unwrap();
+
+    start..end
+}
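As a rough illustration of how `extend_with_range` behaves, the standalone sketch below copies the helper from this hunk and appends three batches to one shared vector. The `pool` variable and the string items are made up for the example, and the explicit `TryFrom` import is only there so the snippet also builds on editions where it is not in the prelude.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Copied from the hunk above: extend `dest` with `items` and return the
/// range of indices in `dest` where they ended up.
fn extend_with_range<T>(dest: &mut Vec<T>, items: impl IntoIterator<Item = T>) -> Range<u32> {
    let start = u32::try_from(dest.len()).unwrap();
    dest.extend(items);
    let end = u32::try_from(dest.len()).unwrap();
    start..end
}

fn main() {
    // One shared backing vector that several logical lists are packed into.
    let mut pool: Vec<&str> = Vec::new();

    let first = extend_with_range(&mut pool, ["b", "e"]);
    let second = extend_with_range(&mut pool, std::iter::empty());
    let third = extend_with_range(&mut pool, ["c", "d"]);

    // Prints: 0..2 2..2 2..4
    println!("{:?} {:?} {:?}", first, second, third);
}
```

An empty batch yields an empty range at the current end of the vector, so "no items" and `Range::default()` (which is `0..0`) both slice to nothing.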
Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+//! Construction of the call-graph, for the purposes of inlining.
+//!
+//! These call graphs are not necessarily complete or accurate, and Wasmtime's
+//! soundness does not rely on those properties. First off, we do not attempt to
+//! understand indirect calls, which at their worst must force any call analysis
+//! give up and say "the callee could be absolutely any function". More
+//! interestingly, these call graphs are only used for scheduling bottom-up
+//! inlining, so the worst that inaccurate information can do is cause us to
+//! miss inlining opportunities or lose potential parallelism in our
+//! schedule. For best results, however, every direct call that is potentially
+//! inlinable should be reported when constructing these call graphs.
+
+#![cfg_attr(not(test), expect(dead_code, reason = "used in upcoming PRs"))]
+
+use super::*;
+use core::{
+    fmt::{self, Debug},
+    ops::Range,
+};
+use wasmtime_environ::{EntityRef, SecondaryMap};
+
+/// A call graph reified into a densely packed and quickly accessible
+/// representation.
+///
+/// In a call graph, nodes are functions, and an edge `f --> g` means that the
+/// function `f` calls the function `g`.
+pub struct CallGraph<Node>
+where
+    Node: EntityRef + Debug,
+{
+    /// A map from each node to the subslice of `self.edge_elems` that are its
+    /// edges.
+    edges: SecondaryMap<Node, Range<u32>>,
+
+    /// Densely packed edge elements for `self.edges`.
+    edge_elems: Vec<Node>,
+}
+
+impl<Node> Debug for CallGraph<Node>
+where
+    Node: EntityRef + Debug,
+{
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        struct Edges<'a, Node: EntityRef + Debug>(&'a CallGraph<Node>);
+
+        impl<'a, Node: EntityRef + Debug> Debug for Edges<'a, Node> {
+            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+                f.debug_map()
+                    .entries(self.0.nodes().map(|n| (n, self.0.edges(n))))
+                    .finish()
+            }
+        }
+
+        f.debug_struct("CallGraph")
+            .field("edges", &Edges(self))
+            .finish()
+    }
+}
+
+impl<Node> CallGraph<Node>
+where
+    Node: EntityRef + Debug,
+{
+    /// Construct a new call graph.
+    ///
+    /// `funcs` should be an iterator over all function nodes in this call
+    /// graph's translation unit.
+    ///
+    /// The `get_calls` function should yield (by pushing onto the given `Vec`)
+    /// all of the callee function nodes that the given caller function node
+    /// calls.
+    pub fn new(
+        funcs: impl IntoIterator<Item = Node>,
+        get_calls: impl Fn(Node, &mut Vec<Node>) -> Result<()>,
+    ) -> Result<Self> {
+        let funcs = funcs.into_iter();
+
+        let (min, max) = funcs.size_hint();
+        let capacity = max.unwrap_or_else(|| 2 * min);
+        let mut edges = SecondaryMap::with_capacity(capacity);
+        let mut edge_elems = vec![];
+
+        let mut calls = vec![];
+        for caller in funcs {
+            debug_assert!(calls.is_empty());
+            get_calls(caller, &mut calls)?;
+
+            debug_assert_eq!(edges[caller], Range::default());
+            edges[caller] = extend_with_range(&mut edge_elems, calls.drain(..));
+        }
+
+        Ok(CallGraph { edges, edge_elems })
+    }
+
+    /// Get the function nodes in this call graph.
+    pub fn nodes(&self) -> impl ExactSizeIterator<Item = Node> {
+        self.edges.keys()
+    }
+
+    /// Get the callee function nodes that the given caller function node calls.
+    pub fn edges(&self, node: Node) -> &[Node] {
+        let Range { start, end } = self.edges[node].clone();
+        let start = usize::try_from(start).unwrap();
+        let end = usize::try_from(end).unwrap();
+        &self.edge_elems[start..end]
+    }
+}
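The densely packed layout that `CallGraph` uses (one flat vector of callees plus a per-node range into it) can be illustrated without the Wasmtime entity types. The sketch below is a simplified stand-in under stated assumptions: `PackedGraph` is a hypothetical name, nodes are plain `usize` indices, a `Vec<Range<usize>>` replaces `SecondaryMap<Node, Range<u32>>`, and the callback is infallible.

```rust
use std::ops::Range;

/// Simplified stand-in for the packed representation: every node's callees
/// live contiguously in `edge_elems`, and `edges[node]` says where.
struct PackedGraph {
    edges: Vec<Range<usize>>,
    edge_elems: Vec<usize>,
}

impl PackedGraph {
    /// Build the graph for nodes `0..num_funcs`. As in the diff, `get_calls`
    /// reports callees by pushing them onto a scratch vector that is reused
    /// across callers.
    fn new(num_funcs: usize, get_calls: impl Fn(usize, &mut Vec<usize>)) -> Self {
        let mut edges = Vec::with_capacity(num_funcs);
        let mut edge_elems = Vec::new();
        let mut calls = Vec::new();
        for caller in 0..num_funcs {
            get_calls(caller, &mut calls);
            let start = edge_elems.len();
            // Draining leaves `calls` empty for the next caller.
            edge_elems.extend(calls.drain(..));
            edges.push(start..edge_elems.len());
        }
        PackedGraph { edges, edge_elems }
    }

    /// The callees of `node`, as a slice into the shared vector.
    fn edges(&self, node: usize) -> &[usize] {
        &self.edge_elems[self.edges[node].clone()]
    }
}

fn main() {
    // Callee table for the a..g example graph (a=0, b=1, ..., g=6).
    let table: [&[usize]; 7] = [&[1, 4], &[2, 3], &[], &[], &[5, 6], &[], &[]];
    let graph = PackedGraph::new(7, |f, out| out.extend_from_slice(table[f]));

    // Prints: [1, 4] [] [5, 6]
    println!("{:?} {:?} {:?}", graph.edges(0), graph.edges(2), graph.edges(4));
}
```

Storing ranges into one vector avoids a separate heap allocation per node, at the cost of the graph being effectively immutable once built.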
