Commit 3eac466

fill out more details
1 parent 19857bd commit 3eac466

1 file changed

+66
-7
lines changed

doc/developer/life-of-a-query.md

Lines changed: 66 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -210,20 +210,79 @@ explain those.
210210

211211
## Compute & Storage Controllers
212212

213-
Should maybe explain the compute and storage protocol, to really describe how
214-
the commands flow to the cluster and how the responses come back.
213+
The adapter interacts with clusters and storage collections through two main
214+
controllers: the
215+
[ComputeController](https://github.com/MaterializeInc/materialize/blob/main/src/compute-client/src/controller.rs)
216+
and the
217+
[StorageController](https://github.com/MaterializeInc/materialize/blob/main/src/storage-controller/src/lib.rs).
218+
These controllers act as intermediaries that translate adapter commands into
219+
cluster-specific operations and manage the lifecycle of compute and storage
220+
resources.
221+
222+
The
223+
[ComputeController](https://github.com/MaterializeInc/materialize/blob/main/src/compute-client/src/controller.rs)
224+
manages compute instances (clusters) and the dataflows running on them. It
225+
handles the creation and maintenance of indexes, materialized views, and
226+
dataflows, talking to cluster replicas via the compute protocol.
227+
228+
The
229+
[StorageController](https://github.com/MaterializeInc/materialize/blob/main/src/storage-controller/src/lib.rs)
230+
manages storage collections including sources, tables, and sinks. For ingestion
231+
from external systems, it needs to install computation on a cluster. Similarly
232+
to the compute controller, communication with the storage parts on a cluster
233+
replica happens via the storage protocol.
234+
235+
Both controllers maintain read and write capabilities for their respective
236+
resources, coordinate compaction policies, and ensure that data remains
237+
accessible as long as needed while allowing garbage collection when possible.
215238

216239
## Arrangements
217240

218-
TODO: Write up something about arrangements, how it's the basis for sharing and ultimately the thing that can be queries from a cluster.
241+
Arrangements are multiversioned indexes that serve as the foundation for data
242+
sharing and efficient querying in Materialize. As described in the
243+
[arrangements documentation](/doc/developer/arrangements.md), an arrangement is
244+
an indexed representation of a stream of update triples `(data, time, diff)`,
245+
organized by key for efficient lookups.
246+
247+
Arrangements are required by many differential dataflow operators. The `join`
248+
operator needs both of its inputs to be arrangements indexed by the join keys,
249+
while the `reduce` operator requires both input and output arrangements. This
250+
means that a single SQL query can create multiple arrangements as it gets
251+
compiled into a dataflow graph.
252+
253+
The key benefit of arrangements is sharing: multiple operators can reuse the
254+
same arrangement if they need data indexed by the same key. This sharing is
255+
especially common with indexes, materialized sources, and materialized views,
256+
which publish their arrangements for reuse across dataflows.
257+
258+
Arrangements only store distinct `(key, value, time)` combinations and undergo
259+
both logical compaction (forgetting historical detail that no reader needs) and
260+
physical compaction (consolidating space). This makes their memory usage
261+
proportional to the current accumulated state rather than the total volume of
262+
updates processed.
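
The two kinds of compaction described in the added paragraph can be illustrated with a small standalone sketch. This is not Materialize or differential dataflow code: the `Update` alias and the `compact` function are hypothetical names, timestamps are plain `u64` values, and the frontier up to which history may be forgotten is a single number called `since`. Logical compaction advances every earlier timestamp to `since`; physical compaction then sums the diffs of updates that have become identical and drops those that cancel out.

```rust
use std::collections::BTreeMap;

/// One update triple: ((key, value), time, diff).
/// Hypothetical type for illustration only.
type Update = ((String, String), u64, i64);

/// Logical compaction: advance times earlier than `since` up to `since`.
/// Physical compaction: sum diffs of identical (key, value, time) entries
/// and drop entries whose diffs cancel to zero.
fn compact(updates: &[Update], since: u64) -> Vec<Update> {
    let mut acc: BTreeMap<((String, String), u64), i64> = BTreeMap::new();
    for (data, time, diff) in updates {
        let advanced = (*time).max(since);
        *acc.entry((data.clone(), advanced)).or_insert(0) += *diff;
    }
    acc.into_iter()
        .filter(|(_, diff)| *diff != 0)
        .map(|((data, time), diff)| (data, time, diff))
        .collect()
}

fn main() {
    let kv = |k: &str, v: &str| (k.to_string(), v.to_string());
    let updates: Vec<Update> = vec![
        (kv("a", "1"), 1, 1),  // insert a -> 1 at time 1
        (kv("a", "1"), 2, -1), // retract a -> 1 at time 2
        (kv("a", "2"), 2, 1),  // insert a -> 2 at time 2
        (kv("b", "7"), 3, 1),  // insert b -> 7 at time 3
    ];
    // Once no reader needs times before 3, the insert and retraction of
    // a -> 1 land on the same time and cancel.
    for update in compact(&updates, 3) {
        println!("{:?}", update);
    }
}
```

With `since = 3`, four updates shrink to two, which is the sense in which memory tracks the accumulated state rather than the update history.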
219263

220264
## Storage
221265

222-
TODO: Both storage and persist are mentioned above, so we should at least give
223-
an overview.
266+
TODO: Buff out this section.
224267

225268
## Persist
226269

227-
TODO: Both storage and persist are mentioned above, so we should at least give
228-
an overview.
270+
Persist is Materialize's durable storage implementation that provides definite
271+
Time-Varying Collections as described in the [persist design
272+
document](/doc/developer/design/20220330_persist.md). It serves as the
273+
foundation for the storage layer.
274+
275+
The core abstraction is a "shard" - a durable
276+
[TVC](/doc/developer/platform/formalism.md#in-a-nutshell) that can be written
277+
to and read from concurrently. Persist uses a rich client model where readers
278+
and writers interact directly with the underlying blob storage (typically S3)
279+
while coordinating through a consensus system for metadata operations.
280+
281+
Persist is built on two key primitives: `Blob` (a durable key-value store) and
282+
`Consensus` (a linearizable log). The blob storage holds the actual data in
283+
immutable batches, while consensus maintains a state machine that tracks
284+
metadata like shard frontiers, active readers/writers, and batch locations.
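
The division of labor between the two primitives can be sketched with in-memory stand-ins. Everything below is an assumption made for illustration: `MemBlob`, `MemConsensus`, `compare_and_append`, `write_batch`, and `read_all` are hypothetical names, not the signatures of the actual `Blob` and `Consensus` traits, and real shard state tracks far more than a list of batch keys. The sketch only shows the ordering idea: a writer stores batch data in the blob store first, then publishes a pointer to it through a conditional append that fails if the writer's view of the log is stale.

```rust
use std::collections::HashMap;

/// Toy stand-in for a durable key-value store (hypothetical, in-memory).
#[derive(Default)]
struct MemBlob {
    objects: HashMap<String, Vec<u8>>,
}

impl MemBlob {
    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.objects.insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<&Vec<u8>> {
        self.objects.get(key)
    }
}

/// Toy stand-in for a linearizable log of shard metadata (hypothetical).
/// Each entry names a batch key in the blob store.
#[derive(Default)]
struct MemConsensus {
    log: Vec<String>,
}

impl MemConsensus {
    /// Append only if the caller has seen the latest log length.
    /// Returns the new length, or the actual length on a stale view.
    fn compare_and_append(&mut self, expected_len: usize, entry: String) -> Result<usize, usize> {
        if self.log.len() != expected_len {
            return Err(self.log.len());
        }
        self.log.push(entry);
        Ok(self.log.len())
    }
}

/// Write a batch: data goes to the blob store first, then the metadata
/// entry that makes it visible to readers is appended to the log.
fn write_batch(
    blob: &mut MemBlob,
    consensus: &mut MemConsensus,
    seen_len: usize,
    key: &str,
    data: Vec<u8>,
) -> Result<usize, usize> {
    blob.set(key, data);
    consensus.compare_and_append(seen_len, key.to_string())
}

/// Read every batch that the log points at, in log order.
fn read_all(blob: &MemBlob, consensus: &MemConsensus) -> Vec<u8> {
    consensus
        .log
        .iter()
        .filter_map(|key| blob.get(key))
        .flatten()
        .copied()
        .collect()
}

fn main() {
    let mut blob = MemBlob::default();
    let mut consensus = MemConsensus::default();

    println!("{:?}", write_batch(&mut blob, &mut consensus, 0, "batch-0", vec![1, 2]));
    // A writer with a stale view is rejected; its blob is never referenced.
    println!("{:?}", write_batch(&mut blob, &mut consensus, 0, "batch-1", vec![3]));
    // Retrying with the current log length succeeds.
    println!("{:?}", write_batch(&mut blob, &mut consensus, 1, "batch-1", vec![3]));
    println!("{:?}", read_all(&blob, &consensus));
}
```

Because readers only follow keys found in the log, a blob written by a rejected writer is invisible until a later append references it, which is why data can be written before the metadata operation is attempted.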
285+
286+
Key features include automatic compaction to bound storage costs and horizontal
287+
read scalability.
229288
