@@ -210,20 +210,79 @@ explain those.

## Compute & Storage Controllers

- Should maybe explain the compute and storage protocol, to really describe how
- the commands flow to the cluster and how the responses come back.
+ The adapter interacts with clusters and storage collections through two main
+ controllers: the
+ [ComputeController](https://github.com/MaterializeInc/materialize/blob/main/src/compute-client/src/controller.rs)
+ and the
+ [StorageController](https://github.com/MaterializeInc/materialize/blob/main/src/storage-controller/src/lib.rs).
+ These controllers act as intermediaries that translate adapter commands into
+ cluster-specific operations and manage the lifecycle of compute and storage
+ resources.
+
+ The
+ [ComputeController](https://github.com/MaterializeInc/materialize/blob/main/src/compute-client/src/controller.rs)
+ manages compute instances (clusters) and the dataflows running on them. It
+ creates and maintains dataflows, including those that back indexes and
+ materialized views, and talks to cluster replicas via the compute protocol.
+
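The command flow can be pictured with a minimal sketch. The controller translates an adapter request into a protocol command, queues that same command for every replica of the cluster, and folds the frontier reports that come back into its own bookkeeping. All names here (`AdapterRequest`, `ComputeCommand`, `ComputeResponse`, `Controller`) are hypothetical, and the protocol is drastically simplified. Sending an identical command stream to every replica is an assumption made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical adapter-level request (greatly simplified).
#[derive(Clone, Debug, PartialEq)]
enum AdapterRequest {
    CreateIndex { id: u64, on: String },
    DropIndex { id: u64 },
}

/// Hypothetical protocol command, sent to every replica of a cluster.
#[derive(Clone, Debug, PartialEq)]
enum ComputeCommand {
    CreateDataflow { export_id: u64, description: String },
    /// In this sketch, `None` stands for the empty frontier: no reader needs the export anymore.
    AllowCompaction { id: u64, frontier: Option<u64> },
}

/// Hypothetical response flowing back from a replica.
#[derive(Clone, Debug, PartialEq)]
enum ComputeResponse {
    /// The replica has produced all output for times before `upper`.
    Frontier { id: u64, upper: u64 },
}

struct Controller {
    /// Per-replica command queues, standing in for network connections.
    replicas: BTreeMap<String, Vec<ComputeCommand>>,
    /// Latest upper reported for each (export, replica) pair.
    uppers: BTreeMap<(u64, String), u64>,
}

impl Controller {
    fn new(replica_names: &[&str]) -> Self {
        let replicas = replica_names
            .iter()
            .map(|name| (name.to_string(), Vec::new()))
            .collect();
        Controller { replicas, uppers: BTreeMap::new() }
    }

    /// Translate one adapter request into a protocol command and broadcast it.
    fn handle(&mut self, request: AdapterRequest) {
        let command = match request {
            AdapterRequest::CreateIndex { id, on } => ComputeCommand::CreateDataflow {
                export_id: id,
                description: format!("arrange {on}"),
            },
            AdapterRequest::DropIndex { id } => ComputeCommand::AllowCompaction { id, frontier: None },
        };
        for queue in self.replicas.values_mut() {
            queue.push(command.clone());
        }
    }

    /// Record a response; frontiers only move forward, so stale reports are ignored.
    fn absorb(&mut self, replica: &str, response: ComputeResponse) {
        match response {
            ComputeResponse::Frontier { id, upper } => {
                let current = self.uppers.entry((id, replica.to_string())).or_insert(0);
                *current = (*current).max(upper);
            }
        }
    }

    fn replica_upper(&self, id: u64, replica: &str) -> Option<u64> {
        self.uppers.get(&(id, replica.to_string())).copied()
    }
}

fn main() {
    let mut controller = Controller::new(&["r1", "r2"]);
    controller.handle(AdapterRequest::CreateIndex { id: 1, on: "orders".to_string() });
    controller.absorb("r1", ComputeResponse::Frontier { id: 1, upper: 10 });
    controller.handle(AdapterRequest::DropIndex { id: 1 });
    for (replica, queue) in &controller.replicas {
        println!("{replica}: {queue:?}");
    }
    println!("r1 upper for export 1: {:?}", controller.replica_upper(1, "r1"));
}
```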
+ The
+ [StorageController](https://github.com/MaterializeInc/materialize/blob/main/src/storage-controller/src/lib.rs)
+ manages storage collections, including sources, tables, and sinks. Ingesting
+ data from external systems requires running computation on a cluster, so, as
+ with the compute controller, the storage controller talks to the storage
+ components on cluster replicas via the storage protocol.
+
+ Both controllers maintain read and write capabilities for their respective
+ resources, coordinate compaction policies, and ensure that data remains
+ readable for as long as some reader needs it, while allowing history that no
+ reader needs to be compacted and garbage-collected.
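To illustrate how read capabilities and compaction policies interact, the sketch below assumes totally ordered `u64` timestamps (real frontiers are antichains over partially ordered times) and a hypothetical lag-based policy. Each reader registers a read hold at the earliest time it still needs. The collection may then only be compacted up to the smaller of two values: what the policy allows, and the earliest hold. `ReadHolds`, `lag`, and `allowed_since` are made-up names.

```rust
use std::collections::BTreeMap;

/// Hypothetical read-hold bookkeeping for one collection, with `u64` timestamps.
struct ReadHolds {
    /// Reader name -> earliest timestamp that reader may still read at.
    holds: BTreeMap<String, u64>,
    /// Compaction policy: keep at least `lag` time units of history behind `upper`.
    lag: u64,
}

impl ReadHolds {
    fn new(lag: u64) -> Self {
        ReadHolds { holds: BTreeMap::new(), lag }
    }

    fn acquire(&mut self, reader: &str, at: u64) {
        self.holds.insert(reader.to_string(), at);
    }

    fn release(&mut self, reader: &str) {
        self.holds.remove(reader);
    }

    /// How far the collection may be compacted, given its write frontier `upper`.
    fn allowed_since(&self, upper: u64) -> u64 {
        // What the compaction policy alone would permit.
        let policy = upper.saturating_sub(self.lag);
        // Never compact past the earliest outstanding read hold.
        self.holds.values().fold(policy, |since, &hold| since.min(hold))
    }
}

fn main() {
    let mut holds = ReadHolds::new(10);
    println!("{}", holds.allowed_since(100)); // limited by the policy only
    holds.acquire("long-running-peek", 42);
    println!("{}", holds.allowed_since(100)); // held back by the peek
    holds.release("long-running-peek");
    println!("{}", holds.allowed_since(100)); // free to advance again
}
```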

## Arrangements

- TODO: Write up something about arrangements, how it's the basis for sharing and ultimately the thing that can be queries from a cluster.
+ Arrangements are multiversioned indexes that serve as the foundation for data
+ sharing and efficient querying in Materialize. As described in the
+ [arrangements documentation](/doc/developer/arrangements.md), an arrangement is
+ an indexed representation of a stream of update triples `(data, time, diff)`,
+ organized by key for efficient lookups.
+
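Here is a rough mental model of that definition. It assumes `data` splits into a key and a value, and that timestamps are totally ordered `u64`s. Updates are grouped by key, and reading a key as of some time sums the diffs of that key's updates at or before it. The `Arrangement` type below is a hypothetical sketch. Differential dataflow's actual arrangements are built from immutable, sorted batches rather than a `BTreeMap`.

```rust
use std::collections::BTreeMap;

/// Hypothetical arrangement: updates grouped by key, each as (value, time, diff).
#[derive(Default)]
struct Arrangement {
    index: BTreeMap<String, Vec<(String, u64, i64)>>,
}

impl Arrangement {
    fn insert(&mut self, key: &str, value: &str, time: u64, diff: i64) {
        self.index
            .entry(key.to_string())
            .or_default()
            .push((value.to_string(), time, diff));
    }

    /// Contents for `key` as of `as_of`: sum the diffs of updates at or before that time.
    fn lookup(&self, key: &str, as_of: u64) -> BTreeMap<String, i64> {
        let mut acc: BTreeMap<String, i64> = BTreeMap::new();
        // Only this key's updates are touched; other keys are never scanned.
        if let Some(updates) = self.index.get(key) {
            for (value, time, diff) in updates {
                if *time <= as_of {
                    *acc.entry(value.clone()).or_insert(0) += diff;
                }
            }
        }
        // Values whose diffs cancel out are not present at `as_of`.
        acc.retain(|_, count| *count != 0);
        acc
    }
}

fn main() {
    let mut by_user = Arrangement::default();
    by_user.insert("alice", "NYC", 1, 1);
    // alice moves: retract the old value and insert the new one at time 3.
    by_user.insert("alice", "NYC", 3, -1);
    by_user.insert("alice", "SF", 3, 1);
    println!("{:?}", by_user.lookup("alice", 2));
    println!("{:?}", by_user.lookup("alice", 3));
}
```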
+ Arrangements are required by many differential dataflow operators. The `join`
+ operator needs both of its inputs to be arrangements indexed by the join keys,
+ while the `reduce` operator requires both input and output arrangements. This
+ means that a single SQL query can create multiple arrangements as it gets
+ compiled into a dataflow graph.
+
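This sketch shows why `join` wants both inputs arranged by the join key. Once each side is indexed by key, producing matches is a lookup per key instead of a scan of the other input. The sketch ignores times and diffs. `arrange` and `join_arranged` are hypothetical helpers, not differential dataflow's API.

```rust
use std::collections::BTreeMap;

/// Index (key, value) pairs by key: a time-free stand-in for an arrangement.
fn arrange<K: Ord + Clone, V: Clone>(rows: &[(K, V)]) -> BTreeMap<K, Vec<V>> {
    let mut index: BTreeMap<K, Vec<V>> = BTreeMap::new();
    for (k, v) in rows {
        index.entry(k.clone()).or_default().push(v.clone());
    }
    index
}

/// Join two inputs that are already indexed by the same key.
fn join_arranged<K: Ord + Clone, A: Clone, B: Clone>(
    left: &BTreeMap<K, Vec<A>>,
    right: &BTreeMap<K, Vec<B>>,
) -> Vec<(K, A, B)> {
    let mut out = Vec::new();
    for (key, left_vals) in left {
        // A keyed lookup on the other side; keys missing there produce no output.
        if let Some(right_vals) = right.get(key) {
            for a in left_vals {
                for b in right_vals {
                    out.push((key.clone(), a.clone(), b.clone()));
                }
            }
        }
    }
    out
}

fn main() {
    let orders = arrange(&[(1, "book"), (2, "pen"), (1, "lamp")]);
    let customers = arrange(&[(1, "alice"), (3, "carol")]);
    for (customer_id, item, name) in join_arranged(&orders, &customers) {
        println!("{customer_id}: {name} ordered {item}");
    }
}
```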
+ The key benefit of arrangements is sharing: multiple operators can reuse the
+ same arrangement if they need data indexed by the same key. This sharing is
+ especially common with indexes, materialized sources, and materialized views,
+ which publish their arrangements for reuse across dataflows.
+
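The reuse described above can be sketched as a registry keyed by collection and key columns. A second request for the same arrangement receives a handle to the existing one rather than a freshly built copy. `ArrangementRegistry` and `get_or_build` are assumptions for illustration, as is the use of `Rc` as a stand-in for a shared trace handle.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Stand-in for arranged data; its contents don't matter for this sketch.
#[allow(dead_code)]
struct Arranged {
    collection: String,
    key: Vec<String>,
}

/// Hypothetical registry of published arrangements, keyed by (collection, key columns).
#[derive(Default)]
struct ArrangementRegistry {
    by_key: HashMap<(String, Vec<String>), Rc<Arranged>>,
    /// How many arrangements were actually built (as opposed to reused).
    builds: usize,
}

impl ArrangementRegistry {
    /// Return the existing arrangement of `collection` by `key`, or build and publish one.
    fn get_or_build(&mut self, collection: &str, key: &[&str]) -> Rc<Arranged> {
        let key: Vec<String> = key.iter().map(|s| s.to_string()).collect();
        let entry_key = (collection.to_string(), key.clone());
        if let Some(existing) = self.by_key.get(&entry_key) {
            // Reuse: hand out another reference instead of duplicating the data.
            return Rc::clone(existing);
        }
        self.builds += 1;
        let arranged = Rc::new(Arranged { collection: collection.to_string(), key });
        self.by_key.insert(entry_key, Rc::clone(&arranged));
        arranged
    }
}

fn main() {
    let mut registry = ArrangementRegistry::default();
    let for_index = registry.get_or_build("orders", &["customer_id"]);
    let for_join = registry.get_or_build("orders", &["customer_id"]);
    println!("shared: {}", Rc::ptr_eq(&for_index, &for_join));
    println!("arrangements built: {}", registry.builds);
}
```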
+ Arrangements store each distinct `(key, value, time)` combination only once,
+ with its accumulated diff, and undergo both logical compaction (forgetting
+ historical detail that no reader needs) and physical compaction (merging
+ batches to reclaim space). This keeps their memory usage roughly proportional
+ to the current accumulated state rather than to the total volume of updates
+ processed.
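Logical compaction can be illustrated with totally ordered `u64` times. Once no reader needs to distinguish times before a frontier `since`, each earlier update can have its time advanced to `since`. Updates with equal `(key, value, time)` are then consolidated by summing their diffs and dropping zeros. The `compact` function below is a hypothetical sketch of that idea, not differential dataflow's implementation.

```rust
use std::collections::BTreeMap;

/// ((key, value), time, diff)
type Update = ((String, String), u64, i64);

/// Advance times to `since`, then consolidate updates with identical (data, time).
fn compact(updates: &[Update], since: u64) -> Vec<Update> {
    let mut acc: BTreeMap<((String, String), u64), i64> = BTreeMap::new();
    for (data, time, diff) in updates {
        // Times before `since` are indistinguishable to every remaining reader.
        let t = (*time).max(since);
        *acc.entry((data.clone(), t)).or_insert(0) += diff;
    }
    acc.into_iter()
        .filter(|(_, diff)| *diff != 0)
        .map(|((data, time), diff)| (data, time, diff))
        .collect()
}

fn main() {
    let kv = |k: &str, v: &str| (k.to_string(), v.to_string());
    let updates: Vec<Update> = vec![
        (kv("alice", "NYC"), 1, 1),
        (kv("alice", "NYC"), 2, -1),
        (kv("alice", "SF"), 2, 1),
        (kv("alice", "SF"), 5, 1),
    ];
    let compacted = compact(&updates, 3);
    println!("{} updates -> {}: {:?}", updates.len(), compacted.len(), compacted);
}
```

In this example, four updates collapse to two. The NYC insertion and retraction cancel once both are advanced to time 3, so only the accumulated state (plus history at or after `since`) remains.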

## Storage

- TODO: Both storage and persist are mentioned above, so we should at least give
- an overview.
+ TODO: Buff out this section.

## Persist

- TODO: Both storage and persist are mentioned above, so we should at least give
- an overview.
+ Persist is Materialize's durable storage implementation that provides definite
+ Time-Varying Collections (TVCs) as described in the
+ [persist design document](/doc/developer/design/20220330_persist.md). It serves
+ as the foundation for the storage layer.
+
+ The core abstraction is a "shard": a durable
+ [TVC](/doc/developer/platform/formalism.md#in-a-nutshell) that can be written
+ to and read from concurrently. Persist uses a rich client model in which readers
+ and writers interact directly with the underlying blob storage (typically S3)
+ and coordinate through a consensus system for metadata operations.
+
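The sketch below models a shard only by its batches and two frontiers. `upper` marks the point before which all updates have been written, and `since` marks the point before which history may have been compacted. It assumes totally ordered `u64` times and makes two rules explicit. First, a writer's batch must begin exactly at the current `upper`, so a writer with a stale view of the shard is rejected instead of leaving gaps or overlaps. Second, a read as of `t` is possible only when `since <= t < upper`. `Shard`, `append`, and `snapshot` are hypothetical names, not the persist client's API.

```rust
use std::collections::BTreeMap;

/// An update in this sketch: (data, time, diff).
type Update = (String, u64, i64);

/// A batch of updates covering the half-open time interval [lower, upper).
#[allow(dead_code)]
struct Batch {
    lower: u64,
    upper: u64,
    updates: Vec<Update>,
}

#[derive(Debug, PartialEq)]
enum ShardError {
    /// The writer's `lower` did not match the shard's current upper.
    UpperMismatch { current: u64 },
    /// The batch bounds are empty or an update lies outside them.
    InvalidBatch,
    /// The requested time is not (or no longer) readable.
    NotReadable,
}

/// Hypothetical in-memory shard, modeling only its frontiers and batches.
struct Shard {
    batches: Vec<Batch>,
    /// Every update at a time before `upper` has been written.
    upper: u64,
    /// Times before `since` may have been compacted (compaction is not modeled here).
    since: u64,
}

impl Shard {
    fn new() -> Self {
        Shard { batches: Vec::new(), upper: 0, since: 0 }
    }

    /// Append a batch that must start exactly at the shard's current upper.
    fn append(&mut self, lower: u64, upper: u64, updates: Vec<Update>) -> Result<(), ShardError> {
        if lower != self.upper {
            return Err(ShardError::UpperMismatch { current: self.upper });
        }
        if upper <= lower || updates.iter().any(|(_, t, _)| *t < lower || *t >= upper) {
            return Err(ShardError::InvalidBatch);
        }
        self.batches.push(Batch { lower, upper, updates });
        self.upper = upper;
        Ok(())
    }

    /// Accumulated contents as of `as_of`, which must satisfy since <= as_of < upper.
    fn snapshot(&self, as_of: u64) -> Result<Vec<(String, i64)>, ShardError> {
        if as_of < self.since || as_of >= self.upper {
            return Err(ShardError::NotReadable);
        }
        let mut acc: BTreeMap<String, i64> = BTreeMap::new();
        for batch in &self.batches {
            for (data, time, diff) in &batch.updates {
                if *time <= as_of {
                    *acc.entry(data.clone()).or_insert(0) += diff;
                }
            }
        }
        Ok(acc.into_iter().filter(|(_, diff)| *diff != 0).collect())
    }
}

fn main() {
    let mut shard = Shard::new();
    shard.append(0, 2, vec![("a".to_string(), 0, 1), ("b".to_string(), 1, 1)]).unwrap();
    // A second writer that still believes the upper is 0 is rejected.
    println!("{:?}", shard.append(0, 3, Vec::new()));
    shard.append(2, 4, vec![("a".to_string(), 3, -1)]).unwrap();
    println!("{:?}", shard.snapshot(3));
    println!("{:?}", shard.snapshot(4)); // not yet complete
}
```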
+ Persist is built on two key primitives: `Blob` (a durable key-value store) and
+ `Consensus` (a linearizable log). The blob storage holds the actual data in
+ immutable batches, while consensus maintains a state machine that tracks
+ metadata like shard frontiers, active readers/writers, and batch locations.
+
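The split between the two primitives suggests a write path along these lines. First, upload the immutable batch to `Blob` under a fresh key. Then publish its location by updating the state held in `Consensus`. The sketch assumes consensus offers a compare-and-set on a sequence number, so a writer whose view of the state is stale fails and retries. The in-memory `Blob` and `Consensus` types and `write_batch` are simplified stand-ins, not persist's actual traits.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for durable blob storage (S3 in production).
#[derive(Default)]
struct Blob {
    objects: HashMap<String, Vec<u8>>,
}

/// Hypothetical stand-in for a linearizable log offering compare-and-set.
#[derive(Default)]
struct Consensus {
    /// (sequence number, state); here the state is just the list of batch keys.
    log: Vec<(u64, Vec<String>)>,
}

impl Consensus {
    fn head(&self) -> Option<(u64, Vec<String>)> {
        self.log.last().cloned()
    }

    /// Append `new` only if the latest sequence number equals `expected`;
    /// on failure, report the actual latest sequence number.
    fn compare_and_set(&mut self, expected: Option<u64>, new: Vec<String>) -> Result<u64, Option<u64>> {
        let current = self.log.last().map(|(seqno, _)| *seqno);
        if current != expected {
            return Err(current);
        }
        let next = current.map_or(0, |seqno| seqno + 1);
        self.log.push((next, new));
        Ok(next)
    }
}

/// Write path: upload the immutable batch first, then publish its key via CAS.
fn write_batch(blob: &mut Blob, consensus: &mut Consensus, key: &str, data: Vec<u8>) -> u64 {
    blob.objects.insert(key.to_string(), data);
    loop {
        let (expected, mut batches) = match consensus.head() {
            Some((seqno, batches)) => (Some(seqno), batches),
            None => (None, Vec::new()),
        };
        batches.push(key.to_string());
        match consensus.compare_and_set(expected, batches) {
            Ok(seqno) => return seqno,
            // Another writer updated the state first: reload it and retry.
            Err(_) => continue,
        }
    }
}

fn main() {
    let mut blob = Blob::default();
    let mut consensus = Consensus::default();
    write_batch(&mut blob, &mut consensus, "batch-1", vec![1, 2, 3]);
    write_batch(&mut blob, &mut consensus, "batch-2", vec![4]);
    println!("{:?}", consensus.head());
    // A writer holding a stale view (expecting seqno 0) loses the race.
    println!("{:?}", consensus.compare_and_set(Some(0), vec![]));
}
```

Run single-threaded, the retry loop never fires. It exists for the case where several writers race on the same shard.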
+ Key features include automatic compaction to bound storage costs and horizontal
+ read scalability.