Commit f78366c

[AUDIO_WORKLET] Optimised output buffer copy (#24891)
A reworking of #22753, which "improves the copy back from the audio worklet's heap to JS by 7-12x depending on the browser."

From the previous description:

Since we pass in the stack for the worklet from the caller's heap, its address doesn't change. And since the render quantum size doesn't change after the audio worklet creation, the stack positions for the audio buffers do not change either. This optimisation adds one-time subarray views and replaces the float-by-float copy with a simple `set()` per channel (per output).

The existing interactive tests (written for the original PR) can be run for comparison:

```
test/runner interactive.test_audio_worklet_stereo_io
test/runner interactive.test_audio_worklet_2x_stereo_io
test/runner interactive.test_audio_worklet_mono_io
test/runner interactive.test_audio_worklet_2x_hard_pan_io
test/runner interactive.test_audio_worklet_params_mixing
test/runner interactive.test_audio_worklet_memory_growth
test/runner interactive.test_audio_worklet_hard_pans
```

These test various input/output arrangements as well as parameters (parameters are interesting because, depending on the browser, the sizes change as the params move from static to varying).

The original benchmark of the extracted copy is still valid: https://wip.numfum.com/cw/2024-10-29/index.html

This is tested with 32- and 64-bit wasm (which required a reordering of how structs and data were stored to avoid alignment issues).
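The copy change described above can be sketched in isolation. This is only an illustration, not the code from the commit: `heap` is a plain `Float32Array` standing in for `HEAPF32`, and `QUANTUM`, `CHANNEL_IDX`, `copyPerSample` and `copyWithView` are hypothetical names.

```javascript
// Hypothetical stand-ins: 'heap' plays the role of HEAPF32, and QUANTUM
// mirrors the 128-sample render quantum of Web Audio 1.0.
const QUANTUM = 128;
const CHANNEL_IDX = 256; // assumed fixed float index of one output channel
const heap = new Float32Array(1024);

// Old approach: copy each sample individually on every process() call.
function copyPerSample(dst, heapIdx) {
  for (let i = 0; i < QUANTUM; i++) {
    dst[i] = heap[heapIdx + i];
  }
}

// New approach: create the view once, since its address never changes...
const channelView = heap.subarray(CHANNEL_IDX, CHANNEL_IDX + QUANTUM);

// ...so that each process() call only needs a single bulk copy.
function copyWithView(dst) {
  dst.set(channelView);
}

// Simulate the Wasm callback writing a ramp into the heap.
for (let i = 0; i < QUANTUM; i++) {
  heap[CHANNEL_IDX + i] = i / QUANTUM;
}

const perSample = new Float32Array(QUANTUM);
const viaView = new Float32Array(QUANTUM);
copyPerSample(perSample, CHANNEL_IDX);
copyWithView(viaView);
```

Both destination arrays end up with identical contents; the view-based version avoids the per-sample loop in JS and allocates nothing per call, because `subarray()` ran only once.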
Some explanations:

- Fixed-position output buffer views are created once in the `WasmAudioWorkletProcessor` constructor
- Stack allocations for the `process()` call are split into aligned struct data (see the comments) and audio/param data
- The struct writes are simplified by this splitting of data
- `ASSERTIONS` are used to ensure everything fits and correctly aligns
- The tests account for size changes in the params, which can vary from a single float to 128 floats (a single float nicely showing up any 8-byte alignment issues for wasm64)

~~Future improvements: the output views are sequential, so instead of being individual views covering each channel, the views could cover one to however-many-views needed, with a single `set()` being enough for all outputs.~~
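The split into aligned struct data and audio/param data can be worked through with concrete numbers. The following is a rough sketch under stated assumptions: the struct sizes (12 and 8 bytes, plausible for wasm32) and the stack top address are invented for illustration, `layoutStack` is a hypothetical helper, and the stack allocator is assumed to return `stackTop - size` when both are 16-byte aligned.

```javascript
// All sizes are illustrative assumptions, not values read from C_STRUCTS.
const FLOAT_SIZE = 4;
const BYTES_PER_CHANNEL = 128 * FLOAT_SIZE; // 512
const SAMPLE_FRAME_SIZE = 12; // assumed AudioSampleFrame size
const PARAM_FRAME_SIZE = 8;   // assumed AudioParamFrame size

function layoutStack(stackTop, structBytes, dataBytes) {
  // Round the whole request up to a multiple of 16 bytes.
  const aligned = (structBytes + dataBytes + 15) & ~15;
  // The stack grows downwards, so the block starts 'aligned' bytes below
  // the top (assuming stackTop itself is 16-byte aligned).
  const structPtr = stackTop - aligned;
  // Any padding sits between the structs and the data, so the data always
  // finishes exactly at the stack top.
  const dataPtr = structPtr + (aligned - dataBytes);
  return { aligned, structPtr, dataPtr, dataEnd: dataPtr + dataBytes };
}

// One mono input, one mono output, and one parameter holding a single float.
const structBytes = 2 * SAMPLE_FRAME_SIZE + PARAM_FRAME_SIZE; // 32
const dataBytes = 2 * BYTES_PER_CHANNEL + 1 * FLOAT_SIZE;     // 1028
const layout = layoutStack(65536, structBytes, dataBytes);
// Padding left between the last struct and the first float of data.
const padding = layout.dataPtr - (layout.structPtr + structBytes);
```

With these numbers the request rounds up from 1060 to 1072 bytes, leaving 12 bytes of padding after the structs. Because the output data is written last, it ends at the stack top, which is where the fixed output views are expected to be.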
1 parent 2b0e2b3 commit f78366c

4 files changed: +282 -188 lines changed

src/audio_worklet.js

Lines changed: 136 additions & 48 deletions
@@ -32,51 +32,120 @@ function createWasmAudioWorkletProcessor(audioParams) {
       this.callback = {{{ makeDynCall('iipipipp', 'opts.callback') }}};
       this.userData = opts.userData;
       // Then the samples per channel to process, fixed for the lifetime of the
-      // context that created this processor. Note for when moving to Web Audio
-      // 1.1: the typed array passed to process() should be the same size as this
-      // 'render quantum size', and this exercise of passing in the value
-      // shouldn't be required (to be verified)
+      // context that created this processor. Even though this 'render quantum
+      // size' is fixed at 128 samples in the 1.0 spec, it will be variable in
+      // the 1.1 spec. It's passed in now, just to prove it's settable, but will
+      // eventually be a property of the AudioWorkletGlobalScope (globalThis).
       this.samplesPerChannel = opts.samplesPerChannel;
+      this.bytesPerChannel = this.samplesPerChannel * {{{ getNativeTypeSize('float') }}};
+
+      // Prepare the output views; see createOutputViews(). The 'minimum alloc'
+      // firstly stops STACK_OVERFLOW_CHECK failing (since the stack will be
+      // full if we allocate all the available space, with 16 bytes being the
+      // minimum alloc size due to alignments) leaving room for a single
+      // AudioSampleFrame as a minimum. There's an arbitrary maximum of 64, for
+      // the case where a multi-MB stack is passed.
+      this.outputViews = new Array(Math.min(((wwParams.stackSize - /*minimum alloc*/ 16) / this.bytesPerChannel) | 0, /*sensible limit*/ 64));
+#if ASSERTIONS
+      console.assert(this.outputViews.length > 0, `AudioWorklet needs more stack allocating (at least ${this.bytesPerChannel})`);
+#endif
+      this.createOutputViews();
+
+#if ASSERTIONS
+      // Explicitly verify this later in process(). Note to self, stackSave is a
+      // bit of a misnomer as it simply gets the stack address.
+      this.ctorOldStackPtr = stackSave();
+#endif
+    }
+
+    /**
+     * Create up-front as many typed views for marshalling the output data as
+     * may be required, allocated at the *top* of the worklet's stack (and whose
+     * addresses are fixed).
+     */
+    createOutputViews() {
+      // These are still alloc'd to take advantage of the overflow checks, etc.
+      var oldStackPtr = stackSave();
+      var viewDataIdx = {{{ getHeapOffset('stackAlloc(this.outputViews.length * this.bytesPerChannel)', 'float') }}};
+#if WEBAUDIO_DEBUG
+      console.log(`AudioWorklet creating ${this.outputViews.length} buffer one-time views (for a stack size of ${wwParams.stackSize} at address ${ptrToString(viewDataIdx * 4)})`);
+#endif
+      // Inserted in reverse so the lowest indices are closest to the stack top
+      for (var n = this.outputViews.length - 1; n >= 0; n--) {
+        this.outputViews[n] = HEAPF32.subarray(viewDataIdx, viewDataIdx += this.samplesPerChannel);
+      }
+      stackRestore(oldStackPtr);
     }
 
     static get parameterDescriptors() {
       return audioParams;
     }
 
     /**
+     * Marshals all inputs and parameters to the Wasm memory on the thread's
+     * stack, then performs the wasm audio worklet call, and finally marshals
+     * audio output data back.
+     *
      * @param {Object} parameters
      */
     process(inputList, outputList, parameters) {
-      // Marshal all inputs and parameters to the Wasm memory on the thread stack,
-      // then perform the wasm audio worklet call,
-      // and finally marshal audio output data back.
+#if ALLOW_MEMORY_GROWTH
+      // Recreate the output views if the heap has changed
+      // TODO: add support for GROWABLE_ARRAYBUFFERS
+      if (HEAPF32.buffer != this.outputViews[0].buffer) {
+        this.createOutputViews();
+      }
+#endif
 
       var numInputs = inputList.length;
       var numOutputs = outputList.length;
 
       var entry; // reused list entry or index
       var subentry; // reused channel or other array in each list entry or index
 
-      // Calculate how much stack space is needed.
-      var bytesPerChannel = this.samplesPerChannel * {{{ getNativeTypeSize('float') }}};
-      var stackMemoryNeeded = (numInputs + numOutputs) * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}};
+      // Calculate the required stack and output buffer views (stack is further
+      // split into aligned structs and the raw float data).
+      var stackMemoryStruct = (numInputs + numOutputs) * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}};
+      var stackMemoryData = 0;
+      for (entry of inputList) {
+        stackMemoryData += entry.length;
+      }
+      stackMemoryData *= this.bytesPerChannel;
+      // Collect the total number of output channels (mapped to array views)
+      var outputViewsNeeded = 0;
+      for (entry of outputList) {
+        outputViewsNeeded += entry.length;
+      }
+      stackMemoryData += outputViewsNeeded * this.bytesPerChannel;
       var numParams = 0;
-      for (entry of inputList) stackMemoryNeeded += entry.length * bytesPerChannel;
-      for (entry of outputList) stackMemoryNeeded += entry.length * bytesPerChannel;
       for (entry in parameters) {
-        stackMemoryNeeded += parameters[entry].byteLength + {{{ C_STRUCTS.AudioParamFrame.__size__ }}};
         ++numParams;
+        stackMemoryStruct += {{{ C_STRUCTS.AudioParamFrame.__size__ }}};
+        stackMemoryData += parameters[entry].byteLength;
       }
-
-      // Allocate the necessary stack space.
       var oldStackPtr = stackSave();
-      var inputsPtr = stackAlloc(stackMemoryNeeded);
+#if ASSERTIONS
+      console.assert(oldStackPtr == this.ctorOldStackPtr, 'AudioWorklet stack address has unexpectedly moved');
+      console.assert(outputViewsNeeded <= this.outputViews.length, `Too many AudioWorklet outputs (need ${outputViewsNeeded} but have stack space for ${this.outputViews.length})`);
+#endif
+
+      // Allocate the necessary stack space. All pointer variables are in bytes;
+      // 'structPtr' starts at the first struct entry (all run sequentially)
+      // and is the working start to each record; 'dataPtr' is the same for the
+      // audio/params data, starting after *all* the structs.
+      // 'structPtr' begins 16-byte aligned, allocated from the internal
+      // _emscripten_stack_alloc(), as are the output views, and so to ensure
+      // the views fall on the correct addresses (and we finish at stacktop) we
+      // request additional bytes, taking this alignment into account, then
+      // offset `dataPtr` by the difference.
+      var stackMemoryAligned = (stackMemoryStruct + stackMemoryData + 15) & ~15;
+      var structPtr = stackAlloc(stackMemoryAligned);
+      var dataPtr = structPtr + (stackMemoryAligned - stackMemoryData);
 
-      // Copy input audio descriptor structs and data to Wasm ('structPtr' is
-      // reused as the working start to each struct record, 'dataPtr' start of
-      // the data section, usually after all structs).
-      var structPtr = inputsPtr;
-      var dataPtr = inputsPtr + numInputs * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}};
+      // Copy input audio descriptor structs and data to Wasm (recall, structs
+      // first, audio data after). 'inputsPtr' is the start of the C callback's
+      // input AudioSampleFrame.
+      var /*const*/ inputsPtr = structPtr;
       for (entry of inputList) {
         // Write the AudioSampleFrame struct instance
         {{{ makeSetValue('structPtr', C_STRUCTS.AudioSampleFrame.numberOfChannels, 'entry.length', 'u32') }}};
@@ -86,28 +155,13 @@ function createWasmAudioWorkletProcessor(audioParams) {
         // Marshal the input audio sample data for each audio channel of this input
         for (subentry of entry) {
           HEAPF32.set(subentry, {{{ getHeapOffset('dataPtr', 'float') }}});
-          dataPtr += bytesPerChannel;
+          dataPtr += this.bytesPerChannel;
         }
       }
 
-      // Copy output audio descriptor structs to Wasm
-      var outputsPtr = dataPtr;
-      structPtr = outputsPtr;
-      var outputDataPtr = (dataPtr += numOutputs * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}});
-      for (entry of outputList) {
-        // Write the AudioSampleFrame struct instance
-        {{{ makeSetValue('structPtr', C_STRUCTS.AudioSampleFrame.numberOfChannels, 'entry.length', 'u32') }}};
-        {{{ makeSetValue('structPtr', C_STRUCTS.AudioSampleFrame.samplesPerChannel, 'this.samplesPerChannel', 'u32') }}};
-        {{{ makeSetValue('structPtr', C_STRUCTS.AudioSampleFrame.data, 'dataPtr', '*') }}};
-        structPtr += {{{ C_STRUCTS.AudioSampleFrame.__size__ }}};
-        // Reserve space for the output data
-        dataPtr += bytesPerChannel * entry.length;
-      }
-
-      // Copy parameters descriptor structs and data to Wasm
-      var paramsPtr = dataPtr;
-      structPtr = paramsPtr;
-      dataPtr += numParams * {{{ C_STRUCTS.AudioParamFrame.__size__ }}};
+      // Copy parameters descriptor structs and data to Wasm. 'paramsPtr' is the
+      // start of the C callback's input AudioParamFrame.
+      var /*const*/ paramsPtr = structPtr;
       for (entry = 0; subentry = parameters[entry++];) {
         // Write the AudioParamFrame struct instance
         {{{ makeSetValue('structPtr', C_STRUCTS.AudioParamFrame.length, 'subentry.length', 'u32') }}};
@@ -118,20 +172,54 @@ function createWasmAudioWorkletProcessor(audioParams) {
         dataPtr += subentry.length * {{{ getNativeTypeSize('float') }}};
       }
 
+      // Copy output audio descriptor structs to Wasm. 'outputsPtr' is the start
+      // of the C callback's output AudioSampleFrame. 'dataPtr' will now be
+      // aligned with the output views, ending at stacktop (which is why this
+      // needs to be last).
+      var /*const*/ outputsPtr = structPtr;
+      for (entry of outputList) {
+        // Write the AudioSampleFrame struct instance
+        {{{ makeSetValue('structPtr', C_STRUCTS.AudioSampleFrame.numberOfChannels, 'entry.length', 'u32') }}};
+        {{{ makeSetValue('structPtr', C_STRUCTS.AudioSampleFrame.samplesPerChannel, 'this.samplesPerChannel', 'u32') }}};
+        {{{ makeSetValue('structPtr', C_STRUCTS.AudioSampleFrame.data, 'dataPtr', '*') }}};
+        structPtr += {{{ C_STRUCTS.AudioSampleFrame.__size__ }}};
+        // Advance the output pointer to the next output (matching the pre-allocated views)
+        dataPtr += this.bytesPerChannel * entry.length;
+      }
+
+#if ASSERTIONS
+      // If all the maths worked out, we arrived at the original stack address
+      console.assert(dataPtr == oldStackPtr, `AudioWorklet stack mismatch (audio data finishes at ${dataPtr} instead of ${oldStackPtr})`);
+
+      // Sanity checks. If these trip the most likely cause, beyond unforeseen
+      // stack shenanigans, is that the 'render quantum size' changed after
+      // construction (which shouldn't be possible).
+      if (numOutputs) {
+        // First that the output view addresses match the stack positions
+        dataPtr -= this.bytesPerChannel;
+        for (entry = 0; entry < outputViewsNeeded; entry++) {
+          console.assert(dataPtr == this.outputViews[entry].byteOffset, 'AudioWorklet internal error in addresses of the output array views');
+          dataPtr -= this.bytesPerChannel;
+        }
+        // And that the views' size match the passed in output buffers
+        for (entry of outputList) {
+          for (subentry of entry) {
+            console.assert(subentry.byteLength == this.bytesPerChannel, `AudioWorklet unexpected output buffer size (expected ${this.bytesPerChannel} got ${subentry.byteLength})`);
+          }
+        }
+      }
+#endif
+
       // Call out to Wasm callback to perform audio processing
       var didProduceAudio = this.callback(numInputs, inputsPtr, numOutputs, outputsPtr, numParams, paramsPtr, this.userData);
       if (didProduceAudio) {
         // Read back the produced audio data to all outputs and their channels.
-        // (A garbage-free function TypedArray.copy(dstTypedArray, dstOffset,
-        // srcTypedArray, srcOffset, count) would sure be handy.. but web does
-        // not have one, so manually copy all bytes in)
-        outputDataPtr = {{{ getHeapOffset('outputDataPtr', 'float') }}};
+        // The preallocated 'outputViews' already have the correct offsets and
+        // sizes into the stack (recall from createOutputViews() that they run
+        // backwards).
         for (entry of outputList) {
           for (subentry of entry) {
-            // repurposing structPtr for now
-            for (structPtr = 0; structPtr < this.samplesPerChannel; ++structPtr) {
-              subentry[structPtr] = HEAPF32[outputDataPtr++];
-            }
+            subentry.set(this.outputViews[--outputViewsNeeded]);
           }
         }
       }
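The reverse ordering of the views, and why the read-back loop counts down with `--outputViewsNeeded`, may be easier to follow in a scaled-down model. This sketch makes several assumptions: a 4-sample quantum instead of 128, a 64-float array standing in for `HEAPF32`, and hypothetical names (`SAMPLES`, `stackTopIdx`, `MAX_VIEWS`, `views`, `needed`).

```javascript
const SAMPLES = 4;                 // tiny quantum for readability (really 128)
const heap = new Float32Array(64); // stand-in for HEAPF32
const stackTopIdx = 64;            // assumed float index of the stack top
const MAX_VIEWS = 3;

// Create the views in reverse, in the block just below the stack top, so
// that views[0] is the one closest to the top.
const views = new Array(MAX_VIEWS);
let idx = stackTopIdx - MAX_VIEWS * SAMPLES;
for (let n = MAX_VIEWS - 1; n >= 0; n--) {
  views[n] = heap.subarray(idx, idx += SAMPLES);
}

// One stereo output: its data is laid out sequentially and ends at the stack
// top, so channel 0 sits at [56, 60) and channel 1 at [60, 64).
const outputList = [[new Float32Array(SAMPLES), new Float32Array(SAMPLES)]];
heap.fill(1, 56, 60); // pretend the Wasm callback wrote the left channel
heap.fill(2, 60, 64); // ...and the right channel

// Read back: counting down from the number of channels in use walks the
// views from the lowest address up to the stack top.
let needed = 2;
for (const output of outputList) {
  for (const channel of output) {
    channel.set(views[--needed]);
  }
}
```

Only the last `needed` views are touched, which is why the same fixed set of views can serve any number of output channels up to the preallocated maximum.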
