
Conversation

@berarma
Contributor

@berarma berarma commented Mar 10, 2025

I think that using the tightness parameter of the SPCAP model as the panning strength was a mistake. Panning strength is an effect that should be applied at the end of processing to change the final mix. I've done that, and I've exposed the tightness parameter in the inspector for more flexibility.
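As a sketch of that idea (hypothetical code, not the PR's actual implementation; the function and parameter names are invented): the spatialized per-channel gains are blended toward an even mix as the very last step, so panning strength only affects the final mix.

```python
def apply_panning_strength(channel_gains, panning_strength):
    # Hypothetical final-mix effect: at strength 1.0 the spatialized
    # gains pass through unchanged; at 0.0 every channel collapses to
    # the same average gain (no panning at all).
    avg = sum(channel_gains) / len(channel_gains)
    return [avg + (g - avg) * panning_strength for g in channel_gains]
```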

Also, the SPCAP model seems to have been developed for 5.1 setups and up, but the paper says the resulting output downmixes well to setups with fewer speakers. I've used the 5.1 speaker layout internally and downmixed it for the stereo and 3.1 setups.

With these changes, audio positioning should be better in stereo and 3.1 setups as well as in headphone output, and panning strength should work more as intended. Tested only with stereo speakers and headphones; it needs more testing with other setups.

With panning strength at 1.0, no speaker will be muted, but I don't think it would make sense to forcibly mute any speaker in this case. We still achieve maximum stereo separation, as output by SPCAP. For further separation of channels, tightness can be increased.
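A rough sketch of SPCAP-style panning with a tightness parameter, following the paper's idea rather than Godot's exact code (the speaker layout, function names, and default value are assumptions for illustration):

```python
import math

def spcap_gains(source_azimuth, speaker_azimuths, tightness=2.0):
    # SPCAP-style weighting: each speaker is weighted by
    # ((1 + cos(angle between source and speaker)) / 2) ** tightness,
    # then the weights are normalized. Higher tightness concentrates
    # energy in the speakers nearest the source, i.e. more separation.
    weights = [((1.0 + math.cos(source_azimuth - sp)) / 2.0) ** tightness
               for sp in speaker_azimuths]
    total = sum(weights)
    return [w / total for w in weights]
```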

This PR will work better together with #103856.

@berarma
Contributor Author

berarma commented Mar 10, 2025

The "enhancement" label is debatable. I've used the term "improve" because I don't want to claim something is fixed without more testing.

@berarma berarma changed the title from "AudioStreamPlayer3D: Improve spatialiation in stereo" to "AudioStreamPlayer3D: Improve spatialization in stereo" on Mar 11, 2025
@goatchurchprime
Contributor

Not convinced by this method. That quote in the paper about downmixing 5 channels to stereo is just a throw-away comment that is unsubstantiated.

Given that 99.9% of the use is Stereo, we should treat it as a special case.

I've conducted an exhaustive analysis of the default spatial stereo implementation of three major platforms (UE, Unity and WebAudio) at #103989 and can reduce it to three lines of code with no trig functions.

If you think it makes better sense, I'll let you implement it. Then I'll run my simulation against it to check it is working consistently.

@berarma
Contributor Author

berarma commented Mar 11, 2025

Not convinced by this method. That quote in the paper about downmixing 5 channels to stereo is just a throw-away comment that is unsubstantiated.

In my tests, it solves the issues well without changing too much, the implementation is homogeneous across all speaker setups, and it also provides additional flexibility through the panning strength and tightness parameters.

This is the quote from the paper that I agree with:

«First, this method provides a framework in which content distribution is abstracted from delivery mechanism. That is, pan values in SPCAP are extracted from 3D virtual source positions, which do not depend on the configuration of the playback system. Therefore, sessions created using one setup map nicely to another – possibly very different – setup. Content mixed on an ITU setup, for example, will mix-down to 4.1 or two-channel system in a reasonable way.»
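The kind of mix-down the quote describes is typically a fixed fold-down. A common ITU-style 5.1-to-stereo fold-down looks roughly like this (illustrative coefficients only, not necessarily what this PR implements):

```python
import math

def fold_down_51_to_stereo(fl, fr, c, lfe, sl, sr):
    # ITU-R BS.775-style fold-down: centre and surround channels are
    # folded in at -3 dB (about 0.707); the LFE channel is commonly
    # dropped in a plain stereo fold-down.
    a = 1.0 / math.sqrt(2.0)
    left = fl + a * c + a * sl
    right = fr + a * c + a * sr
    return left, right
```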

Given that 99.9% of the use is Stereo, we should treat it as a special case.

I've conducted an exhaustive analysis of the default spatial stereo implementation of three major platforms (UE, Unity and WebAudio) at #103989 and can reduce it to three lines of code with no trig functions.

If you think it makes better sense, I'll let you implement it. Then I'll run my simulation against it to check it is working consistently.

It would make sense to me if the results are consistently better. With only two speakers, the room for improvement is very limited without implementing a more sophisticated model. We can try it; I don't mind whether you do your own PR, change mine, or want me to do it. I'll let you decide.

@goatchurchprime
Contributor

  1. Can you generate and screenshot an amplitude panning graph using this project for your implementation?
    https://github.com/user-attachments/files/19189811/amplitudepanning.zip

  2. Is this a good enough reason to implement something different to what I have identified as an industry standard?

@berarma
Contributor Author

berarma commented Mar 11, 2025

This is the graph for this PR:

[Screenshot: amplitude panning polar graph for this PR, captured 2025-03-12]

I've seen your proposal at #103989 and I like the simplicity, but I don't think it's better, because there's no difference when the sound source is in front of or behind the listener.

In this PR, the decay in volume when the sound source is passing behind the listener gives a better impression that it's moving around the listener, and not bouncing from side to side.

I'll try to incorporate that other method into this PR so we can test them together.

@goatchurchprime
Contributor

Just so we are talking about the same thing with regards to how good 3D Spatial Audio can be when properly implemented, everyone should run through the demo recordings on the following page with headphones on https://designingsound.org/2018/03/29/lets-test-3d-audio-spatialization-plugins/#2272 It's quite beautiful.

When you turn the spatial audio plugin on Unity, the volume in each ear stays constant (I graphed it), yet it really sounds like the object is fully orbiting your head. This is because the HRTF changes the phase and amplitude of the different frequencies. (There is some example code here.)

It should be obvious that wearing headphones and using external speakers are COMPLETELY different listening conditions, due to the fact that both ears hear the sounds from both speakers when they are exposed, but can only hear one channel each when wearing headphones.

The fact that the left speaker is silent when the sound source is directly on the right is optimal with exposed speakers because your left ear hears some sound anyway. However, the total silence you get in one ear with headphones on can be uncomfortable.

That's why the documentation recommends setting the panning_strength to 0.5 for headphones.

The documentation also says: "A value of 1.0 completely mutes one of the channels if the sound is located exactly to the left (or right) of the listener."

However, according to that graph, your implementation reduces the channel to 50% when the sound is located exactly to the left (or right) of the listener. What had you set the panning_strength to? (I note that the current code reverts the default setting of 0.5 by internally doubling the global_panning_strength, when there is no reason to treat the local and global values differently.)

Given the incredible breadth and depth of this field, I am strongly of the view that the core implementation is not the place to get creative with sound engineering. Sticking as close as possible to what can be identified as an industry standard implementation for the base-line means it will be consistent, familiar and predictable to sound engineers who have to work across different platforms. They know what to do with it.

Making undocumented "improvements" such as changing the volume between front and back or according to elevation is not going to help them.

At the very least these "improvements" have to be optional, and there must be some setting in the Godot engine that enables AudioStreamPlayer3D to do exactly the same thing as it does across the whole of the rest of the industry.

As I have noticed in the way that other engines work, it is standard practice to reserve the super-fancy audio processing for a plugin. In our case it would be a GDExtension that adds a class able to replace AudioStreamPlayer3D. Not only does this prevent bloat in the main application (these plugins can involve megabytes of lookup tables), but it also frees the developers to get on with innovating inside the plugin without requiring any action from the Audio Team.

@berarma
Contributor Author

berarma commented Mar 12, 2025

Just so we are talking about the same thing with regards to how good 3D Spatial Audio can be when properly implemented, everyone should run through the demo recordings on the following page with headphones on https://designingsound.org/2018/03/29/lets-test-3d-audio-spatialization-plugins/#2272 It's quite beautiful.

When you turn the spatial audio plugin on Unity, the volume in each ear stays constant (I graphed it), yet it really sounds like the object is fully orbiting your head. This is because the HRTF changes the phase and amplitude of the different frequencies. (There is some example code here.)

Godot doesn't use that model.

It should be obvious that wearing headphones and using external speakers are COMPLETELY different listening conditions, due to the fact that both ears hear the sounds from both speakers when they are exposed, but can only hear one channel each when wearing headphones.

The fact that the left speaker is silent when the sound source is directly on the right is optimal with exposed speakers because your left ear hears some sound anyway. However, the total silence you get in one ear with headphones on can be uncomfortable.

That's why the documentation recommends setting the panning_strength to 0.5 for headphones.

The documentation also says: "A value of 1.0 completely mutes one of the channels if the sound is located exactly to the left (or right) of the listener."

However, according to that graph, your implementation reduces the channel to 50% when the sound is located exactly to the left (or right) of the listener. What had you set the panning_strength to? (I note that the current code reverts the default setting of 0.5 by internally doubling the global_panning_strength, when there is no reason to treat the local and global values differently.)

The SPCAP model used here doesn't allow for that, and it's quite logical. In the original code, panning_strength is used to change tightness, but it didn't work as the documentation states either; panning_strength only works as documented for AudioStreamPlayer2D. In this PR it is used to mix left and right to simulate both ears hearing both speakers. I think it's coherent with the documentation, although the documentation should be updated.

Given the incredible breadth and depth of this field, I am strongly of the view that the core implementation is not the place to get creative with sound engineering. Sticking as close as possible to what can be identified as an industry standard implementation for the base-line means it will be consistent, familiar and predictable to sound engineers who have to work across different platforms. They know what to do with it.

Making undocumented "improvements" such as changing the volume between front and back or according to elevation is not going to help them.

At the very least these "improvements" have to be optional, and there must be some setting in the Godot engine that enables AudioStreamPlayer3D to do exactly the same thing as it does across the whole of the rest of the industry.

I don't think I'm getting creative. Godot uses SPCAP for 3D spatial audio, and I've just adapted it in a logical way for use with stereo and 3.1 setups. Could you explain what you think isn't reasonable or logical?

I don't intend to debate whether using SPCAP is a good idea; that should be a proposal for the team. I'd just like the current problems to be solved in the best way possible. Whatever solution is used has to be implemented by us; there's no standard system API like Vulkan or OpenGL to rely on. I think the current solution is good enough, somewhere between the too-basic PannerNode "equalpower" panning and the more complex HRTF.

As I have noticed in the way that other engines work, it is standard practice to reserve the super-fancy audio processing for a plugin. In our case it would be a GDExtension that adds a class able to replace AudioStreamPlayer3D. Not only does this prevent bloat in the main application (these plugins can involve megabytes of lookup tables), but it also frees the developers to get on with innovating inside the plugin without requiring any action from the Audio Team.

This isn't super fancy. Personally, I wouldn't like Godot to become an almost-empty framework skeleton that depends on extensions for even the most basic things a game needs.

If you mean that I should do my fancy experiments in a GDExtension, I disagree that these are fancy experiments. But then I think you should propose a different solution, because it's evident that the current implementation is broken for stereo and 3.1.

@goatchurchprime
Contributor

  1. Can we agree that it is desirable for there to be at least one option in the settings for the AudioStreamPlayer3D to reproduce the same exact output as the default UE, Unity and WebAudio systems that most audio engineers will be familiar with? Namely, speaker_volume = cos(horizontal_angle/2).

  2. Can you rewrite the panning strength description so that it accurately describes the panning strength field that you have proposed in this PR? In particular, if the value cannot be adjusted, how have you chosen to set the ratio between the volume of a sound in front of you versus the same sound the same distance behind you?
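The pan law named in point 1 can be sketched like this. The speaker placement at ±90° and the azimuth sign convention are assumptions for illustration, not taken from any engine's actual code:

```python
import math

def angle_between(a, b):
    # Smallest unsigned angle between two azimuths, in [0, pi].
    d = (a - b) % (2.0 * math.pi)
    return 2.0 * math.pi - d if d > math.pi else d

def stereo_gains(source_azimuth):
    # speaker_volume = cos(horizontal_angle / 2), with the left and
    # right speakers assumed to sit at -90 and +90 degrees.
    left = math.cos(angle_between(source_azimuth, -math.pi / 2) / 2.0)
    right = math.cos(angle_between(source_azimuth, math.pi / 2) / 2.0)
    return left, right
```

Note that left² + right² is constant (an equal-power law), and that a source directly behind the listener produces the same gains as one directly in front, which is exactly the front/back ambiguity discussed above.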

@berarma
Contributor Author

berarma commented Mar 12, 2025

  1. Can we agree that it is desirable for there to be at least one option in the settings for the AudioStreamPlayer3D to reproduce the same exact output as the default UE, Unity and WebAudio systems that most audio engineers will be familiar with? Namely, speaker_volume = cos(horizontal_angle/2).

Yes, no problem.

  2. Can you rewrite the panning strength description so that it accurately describes the panning strength field that you have proposed in this PR? In particular, if the value cannot be adjusted, how have you chosen to set the ratio between the volume of a sound in front of you versus the same sound the same distance behind you?

Will do.

I've chosen -3 dB for the center channel and -6 dB for the rear channels. Standards like ATSC and Dolby use -3 dB for both, but audio engineers seem to lean towards -3 dB and -6 dB. I don't prefer one over the other.

Anyway, I had already thought about having an option to select the attenuation of the rear speakers. Setting the rear-speaker attenuation to zero might produce the same output as the method you propose. It would be perfect to have both options within reach of a slider.
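For concreteness, those dB figures convert to linear amplitude gains like this (a sketch; the constant names are invented):

```python
def db_to_gain(db):
    # Convert a decibel amplitude figure to a linear gain factor.
    return 10.0 ** (db / 20.0)

# The figures from the comment above: centre folded in at -3 dB,
# rear channels at -6 dB.
CENTER_GAIN = db_to_gain(-3.0)  # ~0.708
REAR_GAIN = db_to_gain(-6.0)    # ~0.501
```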

Exposes the tightness parameter from SPCAP and also applies panning strength as an effect.
@surreal6

Hi, I found out about this discussion thanks to @goatchurchprime's notice.

@berarma, first of all, thanks for your work on this issue. I want to comment on something about your proposed solution.

I'm on a team working on an XR game about pursuing invisible sound objects in roomscale, and we detected strange behavior in the sounds when they are behind us. We didn't recognize it as a bug, but now, thanks to Julian's polar representation, we can recognize the problem with the spatial node.

I understand @goatchurchprime's point about 'getting creative', although I wouldn't put it that way. I believe that sticking to the standard representation he can track in UE and Unity is a must.

I do believe your transformation to balance left and right to avoid channel muting is fine and welcome, and so is the one for rear attenuation, but please make them optional, with a slider as you suggest.

Rear occlusion is a complex subject. For example, in our game the sound object that kills you if you touch it is a low bass sound, and attenuation when it's at your back has to be managed with caution, as low frequencies don't really attenuate that much according to their position. And there is already a difference between front and back even without attenuation: when you turn your head, objects at your back move in the opposite direction. It's very subtle, but our brain is pretty good at discriminating. Unless this 45-degree mute happens...

That's why what worries me more is the muting at a 45-degree angle instead of at pure left and right. The polar diagram you showed in previous comments still shows the 45-degree muting.

Are you planning to fix this deviation? Is it possible?

@berarma
Contributor Author

berarma commented Mar 17, 2025

Hi, I found out about this discussion thanks to @goatchurchprime's notice.

@berarma, first of all, thanks for your work on this issue. I want to comment on something about your proposed solution.

I'm on a team working on an XR game about pursuing invisible sound objects in roomscale, and we detected strange behavior in the sounds when they are behind us. We didn't recognize it as a bug, but now, thanks to Julian's polar representation, we can recognize the problem with the spatial node.

I understand @goatchurchprime's point about 'getting creative', although I wouldn't put it that way. I believe that sticking to the standard representation he can track in UE and Unity is a must.

I do believe your transformation to balance left and right to avoid channel muting is fine and welcome, and so is the one for rear attenuation, but please make them optional, with a slider as you suggest.

Rear occlusion is a complex subject. For example, in our game the sound object that kills you if you touch it is a low bass sound, and attenuation when it's at your back has to be managed with caution, as low frequencies don't really attenuate that much according to their position. And there is already a difference between front and back even without attenuation: when you turn your head, objects at your back move in the opposite direction. It's very subtle, but our brain is pretty good at discriminating. Unless this 45-degree mute happens...

That's why what worries me more is the muting at a 45-degree angle instead of at pure left and right. The polar diagram you showed in previous comments still shows the 45-degree muting.

Are you planning to fix this deviation? Is it possible?

Hi.

I'm implementing and testing both solutions (I think there are still some bugs) and I'll update as soon as they work well. Sorry for the wait, I can't invest much time into this for now.

I realize that the VR-with-headphones case is where equal-power panning can work well, only significantly improved upon by HRTF. But I think the results might not be radically different with this approach. The graph here might misrepresent how it sounds; the muting is very mild and actually unnoticeable with panning_strength set to 0.5.

Besides not being worse for VR, I think it improves spatial audio for the non-VR headphones and stereo speakers cases.

I'd like to solve the issue in the best possible general way, without bias towards my own solution. I have little experience with audio in the context of video games except as a player of FPS games, so I'll take whatever the consensus is. If someone experienced already knows that I'm wrong, better to tell me sooner than later. Likewise, if the Godot maintainers share @goatchurchprime's view on this, please let me know too.

@berarma berarma force-pushed the audio_stream_player_3d_stereo branch from 9ba2298 to 3842447 on March 28, 2025
@berarma
Contributor Author

berarma commented Mar 28, 2025

I've just read the last meeting notes and it seems this PR isn't being considered so I won't be doing any more work.


Development

Successfully merging this pull request may close these issues.

AudioStreamPlayer3D spatial panning problem

5 participants