AudioStreamPlayer3D: Improve spatialization in stereo #103926
(force-pushed from 4d5a8f9 to 9ba2298)
The "enhancement" label is debatable. I've used the term "improve" because I don't want to claim something is fixed without more testing.
Not convinced by this method. That quote in the paper about downmixing 5 channels to stereo is just a throw-away comment that is unsubstantiated. Given that 99.9% of usage is stereo, we should treat it as a special case. I've conducted an exhaustive analysis of the default spatial stereo implementation of three major platforms (UE, Unity and WebAudio) at #103989, and can reduce it to three lines of code with no trig functions. If you think it makes better sense, I'll let you implement it. Then I'll run my simulation against it to check it is working consistently.
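(The exact "three lines of code" aren't quoted in this thread, so the following is only an assumption about the kind of trig-free baseline being described, not the code in #103989. A typical equal-power stereo pan law can be written without trig functions using square roots:)

```python
import math

def equal_power_stereo(pan: float) -> tuple[float, float]:
    # pan in [-1.0, 1.0]: -1 = hard left, +1 = hard right.
    # left**2 + right**2 == 1 for every pan position, so the
    # overall perceived loudness stays constant as the source
    # moves from side to side.
    left = math.sqrt((1.0 - pan) / 2.0)
    right = math.sqrt((1.0 + pan) / 2.0)
    return left, right
```

Note that a law like this depends only on the left/right component of the source direction, so it produces identical output for a source in front of and behind the listener.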
In my tests, it solves the issues well without changing too much, the implementation is homogeneous across all speaker setups, and it also provides additional flexibility through the panning strength and tightness parameters. This is the quote from the paper that I agree with: «First, this method provides a framework in which content distribution is abstracted from delivery mechanism. That is, pan values in SPCAP are extracted from 3D virtual source positions, which do not depend on the configuration of the playback system. Therefore, sessions created using one setup map nicely to another – possibly very different – setup. Content mixed on an ITU setup, for example, will mix-down to 4.1 or two-channel system in a reasonable way.»
It would make sense to me if the results are consistently better. With only two speakers, the room for improvement is very limited without implementing a more sophisticated model. We can try it, I don't mind if you do your own PR or change mine, or if you want me to do it. I'll let you decide.
This is the graph for this PR: I've seen your proposal at #103989 and I like the simplicity, but I don't think it's better because there's no difference when the sound source is in front of or behind the listener. In this PR, the decay in volume when the sound source is passing behind the listener gives a better impression that it's moving around the listener, and not bouncing from side to side. I'll try to incorporate that other method into this PR so we can test them together.
Just so we are talking about the same thing with regards to how good 3D spatial audio can be when properly implemented, everyone should run through the demo recordings on the following page with headphones on: https://designingsound.org/2018/03/29/lets-test-3d-audio-spatialization-plugins/#2272 It's quite beautiful. When you turn on the spatial audio plugin in Unity, the volume in each ear stays constant (I graphed it), yet it really sounds like the object is fully orbiting your head. This is because the HRTF changes the phase and amplitude of the different frequencies. (There is some example code here.)

It should be obvious that wearing headphones and using external speakers are COMPLETELY different listening conditions: both ears hear the sound from both speakers when they are exposed, but each ear hears only one channel when wearing headphones. The fact that the left speaker is silent when the sound source is directly on the right is optimal with exposed speakers, because your left ear hears some sound anyway. However, the total silence you get in one ear with headphones on can be uncomfortable. That's why the documentation recommends setting the

The documentation also says: "A value of 1.0 completely mutes one of the channels if the sound is located exactly to the left (or right) of the listener." However, according to that graph, your implementation reduces the channel to 50% when the sound is located exactly to the left (or right) of the listener. What had you set the

Given the incredible breadth and depth of this field, I am strongly of the view that the core implementation is not the place to get creative with sound engineering. Sticking as close as possible to what can be identified as an industry-standard implementation for the baseline means it will be consistent, familiar and predictable to sound engineers who have to work across different platforms. They know what to do with it.
Making undocumented "improvements" such as changing the volume between front and back or according to elevation is not going to help them. At the very least these "improvements" have to be optional, and there must be some setting in the Godot engine that enables

As I have noticed in the way that other engines work, it is standard practice to reserve the super fancy audio processing for a plugin. In our case it would be a GDExtension that adds a class that is able to replace
Godot doesn't use that model.
The SPCAP model used here doesn't allow for that and it's quite logical. In the original code,
I don't think I'm getting creative. Godot uses SPCAP for 3D spatial audio, and I've just adapted it in a logical way for use with stereo and 3.1 setups. Could you explain what you think isn't reasonable or logical? I don't intend to discuss whether using SPCAP is a good idea, that should be a proposal for the team; I would just like the current problems to be solved in the best way possible. Whatever solution is used has to be implemented, there's no standard system API like Vulkan or OpenGL to use. I think the current solution is good enough, somewhere between the too-basic PannerNode "equalpower" panning and the more complex HRTF,
This isn't super fancy. Personally, I wouldn't like Godot to be an almost empty framework skeleton depending on extensions to do even the most basic things a game needs. If you mean that I should do my fancy experiments in a GDExtension, I disagree that these are fancy experiments. But then I think you should propose a different solution, because it's evident that the current implementation is broken for stereo and 3.1.
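For readers unfamiliar with SPCAP, here is a rough single-plane sketch of the weighting idea being debated above. It is simplified from the paper and is not Godot's actual implementation; the function name, the sum-to-one normalization and the default tightness are illustrative assumptions:

```python
import math

def spcap_gains(source_angle: float, speaker_angles: list[float],
                tightness: float = 2.0) -> list[float]:
    # Weight each speaker by how closely it faces the source:
    # ((1 + cos(delta)) / 2) ** tightness, where delta is the angle
    # between the source direction and the speaker direction.
    # Higher tightness concentrates the sound on the nearest
    # speakers; the weights are then normalized to sum to 1.
    weights = []
    for angle in speaker_angles:
        delta = source_angle - angle
        weights.append(((1.0 + math.cos(delta)) / 2.0) ** tightness)
    total = sum(weights)
    return [w / total for w in weights]
```

With stereo speakers at ±45°, a source directly ahead yields equal gains on both channels, and even a source at hard left keeps a small nonzero gain on the right speaker, because the cosine weight only reaches zero for a speaker exactly opposite the source.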
Yes, no problem.
Will do. I've chosen -3 dB for the center channel and -6 dB for the rear channels. Standards like ATSC and Dolby use -3 dB for both, but it seems audio engineers lean towards -3 dB and -6 dB. I don't prefer one over the other. Anyway, I had already thought about having an option to select the attenuation of the rear speakers. Setting the rear speaker attenuation to zero might produce the same output as the method you propose. It would be perfect to have both options within reach of a slider.
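For concreteness, the dB figures above translate to linear amplitude coefficients via the standard relation gain = 10^(dB/20); the helper below is only an illustration, not code from the PR:

```python
def db_to_gain(db: float) -> float:
    # Standard decibel-to-amplitude conversion.
    return 10.0 ** (db / 20.0)

# -3 dB is roughly 0.708 (close to 1/sqrt(2)),
# -6 dB is roughly 0.501 (close to 1/2).
center_gain = db_to_gain(-3.0)
rear_gain = db_to_gain(-6.0)
```

This is why -3 dB is the common choice for a power-preserving downmix of a shared channel, while -6 dB roughly halves the amplitude.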
Exposes the tightness parameter from SPCAP and also applies panning strength as an effect.
Hi, I found out about this discussion thanks to @goatchurchprime's notice. @berarma, first of all, thanks for your work on this issue. I want to comment on something about your proposed solution.

I'm in a team working on an XR game about pursuing invisible sound objects in roomscale, and we detected strange behavior in the sounds when they were behind us. We didn't recognize it as a bug, but now, thanks to julian's polar representation, we can recognize the problem with the spatial node.

I understand @goatchurchprime's point about 'getting creative', although I wouldn't say it that way. I believe that sticking to the standard representation he can track in UE and Unity is a must. I do believe your transformation to balance left and right to avoid channel muting is fine and welcome, and also the one for rear attenuation, but please make them optional, with a slider as you suggest. Rear occlusion is a complex subject and, for example, in our game the sound object that kills you if you touch it is a low bass sound, and attenuation when it is at your back has to be managed with caution, as low frequencies don't really attenuate that much according to their position. And there is already a difference between front and back without attenuation: when you turn your head, objects at your back move in the opposite direction. It's very subtle, but our brain is pretty good at discriminating. Unless this 45-degree mute happens...

That's why what worries me more is the muting at 45 degrees instead of at pure left and right. The polar diagram you showed in previous comments still shows the 45-degree muting. Are you planning to fix this deviation?
Hi. I'm implementing and testing both solutions (I think there are still some bugs) and I'll update as soon as they work well. Sorry for the wait, I can't invest much time in this for now.

I realize that the VR-with-headphones case is where equal-power panning can work well, only significantly improved by HRTF. But I think the results might not be as radically different with this approach. The graph here might misrepresent how it sounds. The muting is very mild and actually unnoticeable with

Besides not being worse for VR, I think it improves spatial audio for the non-VR headphones and stereo speakers cases. I'd like to solve the issue in the best possible general way, without bias towards my solution. I have little experience with audio in the context of videogames, except as a player of FPS games, so I'll take whatever the consensus is. If someone experienced already knows that I'm wrong, better to tell me sooner than later. Likewise, if the Godot maintainers share @goatchurchprime's view on this, please let me know too.
(force-pushed from 9ba2298 to 3842447)
I've just read the last meeting notes and it seems this PR isn't being considered, so I won't be doing any more work.

I think that using the tightness parameter of the SPCAP model as the panning strength was a mistake. Panning strength is an effect that should be applied at the end of the processing to change the final mix. I've done that and I've exposed the tightness parameter in the inspector for more flexibility.
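One way to read "applied at the end of the processing": blend the final per-speaker gains toward an even distribution. This is only an illustrative sketch of the described behavior, under my own assumptions; `apply_panning_strength` is a hypothetical helper, not the PR's actual code:

```python
def apply_panning_strength(gains: list[float], strength: float) -> list[float]:
    # Blend the panned per-speaker gains toward an even
    # distribution of the same total amplitude.
    # strength = 1.0 leaves the panned mix unchanged;
    # strength = 0.0 spreads the signal equally across all
    # speakers, removing any positional separation.
    even = sum(gains) / len(gains)
    return [even + (g - even) * strength for g in gains]
```

Applied this way, the effect is independent of the panning model that produced the gains, which is the point of moving it out of the SPCAP tightness parameter.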
Also, the SPCAP model seems to have been developed for 5.1 setups and up, but the paper says the resulting output downmixes well to setups with fewer speakers. I've used the 5.1 speaker setup internally and downmixed for the stereo and 3.1 setups.
With these changes, the audio positioning should be better in stereo and 3.1 setups, as well as in the headphone output, and the panning strength should work more as intended. Tested only with stereo speakers and headphones; it needs more testing with other setups.
With panning strength at 1.0, no speaker will be muted, but I think it wouldn't make sense to forcibly mute any speakers for this case. We achieve maximum stereo separation though, as output by SPCAP. For further separation of channels, tightness can be increased.
This PR will work better together with #103856.