Description
On Asterisk 13.5+ combined with LumenVox ASR, we're noticing that UniMRCP-based speech recognition is failing with the following error: ERROR Adhearsion::Translator::Asterisk: <Nokogiri::XML::SyntaxError> The value following "version" in the XML declaration must be a quoted string.
The reason is that Asterisk 13.5+ now backslash-escapes several characters, including ' " ?, in all VarSet (channel variable set) events. So ALL channel variables, including the $RECOG_RESULT variable that conveys NLSML results from speech recognition, are now encoded differently than before.
On top of that, although Adhearsion enables the UniMRCP uer option (URI-encoded results), the single quote ' is one of the characters that is not typically URI-encoded. The single quotes in a LumenVox response therefore pass through unencoded, and Asterisk 13.5+'s new escaping replaces each ' with \':
...
Variable: RECOG_RESULT
Value: %3C%3Fxml%20version%3D\'1.0\'%20encoding%3D\'ISO-8859-1\'%20%3F%3E%3Cresult%3E%3Cinterpretation%20grammar%3D%22builtin%3Agrammar%2Fnumber%22%20confidence%3D%220.96%22%3E%3Cinput%20mode%3D%22speech%22%3Eseven%3C%2Finput%3E%3Cinstance%3E7%3C%2Finstance%3E%3C%2Finterpretation%3E%3C%2Fresult%3E
Decoded:
<?xml version=\'1.0\' encoding=\'ISO-8859-1\' ?><result><interpretation grammar="builtin:grammar/number" confidence="0.96"><input mode="speech">seven</input><instance>7</instance></interpretation></result>
... ❌malformed with \'
In contrast, here's how that variable would be received prior to Asterisk 13.5:
...
Variable: RECOG_RESULT
Value: %3C%3Fxml%20version%3D'1.0'%20encoding%3D'ISO-8859-1'%20%3F%3E%3Cresult%3E%3Cinterpretation%20grammar%3D%22builtin%3Agrammar%2Fnumber%22%20confidence%3D%220.92%22%3E%3Cinput%20mode%3D%22speech%22%3Eseven%3C%2Finput%3E%3Cinstance%3E7%3C%2Finstance%3E%3C%2Finterpretation%3E%3C%2Fresult%3E
Decoded:
<?xml version='1.0' encoding='ISO-8859-1' ?><result><interpretation grammar="builtin:grammar/number" confidence="0.92"><input mode="speech">seven</input><instance>7</instance></interpretation></result>
... ✅valid NLSML
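To make the difference concrete, here is a minimal Ruby sketch that URI-decodes a shortened form of both values and checks the XML declaration. The helper `valid_xml_declaration?` is hypothetical and only approximates what a real XML parser such as Nokogiri enforces. It is not part of Adhearsion.

```ruby
require 'uri'

# Shortened RECOG_RESULT values, taken from the two AMI events above.
escaped   = "%3C%3Fxml%20version%3D\\'1.0\\'%20encoding%3D\\'ISO-8859-1\\'%20%3F%3E"
unescaped = "%3C%3Fxml%20version%3D'1.0'%20encoding%3D'ISO-8859-1'%20%3F%3E"

# Hypothetical check: the version must be a quoted string right after "version=".
def valid_xml_declaration?(xml)
  xml.match?(/\A<\?xml\s+version=(["'])1\.\d+\1/)
end

decoded_escaped   = URI.decode_www_form_component(escaped)
decoded_unescaped = URI.decode_www_form_component(unescaped)

puts decoded_escaped                           # backslashes survive URI decoding
puts valid_xml_declaration?(decoded_escaped)   # false on Asterisk >= 13.5 output
puts valid_xml_declaration?(decoded_unescaped) # true on pre-13.5 output
```

URI decoding leaves the backslashes untouched, so the parser sees `\` where it expects an opening quote.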
The backslash-escaping of the following characters was introduced by the change in ASTERISK-24934, "[patch] Asterisk manager output does not escape control characters":
| ASCII character in C | New 2-character AMI representation in Asterisk >= 13.5 |
|---|---|
| `\a` (0x07) Alert (Beep, Bell) | `\` `a` (0x5c 0x61) |
| `\b` (0x08) Backspace | `\` `b` (0x5c 0x62) |
| `\f` (0x0C) Formfeed Page Break | `\` `f` (0x5c 0x66) |
| `\n` (0x0A) Newline (Line Feed) | `\` `n` (0x5c 0x6E) |
| `\r` (0x0D) Carriage Return | `\` `r` (0x5c 0x72) |
| `\t` (0x09) Horizontal Tab | `\` `t` (0x5c 0x74) |
| `\v` (0x0B) Vertical Tab | `\` `v` (0x5c 0x76) |
| `\` (0x5C) Backslash | `\` `\` (0x5c 0x5c) |
| `'` (0x27) Apostrophe or single quotation mark | `\` `'` (0x5c 0x27) |
| `"` (0x22) Double quotation mark | `\` `"` (0x5c 0x22) |
| `?` (0x3F) Question mark | `\` `?` (0x5c 0x3F) |
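As a rough sketch of what reversing this mapping could look like in Ruby, the following inverts the table above in a single pass. The names `AMI_ESCAPES` and `unescape_ami_value` are hypothetical and do not exist in Adhearsion or RubyAMI. The sketch assumes the table is the complete set of escapes.

```ruby
# Hypothetical lookup: second character of each 2-character AMI escape
# mapped back to the original single character.
AMI_ESCAPES = {
  'a' => "\a", 'b' => "\b", 'f' => "\f", 'n' => "\n", 'r' => "\r",
  't' => "\t", 'v' => "\v", '\\' => '\\', "'" => "'", '"' => '"', '?' => '?'
}.freeze

# Replace every backslash escape from the table with its original character.
# A single left-to-right gsub pass keeps an escaped backslash followed by a
# letter (for example \\n) from being decoded twice.
def unescape_ami_value(value)
  value.gsub(/\\([abfnrtv\\'"?])/) { AMI_ESCAPES.fetch(Regexp.last_match(1)) }
end

puts unescape_ami_value("version=\\'1.0\\'")
```

Doing the replacement in one pass matters: running separate substitutions per character could turn an escaped backslash followed by `n` into a newline.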
Some Strategies for Resolution
1. We could just always attempt to unescape `\`, in all versions of Asterisk.
   - Cons: This would be a change in behavior, and could potentially corrupt data in Asterisk < 13.5.
2. We could activate auto-unescaping based on `RubyAMI::Stream#version` being >= 2.8.0, since the issue was introduced as AMI_VERSION moved from 2.7.0 to 2.8.0.
   - Pro: 0-configuration, "It just works" solution.
   - Cons:
     - A complex, stateful solution.
     - Introduces the concept of separate modes of Asterisk compatibility.
3. We could decide whether or not to unescape based on a config value of some sort being enabled.
   - Pro:
     - Straightforward to implement & test.
     - We can decide whether or not to default the option to ON or OFF.
   - Cons:
     - Introduces the concept of separate modes of Asterisk compatibility.
     - NOT 0-configuration -- Rather, if you hit this error, you may have to do a web search for this error and learn that you need to flip this configuration option ON to resolve.
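For discussion, here is one way the version-based and config-based strategies above could be combined in Ruby. Everything here is hypothetical: the method `unescape_ami_values?`, the `config_override` keyword, and the assumption that the AMI version is available as a string such as `"2.8.0"`.

```ruby
require 'rubygems'

# AMI version at which the escaping was introduced, per the issue text.
ESCAPING_AMI_VERSION = Gem::Version.new('2.8.0')

# Hypothetical decision helper.
# ami_version:     version string reported by the AMI connection (assumed format).
# config_override: true/false forces the behavior; nil falls back to version detection.
def unescape_ami_values?(ami_version, config_override: nil)
  return config_override unless config_override.nil?
  return false if ami_version.nil? # version unknown: keep the old behavior

  Gem::Version.new(ami_version) >= ESCAPING_AMI_VERSION
end

puts unescape_ami_values?('2.7.0')
puts unescape_ami_values?('2.10.0')
puts unescape_ami_values?('2.8.0', config_override: false)
```

Comparing with `Gem::Version` rather than plain strings avoids ordering `"2.10.0"` before `"2.8.0"`.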
My leaning is towards option ~~3~~ 2. But I'm very interested in other points of view on the matter. 👀
Cc: @gfaza @lpradovera @bklang