public class GoogleASRComponent extends MixedComponent
Using the Google Speech API directly is no longer an ideal solution because it is limited to 50 requests/day (even with an API key). However, the implementation of the Web Speech API in Google Chrome is exempt from this limit, hence the dirty workaround that this component implements.
The component works by setting up a WebSocket server and having a Web page open in Chrome connect to it and send it the results of the Web Speech API. Unfortunately, for Chrome to authorize the Web page to access the microphone and remember the authorization, the page has to be served via an HTTPS server. The page, in turn, needs to access the WebSocket server via TLS too. It is thus necessary to create a self-signed certificate and tell Chrome to accept it as valid. The process is described below.
First, create a self-signed certificate by issuing the following command and entering localhost for the Common Name/CN; the other values do not matter as much.
openssl req -new -newkey rsa:4096 -days 5000 -nodes -x509 -sha512 -out cert.crt -keyout cert.key
The resulting files must then be converted into a PKCS #12 file. This can be done by issuing the following command and entering an empty password (adjust the output path as needed):
openssl pkcs12 -export -in cert.crt -inkey cert.key -out $PATH_TO_AGENTSLANG/data/org/agent/slang/in/google/cert.p12
The .p12 file can actually be installed anywhere, but this location is where the default configuration files point to.
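To illustrate how a PKCS #12 file with an empty password can be consumed on the Java side, here is a minimal sketch using only the standard library. The class P12ContextLoader and its method names are hypothetical and are not part of AgentSlang; how the component actually loads the certificate may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

/** Hypothetical helper (not part of AgentSlang): turns a .p12 file into an SSLContext. */
public final class P12ContextLoader {
    private P12ContextLoader() {}

    /** Loads a PKCS #12 keystore that is protected by an empty password. */
    public static KeyStore loadKeyStore(Path p12File) throws IOException, GeneralSecurityException {
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        try (InputStream in = Files.newInputStream(p12File)) {
            keyStore.load(in, new char[0]); // empty password, as entered during the export
        }
        return keyStore;
    }

    /** Builds a TLS context whose key material comes from the given keystore. */
    public static SSLContext createContext(KeyStore keyStore) throws GeneralSecurityException {
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, new char[0]);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(kmf.getKeyManagers(), null, null); // default trust managers and randomness
        return context;
    }
}
```

An SSLContext built this way could then be handed to whichever HTTPS or secure WebSocket server implementation is in use.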
Next, import the cert.crt file previously created and choose "Trusted Root Certification Authorities" as the destination.
Finally, restart Chrome (this is a very important part of the process).
The certificate parameter of the component must point to a valid *.p12 file (as created in the previous section). In addition, the language parameter must contain the language code corresponding to the language that should be used for recognition.
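As a rough illustration, the language value can be sanity-checked before the component is started. The sketch below assumes the language code is a BCP 47 tag (such as en-US), which is the format the Web Speech API uses for its lang attribute; whether the component forwards the value unchanged is an assumption. The class LanguageParam is hypothetical and not part of AgentSlang.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

/** Hypothetical helper (not part of AgentSlang): checks a language code before use. */
public final class LanguageParam {
    private LanguageParam() {}

    /** Returns true if the tag is a syntactically well-formed BCP 47 language tag. */
    public static boolean isWellFormed(String tag) {
        if (tag == null || tag.isEmpty()) {
            return false; // an empty tag would silently reset the builder, so reject it here
        }
        try {
            new Locale.Builder().setLanguageTag(tag);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }
}
```

This only checks syntax; it does not tell whether the recognizer actually supports the language.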
The page https://localhost:8149/ should then be opened in Chrome (this should happen automatically), and permission to use the microphone should be granted if Chrome asks.
Note that because the AgentSlang component is the WebSocket server, it must be started before the Web browser.
(Technically, the component could send the language code at any moment, even though it currently does not.)
After each iteration, recognition is started again, allowing for continuous recognition. The SpeechRecognition object natively supports a continuous mode, but it seems slower than using the non-continuous mode.
At any moment, the AgentSlang component can send one of two commands (one of them being start) to stop or resume recognition. (It is currently used for suspending recognition while the agent speaks, so that it does not hear itself.)
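The idea of suspending recognition while the agent speaks can be sketched as a small state holder that decides which command, if any, should be sent to the Web page. This is only an illustration: the class RecognitionGate is hypothetical, and the command strings are passed in by the caller because only start is named in this documentation.

```java
import java.util.Optional;

/** Hypothetical sketch (not part of AgentSlang): decides when to suspend or resume recognition. */
public final class RecognitionGate {
    private final String suspendCommand; // assumption: the text of the "stop recognition" message
    private final String resumeCommand;  // "start" according to the documentation above
    private boolean suspended;

    public RecognitionGate(String suspendCommand, String resumeCommand) {
        this.suspendCommand = suspendCommand;
        this.resumeCommand = resumeCommand;
    }

    /** Called when the agent starts speaking; returns the command to send, if any. */
    public Optional<String> agentStartedSpeaking() {
        if (suspended) {
            return Optional.empty(); // already suspended, nothing to send
        }
        suspended = true;
        return Optional.of(suspendCommand);
    }

    /** Called when the agent stops speaking; returns the command to send, if any. */
    public Optional<String> agentStoppedSpeaking() {
        if (!suspended) {
            return Optional.empty(); // recognition is already running
        }
        suspended = false;
        return Optional.of(resumeCommand);
    }
}
```

A caller might create the gate as new RecognitionGate("stop", "start"), where "stop" is an assumed value, and forward any returned command over the WebSocket connection.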
Conversely, if the Web page is closed while the AgentSlang component is still active, it can simply be reopened.
OS Compatibility: Windows and Linux
|Constructor and Description|
|Modifier and Type||Method and Description|
Closes and stops the connection with the Google ASR port.
Checks the type of the output data.
Checks the type of the input data.
public GoogleASRComponent(java.lang.String outboundPort, ComponentConfig config)