public class GoogleASRComponent
extends MixedComponent
Using the Google Speech API directly is no longer an ideal solution because it is limited to 50 requests/day (even with an API key). However, the implementation of the Web Speech API in Google Chrome is exempt from this limit, hence the dirty workaround that this component implements.
The component works by setting up a WebSocket server and having a Web page open in Chrome connect to it and send it the results of the Web Speech API. Unfortunately, for Chrome to authorize the Web page to access the microphone and remember the authorization, the page has to be served via an HTTPS server. The page, in turn, needs to access the WebSocket server via TLS too. It is thus necessary to create a self-signed certificate and tell Chrome to accept it as valid. The process is described below.
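The HTTPS half of this setup can be sketched with the JDK's built-in com.sun.net.httpserver package. This is a minimal illustration under stated assumptions, not the component's actual code: the class name RecognitionPageServer and the placeholder page content are hypothetical, and the secure WebSocket endpoint, which needs a third-party library, is left out.

```java
import com.sun.net.httpserver.HttpsConfigurator;
import com.sun.net.httpserver.HttpsServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLContext;

// Hypothetical helper: serves a single HTML page over HTTPS so that Chrome
// can remember the microphone permission. The WSS endpoint is not shown.
final class RecognitionPageServer {
    private final HttpsServer server;

    RecognitionPageServer(int port, SSLContext sslContext, String html) throws IOException {
        server = HttpsServer.create(new InetSocketAddress("localhost", port), 0);
        // Every connection is wrapped in TLS using the supplied context.
        server.setHttpsConfigurator(new HttpsConfigurator(sslContext));
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        server.createContext("/", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    void start() { server.start(); }

    // Actual bound port (useful when 0 was passed to pick a free port).
    int port() { return server.getAddress().getPort(); }

    void stop() { server.stop(0); }
}
```

In practice the port would be 8149 to match the page URL used by the component, and the SSLContext would be built from the self-signed certificate rather than the JDK default.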
First, generate a self-signed certificate with the following command, entering localhost for the Common Name/CN when prompted; the other values do not matter as much.
openssl req -new -newkey rsa:4096 -days 5000 -nodes -x509 -sha512 -out cert.crt -keyout cert.key
The resulting files must then be converted into a PKCS #12 file. This can be
done by issuing the following command and entering an empty password (adjust
the output path as needed):
openssl pkcs12 -export -in cert.crt -inkey cert.key -out $PATH_TO_AGENTSLANG/data/org/agent/slang/in/google/cert.p12
The resulting .p12 file can actually be installed anywhere, but this location is where the default configuration files point.
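Once the .p12 file exists, a Java server can load it with the standard KeyStore and KeyManagerFactory classes to obtain a TLS context. The sketch below assumes the password is the one entered during export (empty in the steps above, so new char[0] would be passed); the helper name P12Tls.fromPkcs12 is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

final class P12Tls {
    // Loads a PKCS #12 file (such as cert.p12) and builds a server-side TLS context.
    static SSLContext fromPkcs12(Path p12, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        try (InputStream in = Files.newInputStream(p12)) {
            keyStore.load(in, password);
        }
        // The key managers present the certificate and private key to clients.
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, password);
        SSLContext context = SSLContext.getInstance("TLS");
        // null trust managers and random source: use the JDK defaults.
        context.init(kmf.getKeyManagers(), null, null);
        return context;
    }
}
```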
Next, import the cert.crt file previously created and choose "Trusted Root Certification Authorities" as the destination.
Finally, restart Chrome (this is a very important part of the process).
cert.crt
.
The certificate parameter of the component must point to a valid *.p12 file (as created in the previous section). In addition, the language parameter must contain the language code corresponding to the language that should be used for recognition.
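A pre-flight check of these two parameters might look like the following. It assumes the language code is a BCP 47 tag such as en-US or fr-FR, the format accepted by the Web Speech API's lang attribute; the class AsrParams is hypothetical, and the check is purely syntactic (it cannot tell whether Chrome actually supports the language).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical validation of the certificate and language parameters.
final class AsrParams {
    // True if the path names a readable regular file ending in .p12.
    static boolean isUsableCertificate(Path path) {
        Path name = path.getFileName();
        return name != null
                && name.toString().toLowerCase(Locale.ROOT).endsWith(".p12")
                && Files.isRegularFile(path)
                && Files.isReadable(path);
    }

    // True if the tag parses as a well-formed BCP 47 tag with a language subtag.
    static boolean isUsableLanguage(String tag) {
        if (tag == null || tag.isBlank()) return false;
        Locale locale = Locale.forLanguageTag(tag);
        // forLanguageTag silently drops ill-formed parts, so compare the round trip.
        return !locale.getLanguage().isEmpty()
                && locale.toLanguageTag().equalsIgnoreCase(tag);
    }
}
```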
The page https://localhost:8149/ should then be opened in Chrome (this should happen automatically), and permission to use the microphone should be granted if Chrome asks.
Note that since the AgentSlang component is the WebSocket server, it must be started before the Web page is opened in the browser.
(Technically, the component could send the language code at any moment, even though it currently does not.)
After each iteration, recognition is started again, allowing for continuous recognition. (The SpeechRecognition object natively supports a continuous mode, but it seems slower than using the non-continuous mode repeatedly.)
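The restart strategy can be modeled in Java to show the control flow. The real loop runs in the page's JavaScript; here, the hypothetical RecognitionLoop.Session interface stands in for one non-continuous SpeechRecognition run, and stop() and start() mirror the stop and start commands the component can send.

```java
import java.util.function.Consumer;

// Hypothetical Java model of the page's loop: each non-continuous session
// ends after one result, and a new session is started immediately unless
// recognition has been stopped.
final class RecognitionLoop {
    // Stand-in for one non-continuous SpeechRecognition run.
    interface Session { String recognizeOnce(); }

    private boolean active = true;

    void stop()  { active = false; }
    // Re-enables the loop; the caller would then invoke run() again.
    void start() { active = true; }
    boolean isActive() { return active; }

    // Runs up to maxSessions sessions, forwarding each non-empty result
    // (e.g., over the WebSocket). Returns the number of sessions started.
    int run(Session session, Consumer<String> forward, int maxSessions) {
        int started = 0;
        while (active && started < maxSessions) {
            started++;
            String result = session.recognizeOnce();
            if (result != null && !result.isEmpty()) forward.accept(result);
            // Looping back here is the "start again after each iteration" step.
        }
        return started;
    }
}
```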
At any moment, the AgentSlang component can send either "stop" or "start" to stop or resume recognition. (This is currently used to suspend recognition while the agent speaks, so that it does not hear itself.)
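On the component side, suspending recognition while speaking amounts to bracketing each utterance with the two commands. In this sketch, the sendToPage callback is an assumption standing in for whatever WebSocket send method the component uses, and SelfListeningGuard is a hypothetical name; the finally block guarantees that recognition resumes even if speech output fails.

```java
import java.util.function.Consumer;

// Hypothetical helper: mutes the recognition page while the agent speaks,
// so the agent does not transcribe its own voice.
final class SelfListeningGuard {
    private final Consumer<String> sendToPage; // assumed WebSocket send

    SelfListeningGuard(Consumer<String> sendToPage) {
        this.sendToPage = sendToPage;
    }

    void speak(Runnable utterance) {
        sendToPage.accept("stop");
        try {
            utterance.run();
        } finally {
            // Resume recognition even if speech synthesis throws.
            sendToPage.accept("start");
        }
    }
}
```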
Conversely, if the Web page is closed while the AgentSlang component is still active, it can simply be reopened.
OS Compatibility: Windows and Linux
Constructor and Description |
---|
GoogleASRComponent(java.lang.String outboundPort, ComponentConfig config) |
Modifier and Type | Method and Description |
---|---|
void | close(): Closes and stops the connection with the Google ASR port. |
void | definePublishedData(): Defines the type of output (published) data. |
void | defineReceivedData(): Defines the type of input (received) data. |