matteo matteo - 4 months ago 37x
Android Question

How does Google Keep do Speech Recognition while saving the audio recording at the same time?

Android's SpeechRecognizer apparently doesn't allow to record the input on which you're doing speech recognition into an audio file.
That is, either you record voice using a MediaRecorder (or AudioRecord for that matter) or you do Speech Recognition with a SpeechRecognizer, in which case the audio isn't recorded into a file (at least not one you can access); but you can't do both at the same time.

The question of how to achieve recording audio and doing speech recognition at the same time in Android has been asked several times, and the most popular "solution" is to record a flac file and use Google's unofficial Speech API which allows you to send a flac file via a POST request and obtain a json response with the transcription. (outdated Android version)

That works pretty well but has a huge limitation which is it can't be used with files longer than about 10-15 seconds (the exact limit is not clear and may depend on file size or perhaps the amount of words). This makes it not suitable for my needs.

Also, slicing the audio file into smaller files is NOT a possible solution; even forgetting about the difficulties in properly splitting the file at the right positions (not in the middle of a word), many consecutive requests to the abovementioned web service api will randomly result in empty responses (Google says there's a usage limit of 50 requests per day, but as usual they don't disclose the details of the real usage limits which clearly restrict bursts of requests).

So, all this would seem to indicate that getting a transcription of speech while at the same time recording the input into an audio file in Android is IMPOSSIBLE.

HOWEVER, the Google Keep Android app does exactly that.
It allows you to speak, transcrbes what you said into text, and saves both the text and the audio recording (well it's not clear where it stores it, but you can replay it).
And it has no length limitation.

I would look at the source code but it doesn't seem to be available, is it?

I sniffed the packets Google Keep sends and receives while doing speech recognition, and it definitely does NOT use the speech api mentioned above. All the traffic is TLS and (from the outside) it looks pretty much the same as when you're using SpeechRecognizer.

So does perhaps a way exist to kind of "split" (i.e. duplicate, or multiplex) the microphone input stream into two streams, and feed one of them to a SpeechRecognizer and the other to a MediaRecorder?


Google Keep launches RecognizerIntent with certain undocumented extras and expects the resulting intent to contain the URI of the recorded audio. If RecognizerIntent is serviced by Google Voice Search then it all works out and Keep gets the audio.

See record/save audio from voice recognition intent for more information and a code sample that calls the recognizer in the same way as Keep (probably) does.

Note that this behavior is not part of Android. It's simply the current undocumented way of how two closed-source Google apps communicate with each other.