Dictating text by voice is something we often take for granted. It is, however, a complex process, so much so that most of the work is traditionally done remotely, with the end result sent back to our devices. Google has now found a way to shrink that process enough to run it locally, and the fruits of that work are coming to Gboard.

Uncompressed, the models traditionally used by Gboard for speech recognition take up about two gigabytes, far too large to store on a smartphone. So when you tap the microphone icon, your recorded speech is sent to Google's servers to be converted to text, and the resulting text is sent back. Using recurrent neural network transducer (RNN-T) technology, Google has been able to train a smaller yet equally effective model that can run on the device with the same accuracy as the servers. Even so, it still occupies 450 MB of storage, which is not small enough to store locally on most smartphones.

Through a process called model quantization, Google was able to shrink the model further, resulting in a package that takes up only about 80 MB. This also speeds up transcription. The new model works at the character level, so transcribed text appears letter by letter rather than a whole word at a time, as it does with the current server-based system.
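To give a rough sense of how quantization shrinks a model, here is a minimal sketch (this is an illustration of the general technique, not Google's actual pipeline): each 32-bit floating-point weight tensor is mapped to 8-bit integers plus a scale factor, cutting its storage to a quarter at the cost of a small rounding error.

```python
import numpy as np

def quantize(weights: np.ndarray):
    """Symmetric linear quantization: float32 weights -> int8 plus a scale."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the weights at inference time."""
    return q.astype(np.float32) * scale

# Example: a single 256x256 weight matrix (hypothetical data)
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize(w)

print(w.nbytes // q.nbytes)  # 4 -> each tensor is 4x smaller
print(float(np.max(np.abs(w - dequantize(q, scale)))))  # small rounding error
```

Real quantization schemes are more involved (per-channel scales, quantization-aware training, integer-only kernels), but the storage arithmetic is the same, which is how a 450 MB model can end up around 80 MB.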

You can see a comparison between server-side and on-device transcription below:

The enhanced speech recognition functionality will initially be limited to Pixel devices in US English, and there is currently no timeline for a wider rollout. Google hopes it will be available "in more languages and in wider application areas" shortly thereafter.

You can read a much more detailed explanation of the project on the Google AI blog.
