Google makes Speech-to-Text and Text-to-Speech services more accurate and accessible


Google LLC is pushing ahead with its cloud-based Speech-to-Text and Text-to-Speech machine learning services, adding new features and making them more accessible to large organizations.

Google Cloud Speech-to-Text is essentially an advanced transcription service that relies on machine learning and other artificial intelligence technologies to improve its accuracy. This use of artificial intelligence matters because any accuracy below 100% can lead to errors that degrade the quality of a conversation, Google product manager Dan Aharon said in a blog post published today.

But Google readily admits that Speech-to-Text is not always accurate, because many applications that use it run over "noisy" phone lines, which makes it difficult to interpret accurately what people say.

"When creating smart voice applications, the accuracy of speech recognition is essential," said Aharon. "As you can see with the illustration below, even with 90% accuracy, it's hard to have a useful conversation."
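To see why 90% accuracy falls short, here is a minimal sketch in Python. It assumes, purely for illustration, that the 90% figure is a per-word accuracy and that word errors are independent of one another; neither assumption comes from Google, and the function name is hypothetical.

```python
def sentence_success_rate(word_accuracy: float, words: int) -> float:
    """Chance that every word of an utterance is transcribed correctly.

    Assumes (illustration only) that each word is recognized independently
    with the same probability `word_accuracy`.
    """
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be between 0 and 1")
    if words < 0:
        raise ValueError("words must be non-negative")
    return word_accuracy ** words


for n in (5, 10, 20):
    rate = sentence_success_rate(0.90, n)
    print(f"{n:>2} words at 90% word accuracy: {rate:.0%} error-free")
# Prints 59%, 35% and 12% for 5, 10 and 20 words respectively.
```

Under these assumptions, only about one in three ten-word sentences would come through without a single error, which is consistent with Aharon's point that 90% is not enough for a useful conversation.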


To address this, Google last year introduced premium Speech-to-Text models for customers who opt to share usage data so that Google can refine its algorithms. They include an enhanced phone model that produces 62% fewer transcription errors than the standard model, and a video model suited to multi-speaker conference calls.

Today, Google is making its premium models available to all customers, including those who do not wish to join its data logging program. However, staying out of the program comes at a price, because those who do join will pay 33% less for the service.

<p>"We've also cut prices for the premium video model by 25%, which represents a total savings of 50% for current video model customers who opt in to data logging," said Aharon.</p>
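<p>The 50% figure can be checked with a short Python sketch. It assumes that the 33% data logging discount applies on top of the 25% price cut, so the two compound rather than add; the article does not spell out how Google combines them, and the function name is hypothetical.</p>

```python
def combined_saving(*discounts: float) -> float:
    """Total fractional saving when each discount applies to the price
    left over after the previous one (an assumption, not Google's stated rule)."""
    remaining = 1.0
    for discount in discounts:
        if not 0.0 <= discount <= 1.0:
            raise ValueError("each discount must be between 0 and 1")
        remaining *= 1.0 - discount
    return 1.0 - remaining


video_price_cut = 0.25        # premium video model price cut
data_logging_discount = 0.33  # discount for opting in to data logging

total = combined_saving(video_price_cut, data_logging_discount)
print(f"Combined saving: {total:.0%}")  # Combined saving: 50%
```

<p>Compounding gives 1 - (0.75 x 0.67) = 0.4975, which rounds to the 50% Aharon cites.</p>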
<p>Google is also adding a new Speech-to-Text feature called "multi-channel recognition," which makes it easier to distinguish between different people in a conversation.</p>
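<p>As a rough sketch of what a multi-channel request might look like, the Python below builds a JSON body for the Speech-to-Text REST <code>speech:recognize</code> method without sending it. The field names reflect the v1 REST API as best understood here and should be checked against the current documentation; the helper name and the <code>gs://example-bucket/...</code> URI are hypothetical.</p>

```python
import json


def build_recognize_request(gcs_uri: str, channels: int = 2) -> dict:
    """Build (but do not send) a recognize request body that asks for
    a separate transcript per audio channel."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return {
        "config": {
            "languageCode": "en-US",
            # Premium phone model; assumed to be selected via these two fields.
            "model": "phone_call",
            "useEnhanced": True,
            # Multi-channel recognition: one transcript per channel.
            "audioChannelCount": channels,
            "enableSeparateRecognitionPerChannel": True,
        },
        "audio": {"uri": gcs_uri},  # hypothetical Cloud Storage location
    }


request_body = build_recognize_request("gs://example-bucket/support-call.wav")
print(json.dumps(request_body, indent=2))
```

<p>This fits recordings where each participant is captured on a separate channel, such as a caller and an agent on a two-channel phone recording.</p>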
<p>The premium Speech-to-Text models have already been adopted by many companies, including LogMeIn Inc., which uses the service to create meeting transcripts in its popular GoToMeeting app.</p>
<h4><strong>More voices and languages for Text-to-Speech</strong></h4>
<p>Google is also updating Text-to-Speech, which does exactly the opposite of its sister service, transforming written text into artificial speech with realistic human voices. The service is gaining more artificial voices built with Google's WaveNet technology and is becoming available in more languages. "With unique access to WaveNet technology powered by Google Cloud TPUs [Tensor Processing Unit AI chips], we can create new voices and new languages faster and easier than is typical in the industry," said Aharon.</p>
<p>New languages introduced in beta today include Danish, Norwegian, Portuguese, Russian, Polish, Slovak and Ukrainian. This means that Text-to-Speech now supports 21 languages in total. Google is also adding 31 new WaveNet artificial voices to the service, as well as 24 "standard" voices.</p>
<p>Finally, Google is adding a new Device Profiles feature to Text-to-Speech, which can optimize audio playback for different types of hardware. "For example, some customers using call center applications optimize for interactive voice response, while others, who focus on content and media, optimize for headphones," said Aharon. "In any case, the audio effects are customized for the hardware."</p>
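<p>A hedged sketch of how a device profile might be selected: the Python below builds a JSON body for the Text-to-Speech REST <code>text:synthesize</code> method without sending it. The <code>effectsProfileId</code> field and the two profile IDs shown reflect the public API as best understood here and should be verified against the current documentation; the helper name and sample text are hypothetical.</p>

```python
import json


def build_synthesize_request(text: str, language_code: str, profile_id: str) -> dict:
    """Build (but do not send) a synthesize request body that applies
    one audio effects profile for the target playback hardware."""
    if not text:
        raise ValueError("text must not be empty")
    return {
        "input": {"text": text},
        # Only the language is given, leaving the choice of voice to the service.
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": "MP3",
            "effectsProfileId": [profile_id],
        },
    }


# Call center (interactive voice response) versus headphone playback.
ivr_request = build_synthesize_request(
    "Tak fordi du ringede.", "da-DK", "telephony-class-application"
)
headphone_request = build_synthesize_request(
    "Tak fordi du ringede.", "da-DK", "headphone-class-device"
)
print(json.dumps(ivr_request, indent=2, ensure_ascii=False))
```

<p>The two requests differ only in the profile ID, which mirrors Aharon's contrast between call center applications and content played through headphones.</p>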
<p>Constellation Research Inc. analyst Holger Mueller said the updates are compelling because speech is becoming the new user interface. Companies should therefore welcome improvements in accuracy and support for more languages.</p>
<p>"Google continues to deliver and improve, and is now adding support for better consumability," Mueller said. "CxOs who build voice-related applications should keep Google on their shortlist of providers."</p>
<h5>Photo: Robert Scoble / Flickr</h5>
