Last week, Google quietly changed a line on its Pixel Buds support page, which now reads: "Google Translate is available on all Assistant-optimized headphones and Android phones." Until now, the feature had been reserved for owners of Pixel Buds and Pixel phones. And although the company has made no effort to announce it officially, this small change is remarkable.
To understand why, let's start with a bit of headphone history. Last year, Google launched its wireless earbuds amid great anticipation, having sold the product on one promised revolutionary tool: live translation. Simply press the earbuds and say "Help me speak [language]" to open the Google Translate app on your phone – until now, a Pixel. From there, you can speak a phrase, which is translated and transcribed into the target language on your phone, then read aloud. On paper, enough to make interpreters fear for their jobs.
The on-stage demonstration of the live translation tool at the product launch drew enthusiastic applause, but once the device went on sale, the response was rather more skeptical: the quality of the translation did not meet expectations.
Tech Insider tested it in ten different languages. The device successfully translated basic questions such as "Where is the nearest hospital?", but as soon as sentences became more complex, or the speaker had an accent, things got lost in translation. Our own reviewer concluded that live translation was "a bit of a drawback", with Google Assistant struggling to understand the words spoken to it.
"It is extremely difficult to master natural language," says Daniel Gleeson, a consumer technology analyst. "It would be a huge feat for Google, and the day they do it, they will shout it from the rooftops." Perhaps one reason, some would say, why the updated Pixel Buds support page has been kept so quiet.
Google's problem does not lie with the translation process itself. In fact, the company has upped its translation game considerably in recent years. In 2016, Google Translate switched to a deep-learning, AI-based system. Until then, the tool translated each word separately and applied linguistic rules to make the sentence grammatically correct – leading to the somewhat fragmented translations we know all too well. Neural networks, on the other hand, consider the sentence as a whole and infer the most likely result, based on the large bodies of text on which they were previously trained. By using machine learning, these systems can take the context of a sentence into account to produce a much more accurate translation.
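To make that difference concrete, here is a deliberately tiny Python sketch – entirely invented, not Google's pipeline – in which a word-by-word lookup mistranslates an ambiguous word, while a sentence-level rule, standing in for a neural model, uses the surrounding words to pick the right sense.

```python
# Toy illustration (not Google's system): why word-by-word translation breaks down.
# French "avocat" means either "lawyer" or "avocado"; only context can decide.
# The dictionary and the context rule below are invented for this example.

WORD_TABLE = {"j'ai": "I have", "mangé": "eaten", "un": "a", "avocat": "lawyer"}

def word_by_word(sentence: str) -> str:
    """Old-style approach: translate each token in isolation."""
    return " ".join(WORD_TABLE.get(tok, tok) for tok in sentence.lower().split())

def context_aware(sentence: str) -> str:
    """Stand-in for a neural model: the whole sentence picks the sense of 'avocat'."""
    tokens = sentence.lower().split()
    sense = "avocado" if "mangé" in tokens else "lawyer"  # food context => avocado
    table = dict(WORD_TABLE, avocat=sense)
    return " ".join(table.get(tok, tok) for tok in tokens)

print(word_by_word("J'ai mangé un avocat"))   # -> "I have eaten a lawyer"
print(context_aware("J'ai mangé un avocat"))  # -> "I have eaten a avocado"
                                              #    (article agreement ignored in this toy)
```

A real neural translation system learns such context cues from millions of sentence pairs rather than from a hand-written rule, but the principle – scoring the sentence as a whole – is the same.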
The integration of machine learning was part of the mission of Google Brain, the company's deep-learning division. Google Brain has also brought neural networks to another tool that is vital for live translation – and this is where things seem to go wrong: speech recognition. Google Assistant is trained on hours of conversation, to which it applies machine-learning tools to recognize patterns and, ultimately, to identify correctly what you are saying when you ask for a translation.
Except that it doesn't, at least not reliably. So if Google has applied neural networks to text-to-text translation with some success, why does the Assistant still fail to recognize speech consistently using the same technique? Matic Horvat, a natural language processing researcher at the University of Cambridge, explains that it all comes down to the data set used to train the neural network.
"The systems adapt to the training data set they have received," he says. "And the quality of speech recognition is getting worse when you're presenting it to things you've never heard before. For example, if your training dataset is a conversational speech, it will not be as successful at recognizing speech in a busy environment. "
Interference is the enemy of any computer scientist working to improve speech recognition technology. Last year, Google's €150 million Digital News Innovation Fund awarded backing to London start-up Trint, which is leading the way in automating speech transcription using a different algorithm from Google's. That algorithm, however, does not solve the fundamental problem of interference either.
In fact, the company's website devotes an entire section to advice on recording speech in a quiet environment. It also states that it operates with a margin of error of 5 to 10 per cent, but makes clear that this applies to clear recordings. There are no official figures for recordings that include crosstalk or background noise. "The biggest challenge is telling our users that we only have the audio quality they give us to work with," says Trint CEO Jeff Kofman. "With echo, noise or even heavy accents, the algorithm will make mistakes."
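That 5-10 per cent figure corresponds to what speech researchers call word error rate. As a rough illustration – my own sketch; Trint does not publish its exact evaluation method – it can be computed as the word-level edit distance between a reference transcript and the machine's output, divided by the length of the reference:

```python
# A hedged sketch of how a transcription error rate like Trint's quoted 5-10 per cent
# is typically measured: word error rate (WER), i.e. the edit distance between the
# reference transcript and the machine transcript, divided by the reference length.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

ref = "where is the nearest hospital"
hyp = "where is the near hospital"
print(f"WER: {word_error_rate(ref, hyp):.0%}")  # one substitution over five words -> 20%
```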
The challenges posed by live speech mean that training is the most expensive and time-consuming part of building a neural network. And keeping live translation on a limited number of devices, as Google did with the Pixel Buds, certainly does not help the system learn. The more speech the system gets to process, the more data it can feed its algorithms – and the more the machine can learn to recognize unfamiliar conversation patterns. Google did not make a spokesperson available for interview, but pointed us to its blog posts on Google Assistant.
For Gleeson, that is one of the reasons Google has decided to extend the feature to more hardware. "One of the hardest problems in speech recognition is collecting enough data on specific accents, colloquial expressions and idioms, all of which are highly regionalized," he says. "Keeping the feature on the Pixel would never let Google reach those regions in sufficient numbers to gather enough data."
Accumulating data, however, has a downside. The best-performing neural networks are the ones trained on the most data – but that data lives on processors whose size grows with the amount of information stored. Those processors are still far too large to fit into mobile devices, which makes on-device, real-time speech processing impossible today. In fact, every time you use Google Assistant, your speech is sent off to be processed in a data center before the result is sent back to your phone. None of the computation happens locally, because existing phones cannot hold the data that neural networks need to process speech.
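In code, that round trip looks roughly like the sketch below: audio leaves the phone, is transcribed and translated in a data center, and only the result comes back. The endpoint URL and response fields here are hypothetical placeholders, not Google's actual Assistant API.

```python
# Minimal sketch of the cloud round trip described above (hypothetical endpoint,
# hypothetical response schema - not Google's real API).
import requests

SPEECH_ENDPOINT = "https://speech.example.com/v1/transcribe"  # hypothetical

def translate_spoken_phrase(audio_bytes: bytes, target_lang: str) -> str:
    """Send raw audio to a remote service and return the translated text."""
    response = requests.post(
        SPEECH_ENDPOINT,
        files={"audio": ("utterance.wav", audio_bytes, "audio/wav")},
        data={"target_lang": target_lang},
        timeout=10,  # the trip to the data center and back is the real latency cost
    )
    response.raise_for_status()
    return response.json()["translation"]  # hypothetical response field

# Usage, given a real recording and a real endpoint:
# with open("utterance.wav", "rb") as f:
#     print(translate_spoken_phrase(f.read(), target_lang="fr"))
```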
Although Google Assistant manages this round trip fairly quickly, Horvat says, genuinely real-time speech recognition is still a long way off. One of the company's current challenges, if features such as live translation are to feel seamless, is working out how to bring neural network processing onto mobile phones themselves.
In fact, chipmakers are already working on small, dedicated chips for efficient neural network processing that could be integrated into phones. Earlier this month, for example, Huawei announced an artificial intelligence chip that, according to the company, can train neural network algorithms in a matter of minutes.
While Google has its own chip, the Edge TPU, it is designed for enterprise use, not yet for smartphones. For Horvat, this is Google's Achilles heel: as primarily a software company, Google has little control over manufacturers, and so cannot guarantee hardware that would make local neural network processing available across Android devices – unlike Apple, for example.
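One generic technique for squeezing a network onto a phone, regardless of whose chip it runs on, is to quantize its weights after training so they take up a fraction of the memory. The sketch below uses PyTorch's dynamic quantization on a toy model; it illustrates the general idea only, and is not a description of Google's or Huawei's actual on-device stack.

```python
# A hedged sketch of one standard way to shrink a network for on-device use:
# post-training dynamic quantization in PyTorch, storing weights as 8-bit integers
# instead of 32-bit floats. Toy model only; real speech models are far larger.
import os
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(256, 512),
    nn.ReLU(),
    nn.Linear(512, 128),
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_in_bytes(m: nn.Module) -> int:
    """Serialize the weights to disk and report their size."""
    torch.save(m.state_dict(), "tmp.pt")
    return os.path.getsize("tmp.pt")

print("float32 model:", size_in_bytes(model), "bytes")
print("int8 model:   ", size_in_bytes(quantized), "bytes")  # roughly 4x smaller
```

Quantization trades a little accuracy for a lot of memory and speed, which is why it is a common first step when moving neural networks from the data center towards the handset.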
In the near future, Google may have to settle for more modest steps to improve its voice recognition technology. And while live translation has been widely criticized, for Neil Shah, industry analyst, partner and research director for IoT, mobile and ecosystems at Counterpoint, expanding its reach is a way for the company to get ahead of the competition: "Google has access to two billion Android users," he says. "It is very well positioned to scale faster than the competition and to train on massive inflows of input data as more and more users adopt voice interactions on Android phones."
Daniel Gleeson agrees. Whether or not the criticisms of the feature still hold, Google's move should lead to significant improvements. As with all artificial intelligence products, the tool has to learn – so, by definition, it ships unfinished. "You run the risk of people saying it doesn't work as expected," he says, "but that's the only way it can happen." Interpreters don't need to worry about their jobs just yet.