Synthesized Speech from Brain Signals




A team of neuroscientists at the University of California at San Francisco used brain signals recorded from patients with epilepsy to program a computer to mimic natural speech, a development that could one day profoundly improve some patients' ability to communicate. The results were published in the journal Nature.

A brain-machine interface created by Anumanchipalli et al. can generate natural-sounding synthetic speech by using brain activity to control a virtual vocal tract. Image credit: University of California at San Francisco.


Technology that translates brain activity into language would be transformative for people who are unable to communicate because of neurological disorders.

Decoding speech from neural activity is challenging because speaking requires very precise and rapid control of the vocal tract articulators.

Professor Edward Chang of the University of California at San Francisco and his colleagues have designed a neural decoder that uses sound representations encoded in brain activity to synthesize audible speech.

"Speech is an amazing form of communication that has evolved over the millennia to become very effective. Many of us think how easy it is to talk, so losing that ability can be so devastating, "said Professor Chang.

"For the first time, our study demonstrates that we can generate complete spoken sentences based on the brain activity of an individual."

The research builds on a recent study in which the team first described how the speech centers of the human brain choreograph the movements of the lips, jaw, tongue, and other components of the vocal tract to produce fluent speech.

From this work, the researchers understood why previous attempts to decode speech directly from brain activity had met with little success: these brain regions do not directly represent the acoustic properties of speech sounds, but rather the instructions needed to coordinate the movements of the mouth and throat during speech.

"The relationship between vocal tract movements and speech sounds produced is complex," said co-author Dr. Gopala Anumanchipalli, a speech specialist at the University of California at San Francisco.

"We reasoned that if these centers of speech in the brain encode movements rather than sounds, we should try to do the same to decode these signals."

As part of the study, the neuroscientists asked five volunteers being treated at the University of California at San Francisco epilepsy center – patients whose speech was intact and who had electrodes temporarily implanted in their brains to map the source of their seizures ahead of neurosurgery – to read several hundred sentences aloud while the researchers recorded the activity of a brain region known to be involved in language production.

Based on the audio recordings of the participants' voices, the researchers used linguistic principles to reverse-engineer the vocal tract movements needed to produce those sounds: pressing the lips together, tightening the vocal cords, moving the tip of the tongue to the roof of the mouth and then releasing it, and so on.
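The paper does not publish this step as code, but the general idea of acoustic-to-articulatory inversion can be illustrated with a toy sketch: learn a frame-wise mapping from audio features to articulator positions, then use it to recover an articulator trajectory from new audio. Everything below – the feature sizes, the synthetic data, and the simple ridge regression – is an illustrative assumption, not the authors' actual method.

```python
# Toy sketch of acoustic-to-articulatory inversion (illustrative only).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_frames, n_audio_feats, n_articulators = 1000, 40, 12

# Placeholder training data: audio feature frames and the articulator
# positions (lips, tongue, jaw, ...) that supposedly produced them.
audio_features = rng.normal(size=(n_frames, n_audio_feats))
true_mixing = rng.normal(size=(n_audio_feats, n_articulators))
articulator_positions = audio_features @ true_mixing

# Fit a frame-wise regression from sound to articulator positions.
inversion_model = Ridge(alpha=1.0).fit(audio_features, articulator_positions)

# Given new audio frames, infer the articulator trajectory behind them.
new_audio = rng.normal(size=(50, n_audio_feats))
estimated_kinematics = inversion_model.predict(new_audio)
print(estimated_kinematics.shape)  # (50, 12)
```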

This detailed mapping of sound to anatomy allowed the authors to create, for each participant, a realistic virtual vocal tract that could be controlled by that participant's brain activity.

This included two "neural network" machine learning algorithms: a decoder that transforms the brain activity patterns produced during speech into movements of the virtual vocal tract and a synthesizer that converts these vocal tract movements into a synthetic approximation of the voice of the participant.

The synthetic speech generated by these algorithms was significantly better than synthetic speech decoded directly from participants' brain activity without the intermediate simulation of the speakers' vocal tracts.

The algorithms produced sentences that were intelligible to hundreds of human listeners in transcription tests conducted via the Amazon Mechanical Turk crowdsourcing platform.

As with natural speech, the transcribers were more successful when given a shorter list of words to choose from, much as caregivers are primed to the kinds of phrases or requests patients might make.

The transcribers accurately identified 69% of the synthesized words when choosing from lists of 25 alternatives and transcribed 43% of the sentences with perfect accuracy.

With a more challenging pool of 50 words to choose from, the transcribers' overall accuracy dropped to 47%, though they could still understand 21% of the synthesized sentences perfectly.
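For readers curious how closed-vocabulary figures like these are tallied, the toy snippet below computes word-level and sentence-level accuracy from a pair of made-up transcriptions. The sentences and the scoring function are illustrative assumptions, not the study's evaluation code.

```python
# Toy closed-vocabulary scoring: listeners pick words from a fixed pool, and
# accuracy is reported per word and per whole sentence (illustrative only).
def score_transcriptions(references, transcriptions):
    total_words = correct_words = correct_sentences = 0
    for ref, hyp in zip(references, transcriptions):
        ref_words, hyp_words = ref.split(), hyp.split()
        correct_words += sum(r == h for r, h in zip(ref_words, hyp_words))
        total_words += len(ref_words)
        correct_sentences += (ref_words == hyp_words)
    return correct_words / total_words, correct_sentences / len(references)

refs = ["move the chair closer to the table", "please bring me a glass of water"]
hyps = ["move the chair closer to the door", "please bring me a glass of water"]
word_acc, sentence_acc = score_transcriptions(refs, hyps)
print(f"word accuracy {word_acc:.0%}, sentence accuracy {sentence_acc:.0%}")
```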

"We still have some way to go to perfectly imitate the spoken language," said co-authors Josh Chartier, a graduate student in bioengineering from the University of California at San Francisco.

"We are good enough to synthesize slower vocal sounds like" sh "and" z ", as well as to maintain the rhythms and intonations of speech, as well as the speaker's gender and identity, but some of the the steepest sounds such as "b and p a little blurry. "

"Nevertheless, the levels of precision we have produced here would be an incredible improvement in real-time communication compared to what is currently available."

The researchers are currently experimenting with higher-density electrode arrays and more advanced machine learning algorithms that they hope will further improve the synthesized speech.

The next major test for the technology is to determine whether someone who cannot speak could learn to use the system without being able to train it on their own voice, and whether it would generalize to anything they want to say.

_____

Gopala K. Anumanchipalli et al. 2019. Speech synthesis from neural decoding of spoken sentences. Nature 568: 493-498; doi: 10.1038/s41586-019-1119-1
