Synthetic speech generated from brain recordings




Illustration of the electrode locations over the research participants' neural speech centers, from which activity patterns recorded during speech (colored dots) were translated into a computer simulation of the participant's vocal tract (model, right), which could then be synthesized to reconstruct the sentence that had been spoken (sound wave and sentence, below). Credit: Chang lab / UCSF Department of Neurosurgery

A state-of-the-art brain-machine interface created by neuroscientists at the University of California, San Francisco can generate natural-sounding synthetic speech by using brain activity to control a virtual vocal tract, an anatomically detailed computer simulation that includes the lips, jaw, tongue, and larynx. The study was conducted with research participants whose speech was intact, but the technology could one day restore the voices of people who have lost the ability to speak due to paralysis and other forms of neurological damage.

Stroke, traumatic brain injury, and neurodegenerative diseases such as Parkinson's disease, multiple sclerosis, and amyotrophic lateral sclerosis (ALS, or Lou Gehrig's disease) often result in an irreversible loss of the ability to speak. Some people with severe speech disabilities learn to spell out their thoughts letter by letter using assistive devices that track very small movements of the eyes or facial muscles. However, producing text or synthesized speech with such devices is laborious, error-prone, and painfully slow, typically permitting a maximum of 10 words per minute, compared to the 100-150 words per minute of natural speech.

The new system, developed in the laboratory of Edward Chang, MD, and described April 24, 2019 in Nature, demonstrates that it is possible to create a synthesized version of a person's voice that can be controlled by the activity of their brain's speech centers. In the future, the authors say, this approach could not only restore fluent communication to people with severe speech disabilities, but could also reproduce some of the musicality of the human voice that conveys the speaker's emotions and personality.

"For the first time, this study demonstrates that we can generate complete spoken sentences based on the brain activity of an individual," said Chang, professor of neurological surgery and a member of the UCSF Weill Institute for Neuroscience. "It is an exalting proof of principle that with technology already within our reach, we should be able to build a clinically viable device for speech-impaired patients."


A brief animation illustrates how patterns of brain activity from the brain's speech centers (upper left) were first decoded into a computer simulation of a research participant's vocal tract movements (upper right), which were then translated into a synthesized version of the participant's voice (below). Credit: Chang lab / UCSF Department of Neurosurgery. Simulated vocal tract animation credit: Speech Graphics

Virtual vocal tract improves naturalistic speech synthesis

The research was led by Gopala Anumanchipalli, Ph.D., a speech scientist, and Josh Chartier, a bioengineering graduate student in the Chang lab. It builds on a recent study in which the pair described for the first time how the human brain's speech centers choreograph the movements of the lips, jaw, tongue, and other vocal tract components to produce fluent speech.

From this work, Anumanchipalli and Chartier realized that previous attempts to directly decode speech from brain activity might have met with limited success because these brain regions do not directly represent the acoustic properties of speech sounds, but rather the instructions needed to coordinate the movements of the mouth and throat during speech.

"The relationship between the movements of the vocal tract and the sounds of speech produced is complicated," said Anumanchipalli. "We felt that if these centers of speech in the brain encode movements rather than sounds, we should try to do the same to decode these signals."

In their new study, Anumanchipalli and Chartier asked five volunteers being treated at the UCSF Epilepsy Center, patients with intact speech who had electrodes temporarily implanted in their brains to map the source of their seizures in preparation for neurosurgery, to read several hundred sentences aloud while the researchers recorded activity from a brain region known to be involved in language production.

Based on the audio recordings of participants' voices, the researchers used linguistic principles to reverse engineer the vocal tract movements needed to produce those sounds: pressing the lips together here, tightening the vocal cords there, shifting the tip of the tongue to the roof of the mouth, then relaxing it, and so on.
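
This inference step is, conceptually, an acoustic-to-articulatory inversion: estimating, from sound alone, the movements that produced it. Below is a minimal sketch of the statistical version of that idea, assuming a hypothetical reference corpus that pairs acoustic feature frames with measured articulator positions; the feature dimensions, variable names, and regression model are illustrative stand-ins, not the study's actual method.

```python
# Illustrative sketch of statistical acoustic-to-articulatory inversion.
# All data, dimensions, and the model choice here are hypothetical.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical paired training data: acoustic frames (e.g. MFCCs) and the
# articulator positions (lips, tongue, jaw...) that produced them.
acoustic_frames = rng.normal(size=(1000, 13))
articulator_positions = rng.normal(size=(1000, 12))

# Fit a regressor that maps sound back to movement.
inverter = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300,
                        random_state=0)
inverter.fit(acoustic_frames, articulator_positions)

# Given new audio frames from a participant, estimate the vocal tract
# movements that would have produced them.
new_audio_frames = rng.normal(size=(50, 13))
estimated_movements = inverter.predict(new_audio_frames)
print(estimated_movements.shape)  # (50, 12)
```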

This detailed mapping of sound to anatomy allowed the scientists to create, for each participant, a realistic virtual vocal tract that could be controlled by that individual's brain activity. This comprised two "neural network" machine learning algorithms: a decoder that transforms brain activity patterns produced during speech into movements of the virtual vocal tract, and a synthesizer that converts these vocal tract movements into a synthetic approximation of the participant's voice.
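
A minimal sketch of such a two-stage pipeline is shown below, in PyTorch. The layer sizes, feature dimensions, and use of bidirectional LSTMs are illustrative assumptions rather than the published architecture; the point is the chaining of a brain-to-movement decoder with a movement-to-sound synthesizer.

```python
# Sketch of a two-stage neural decoding pipeline. Sizes are illustrative.
import torch
import torch.nn as nn

class ArticulatoryDecoder(nn.Module):
    """Stage 1: brain activity features -> vocal tract movements."""
    def __init__(self, n_channels=256, n_articulators=33, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_channels, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_articulators)

    def forward(self, neural):             # (batch, time, channels)
        h, _ = self.rnn(neural)
        return self.out(h)                 # (batch, time, articulators)

class SpeechSynthesizer(nn.Module):
    """Stage 2: vocal tract movements -> acoustic features."""
    def __init__(self, n_articulators=33, n_acoustic=32, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_articulators, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, movements):
        h, _ = self.rnn(movements)
        return self.out(h)

# Chain the stages: brain activity -> movements -> acoustics.
neural = torch.randn(1, 200, 256)           # 200 time steps of toy features
movements = ArticulatoryDecoder()(neural)
acoustics = SpeechSynthesizer()(movements)
print(acoustics.shape)                      # torch.Size([1, 200, 32])
```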

The synthetic speech produced by these algorithms was significantly better than synthetic speech decoded directly from participants' brain activity without the intermediate simulation of the speakers' vocal tracts, the researchers found. The algorithms produced sentences that were understandable to hundreds of listeners in crowdsourced transcription tests conducted on the Amazon Mechanical Turk platform.

An example array of intracranial electrodes of the type used to record brain activity in the current study. Credit: UCSF

As is the case with natural speech, the transcribers were more successful when given shorter lists of words to choose from, as would be the case with caregivers who are primed to the kinds of phrases or requests a patient might make. The transcribers accurately identified 69 percent of synthesized words from lists of 25 alternatives and transcribed 43 percent of sentences with perfect accuracy. With a more challenging pool of 50 words, overall accuracy dropped to 47 percent, though the transcribers were still able to understand 21 percent of the synthesized sentences perfectly.
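
As a toy illustration of how this kind of closed-vocabulary scoring works (with made-up data, not the study's), each transcribed word is checked against the reference sentence, and a sentence counts as perfect only if every word matches:

```python
# Toy closed-vocabulary scoring: word-level accuracy plus an
# all-or-nothing sentence check. Data here is invented for illustration.
reference  = "the quick brown fox jumps".split()
transcript = "the quick brown dog jumps".split()

word_accuracy = sum(r == t for r, t in zip(reference, transcript)) / len(reference)
sentence_perfect = transcript == reference

print(f"word accuracy: {word_accuracy:.0%}")    # 80%
print(f"perfect sentence: {sentence_perfect}")  # False
```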

"We still have some way to go to perfectly imitate spoken language," Chartier said. "We are good enough to synthesize slower vocal sounds such as" sh "and" z ", as well as to preserve the rhythms and intonations of the speech, as well as the speaker's gender and identity, but some the most abrupt sounds such as "b and" p However, the levels of precision that we have produced here would constitute an incredible improvement in real-time communication compared to what is currently available. "

Advances in artificial intelligence, linguistics and neuroscience

The researchers are currently experimenting with higher-density electrode arrays and more advanced machine learning algorithms that they hope will improve the synthesized speech even further. The next major test for the technology is to determine whether someone who cannot speak could learn to use the system without being able to train it on their own voice, and to generalize it to anything they wish to say.

Preliminary results from one of the team's research participants suggest that the researchers' anatomically based system can decode and synthesize novel sentences from a participant's brain activity nearly as well as the sentences the algorithm was trained on. Even when the researchers provided the algorithm with brain activity data recorded while a participant merely mouthed sentences without making any sound, the system was still able to produce intelligible synthetic versions of the mimed sentences in the speaker's voice.

The researchers also found that the neural code for vocal movements partially overlapped across participants, and that one research subject's vocal tract simulation could be adapted to respond to neural instructions recorded from another participant's brain. Together, these findings suggest that people with speech loss due to neurological impairment may be able to learn to control a speech prosthesis modeled on the voice of someone with intact speech.

"People who can not move their arms and legs have learned to control the robotized limbs with their brains," said Chartier. "We hope that someday people with speech impairments can relearn how to speak using this brain-controlled artificial vocal device."

Anumanchipalli added, "I'm proud that we've been able to bring together expertise from neuroscience, linguistics, and machine learning as part of this major milestone toward helping patients with neurological disorders."




More information:
Speech synthesis from neural decoding of spoken sentences, Nature (2019). DOI: 10.1038/s41586-019-1119-1
www.nature.com/articles/s41586-019-1119-1

Provided by
University of California, San Francisco


Citation:
Synthetic speech generated from brain recordings (2019, April 24)
retrieved 24 April 2019
from https://medicalxpress.com/news/2019-04-synthetic-speech-brain.html

This document is subject to copyright. Apart from any fair use for study or private research purposes, no
part may be reproduced without written permission. Content is provided for information only.
