AI deepfakes are now as simple as typing what you want your subject to say




In the latest example of deepfake technology, researchers have shown off new software that uses machine learning to let users edit the text transcript of a video, adding, deleting, or changing the words coming out of someone's mouth.

The work was done by scientists from Stanford University, the Max Planck Institute for Informatics, Princeton University, and Adobe Research. It shows that our ability to edit what people say in videos and create realistic fakes is getting easier every day.

You can see a number of examples of the system's output below, including an edited version of a famous quote from Apocalypse Now, with the line "I love the smell of napalm in the morning" changed to "I love the smell of French toast in the morning."

This work is just in the research stage right now and is not available as consumer software, but it probably won't be long before similar services go public. Adobe, for example, has already shared details on prototype software called VoCo, which lets users edit recordings of speech as easily as a photo, and which was used in this research.

To create the video fakes, the scientists combine several techniques. First, they scan the target video to isolate the phonemes spoken by the subject. (These are the constituent sounds that make up words, such as "oo" and "fuh.") They then match these phonemes with the corresponding visemes, the facial expressions that accompany each sound. Finally, they create a 3D model of the lower half of the subject's face using the target video.
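To make the phoneme-to-viseme step concrete, here is a minimal Python sketch of what such a mapping could look like. The phoneme labels, viseme classes, and timing format below are illustrative assumptions rather than the researchers' actual data; the point is simply that several different sounds can share a single mouth shape.

```python
# Illustrative sketch only: the phoneme labels, viseme names, and
# timing format are invented; the paper's real inventory differs.
from dataclasses import dataclass

# A tiny, hypothetical many-to-one mapping: several phonemes can
# produce the same facial pose (viseme).
PHONEME_TO_VISEME = {
    "oo": "rounded_lips",
    "fuh": "lip_to_teeth",
    "f": "lip_to_teeth",
    "v": "lip_to_teeth",
    "m": "closed_lips",
    "b": "closed_lips",
    "p": "closed_lips",
}

@dataclass
class Segment:
    label: str    # phoneme (input) or viseme (output) label
    start: float  # start time in the source video, in seconds
    end: float    # end time in the source video, in seconds

def phonemes_to_visemes(segments):
    """Swap each aligned phoneme label for its viseme class, keeping
    the original timing so matching video frames can be looked up."""
    return [
        Segment(PHONEME_TO_VISEME.get(s.label, "neutral"), s.start, s.end)
        for s in segments
    ]

# Example: aligned phonemes for a short utterance.
aligned = [Segment("f", 0.00, 0.08), Segment("oo", 0.08, 0.21)]
for seg in phonemes_to_visemes(aligned):
    print(seg.label, seg.start, seg.end)
```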

When someone edits the text transcript of the video, the software combines all of this collected data (the phonemes, visemes, and 3D face model) to construct new footage that matches the edited text. This is then pasted onto the source video to create the final result.
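And here is a rough, hypothetical sketch of how those pieces could fit together at edit time. Every name in it (text_to_phonemes, FaceModel, composite) is invented for illustration, and the stubs are deliberately trivial; the published system is far more sophisticated, blending and re-timing real frames rather than mapping one viseme to one frame.

```python
# Hypothetical end-to-end sketch of the assembly step described above.
# All names and stubs here are invented for illustration.

def text_to_phonemes(text):
    # Stub: a real system would use a pronunciation dictionary or a
    # grapheme-to-phoneme model. Here we just split on whitespace.
    return text.lower().split()

class FaceModel:
    """Stand-in for the 3D lower-face model built from the target video."""
    def render(self, viseme_sequence):
        # Stub: pretend each viseme yields one rendered mouth frame.
        return [f"mouth_frame<{v}>" for v in viseme_sequence]

def composite(source_frame, mouth_frame):
    # Stub: a real compositor would blend the rendered mouth region
    # into the original frame; here we just pair them up.
    return (source_frame, mouth_frame)

def edit_video(new_text, phoneme_to_viseme, face_model, source_frames):
    """Combine the collected data (phonemes, visemes, face model) to
    produce frames that match the edited transcript."""
    phonemes = text_to_phonemes(new_text)
    visemes = [phoneme_to_viseme.get(p, "neutral") for p in phonemes]
    mouth_frames = face_model.render(visemes)
    # Paste the rendered lower face back onto the source frames.
    return [composite(s, m) for s, m in zip(source_frames, mouth_frames)]

# Toy usage: re-voice a two-frame clip with two new sounds.
frames = ["frame0", "frame1"]
result = edit_video("oo fuh",
                    {"oo": "rounded_lips", "fuh": "lip_to_teeth"},
                    FaceModel(), frames)
print(result)
```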

In tests in which the fake videos were shown to a group of 138 volunteers, the edited clips were rated as genuine about 60 percent of the time. That may sound low, but the same group rated the unedited original footage as legitimate only 80 percent of the time. (The researchers note this may be partly because participants were told their answers were being used for a study on video editing, meaning they were primed to look for fakes.)


An overview of the steps used to edit a subject's speech.

As always, it's important to remember that what this technology can do is limited.

The algorithms here only work on talking-head-style videos, for example, and require 40 minutes of input data. The edited speech also can't differ too much from the source material. In their tests, the researchers asked subjects to record new audio to match the changes, then used the AI to generate the video. (This is because audio fakes are sometimes mediocre, though the quality is certainly improving.)

The researchers also noted that they can't yet change the mood or tone of the speaker's voice, as doing so produces "strange results." And any occlusion of the face, for example if someone waves a hand while speaking, throws off the algorithm completely.

The technology isn't perfect, then, but limitations like these are typical of early-stage research, and it's almost guaranteed they will be overcome in time. That means society as a whole will soon have to grapple with the underlying concept of this work: the arrival of software that lets anyone edit what people say in videos, no technical training required.

The potential for misuse of this technology is extremely worrying, and researchers in this field are often criticized for failing to consider how their work might be abused. The scientists involved in this particular project say they have taken such issues into account.

In a blog post accompanying the paper, they write: "Although image and video manipulation methods are as old as the media themselves, the risks of abuse are heightened when applied to a mode of communication that is sometimes considered to be authoritative evidence of thoughts and intents. We acknowledge that bad actors might use such technologies to falsify personal statements and slander prominent individuals."

But the remedy they suggest is hardly comforting. In their view, to avoid confusion, AI-edited videos should be clearly presented as such, either through the use of a watermark or through context (for example, an audience that understands it is watching a fictional film).

But watermarks are easily removed, and loss of context is one of the defining traits of online media. Fakes don't need to be flawless to have an impact, either. Plenty of fake news articles can be debunked with a few minutes of research, yet that doesn't stop their spread, especially in communities that want to believe lies that fit their preconceptions.

The researchers note that this technology has plenty of beneficial uses, too. It could be a great help to the film and TV industries, letting them fix misspoken lines without re-shooting footage and create seamless dubs of actors speaking different languages.

But these benefits seem underwhelming compared with the potential harm. While there's a case to be made that deepfake propaganda isn't as big a threat as many believe, advances in research like this remain deeply troubling.
