OpenAI, a nonprofit research company that studies "the path to safe artificial intelligence," has developed a machine learning system called Generative Pre-trained Transformer-2 (GPT-2) that can generate text based on short writing prompts. The result comes so close to mimicking human writing that it could potentially be used for "deepfake" content. Trained on 40 gigabytes of text retrieved from Internet sources (including "all outbound links from Reddit, a social media platform, which received at least 3 karma"), GPT-2 generates plausible "news" stories and other text that match the style and content of a brief text prompt.
The system's performance was so disconcerting that the researchers are releasing only a reduced version of GPT-2 based on a much smaller text corpus. In a blog post about the project and this decision, researchers Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever wrote:
Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights. Nearly a year ago we wrote in the OpenAI Charter: "we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research," and we see this current work as potentially representing the early beginnings of such concerns, which we expect may grow over time. This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas.
OpenAI is funded by contributions from a group of technology executives and investors connected to what some have called the PayPal "mafia": Elon Musk, Peter Thiel, Jessica Livingston and Sam Altman of YCombinator, former PayPal chief operating officer and LinkedIn co-founder Reid Hoffman, and former Stripe chief technology officer Greg Brockman. Brockman now serves as OpenAI's CTO. Musk has repeatedly warned of the potential existential dangers posed by AI, and OpenAI is working to shape the future of artificial intelligence technology, the company said, ideally moving it away from potentially harmful applications.
Given current concerns about how fake content has been used both to generate money for "fake news" publishers and to potentially spread misinformation and undermine public debate, GPT-2's output is certainly disturbing. Unlike other text-generating "bot" models, such as those based on Markov chain algorithms, the GPT-2 "bot" did not lose track of what it was writing about as it generated output, keeping everything in context.
For example, given a two-sentence prompt, GPT-2 generated a fake science story about the discovery of unicorns in the Andes, a story about the economic impact of Brexit, a report on a theft of nuclear material near Cincinnati, a story about Miley Cyrus being caught shoplifting, and a student's report on the causes of the American Civil War.
Each matched the style of the genre suggested by the writing prompt, including fabricating quotes from sources. In other examples, GPT-2 generated a rant about why recycling is bad, a speech written by John F. Kennedy's brain transplanted into a robot (complete with footnotes about the feat itself), and a rewrite of a scene from The Lord of the Rings.
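The contrast with Markov chain generators mentioned above can be made concrete with a small sketch. The Python below is a minimal word-level Markov chain text generator, offered only as an illustration of that older technique; it is not OpenAI's code and says nothing about how GPT-2 works internally. The function names (build_chain, generate) and the toy corpus are assumptions invented for this example. Each next word is chosen by looking only at the previous word, which is why output from this kind of model tends to drift between topics.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, start, length=20, seed=None):
    """Walk the chain from `start`, picking each next word at random.

    Only the last len(start) words are consulted at every step; anything
    written earlier is invisible to the model.
    """
    rng = random.Random(seed)
    state = tuple(start)
    output = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Toy corpus (hypothetical, written for this illustration only).
corpus = (
    "the scientists found a herd of unicorns in a remote valley . "
    "the police found a stolen car in a parking lot . "
    "the scientists said the unicorns spoke perfect English ."
)

chain = build_chain(corpus, order=1)
print(generate(chain, start=("the",), length=15, seed=0))
```

Because the generator sees only one word of history, a sentence that begins with scientists can wander into stolen cars after any shared word such as "found" or "a". Raising the order parameter lengthens the memory slightly, but the model then mostly replays fragments of its source text.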
While the model required several attempts to produce a good sample, GPT-2 generated "good" results based on "how familiar the model is with the context," the researchers wrote. "When prompted with topics that are highly represented in the data (Brexit, Miley Cyrus, Lord of the Rings, and so on), it seems to be capable of generating reasonable samples about 50% of the time. The opposite is also true: on highly technical or esoteric types of content, the model can perform poorly."
GPT-2's word modeling showed some weak spots; for example, the researchers noted that it sometimes "writes about fires happening under water." But the model could be fine-tuned for specific tasks and perform much better. "We can fine-tune GPT-2 on the Amazon Reviews dataset and use this to let us write reviews conditioned on things like star rating and category," the authors explained.
This type of performance raises all sorts of concerns about the possibility that things like fake reviews, and just about any other type of comment, could be "deepfaked" for financial or political gain. And this is precisely why OpenAI's researchers are reluctant to release a more complete version of their model for the moment.