Our flagship AI experiment continues: have we broken the machine?




Aurich Lawson | Getty Images

We are now in phase three of our machine-learning project: we have moved past denial and anger, and we are sliding into bargaining and depression. I was tasked with using Ars Technica's trove of data from five years of headline tests, each of which paired two candidate headlines in an A/B test to let readers determine which one a story would run with. The goal is to build a machine-learning algorithm capable of predicting the success of a given headline. And as of my last check-in, it was not going as planned.
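To make the shape of the data concrete: each of the 5,500 tests contributes exactly two rows to the training set, one winner and one loser. Here is a minimal sketch of that expansion; the field names (`variant_a`, `ctr`, and so on) are invented for illustration and are not from the article.

```python
# Hypothetical sketch: one A/B headline test becomes two labeled rows.
# All field names here are assumptions, not the actual Ars schema.

def expand_test(test: dict) -> list[dict]:
    """Turn one headline test into a winner row and a loser row,
    based on which variant drew the higher click-through rate."""
    a, b = test["variant_a"], test["variant_b"]
    winner, loser = (a, b) if a["ctr"] >= b["ctr"] else (b, a)
    return [
        {"headline": winner["text"], "label": 1},  # the winning headline
        {"headline": loser["text"], "label": 0},   # the losing headline
    ]
```

Run over 5,500 such tests, this yields the 11,000-row, perfectly balanced corpus described below.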

I had also spent a few dollars of Amazon Web Services compute time finding that out. Experimentation can get a bit expensive. (Tip: if you are on a limited budget, do not use Autopilot mode.)

We had tried several approaches to analyzing our collection of 11,000 headlines from 5,500 headline tests: half winners, half losers. First, we took the entire corpus as comma-separated values and tried a "Hail Mary" (or, as I see it in retrospect, a "Leeroy Jenkins") with the Autopilot tool in AWS SageMaker Studio. This came back with a validation accuracy of 53 percent. That turns out not to be so bad, in retrospect, because when I used a model designed specifically for natural-language processing, AWS's BlazingText, the result was 49 percent accuracy, worse than a coin flip. (If much of that sounds absurd, by the way, I recommend revisiting Part 2, where I go over these tools in much more detail.)
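For readers wondering what feeding BlazingText looks like in practice: in supervised mode it expects plain-text training files with one example per line, each line a `__label__<tag>` prefix followed by space-separated tokens. A minimal sketch of that conversion follows; the label names (`win`/`lose`) and the naive lowercase-and-split tokenization are my assumptions, not necessarily what we used.

```python
# Sketch of formatting one headline for BlazingText supervised mode.
# Label names and tokenization strategy are illustrative assumptions.

def to_blazingtext_line(headline: str, won: bool) -> str:
    """Format a headline as a BlazingText training line:
    '__label__<tag> <space-tokenized text>'."""
    label = "__label__win" if won else "__label__lose"
    tokens = headline.lower().split()  # naive whitespace tokenization
    return f"{label} {' '.join(tokens)}"
```

Writing one such line per headline produces the training and validation channels the algorithm consumes.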

It was both a little heartening and a little disheartening that AWS technical evangelist Julien Simon had the same bad luck with our data. Trying an alternative model on our dataset in binary-classification mode yielded an accuracy of only 53 to 54 percent. So now it was time to figure out what was going on, and whether we could fix it with a few adjustments to the learning model. If not, maybe it was time to take an entirely different approach.
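Since the dataset is split evenly between winners and losers, a coin flip scores 50 percent, so it is fair to ask how much better 53 to 54 percent really is. A quick exact binomial check answers that; the validation-set size used in the comment below is an assumption for illustration, not a figure from this article.

```python
import math

def coin_flip_p_value(n: int, correct: int) -> float:
    """Probability that a fair coin gets at least `correct` of `n`
    predictions right (one-sided exact binomial tail)."""
    return sum(math.comb(n, k) for k in range(correct, n + 1)) / 2 ** n

# For a hypothetical validation set of 1,100 headlines (an assumed
# 10 percent split), 53 percent accuracy means 583 correct answers;
# coin_flip_p_value(1100, 583) comes out on the order of a few
# percent -- better than chance, but not by a comfortable margin.
```

In other words, these accuracy scores sit uncomfortably close to what lucky guessing could produce, which is why a rethink was on the table.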
