"Statistical significance" is overused and often misleading

Statisticians say it may not be wise to put all our eggs in the basket of statistical significance.

intraprese / Getty Images

A recent study that questioned the safety of eggs raised a perennial question: why do studies, as in the egg research, so often flip from one answer to another?

The truth does not actually change all the time. But one reason for these fluctuations is that scientists struggle to handle the uncertainty inherent in all studies. A new effort aims to address that gap in a widely used – and misused – scientific method.

Scientists and statisticians have put forward a bold idea: ban the very concept of "statistical significance".

We hear this phrase all the time in connection with scientific studies. Critics say that declaring a result statistically significant or not forces complicated questions to be answered as true or false.

"The world is much more uncertain than that," said Nicole Lazar, a professor of statistics at the University of Georgia. She is involved in the latest effort to ban the use of the term "statistical significance."

An entire issue of the journal The American Statistician is devoted to the question, with 43 articles and a 17,500-word editorial that Lazar co-wrote.

Some of the scientists involved in this effort also wrote a more digestible commentary that appears in Thursday's issue of Nature. More than 850 scientists and statisticians told the authors of the Nature commentary that they endorse the idea.

In the early 20th century, the father of modern statistics, R.A. Fisher, developed a test of significance. It involves a quantity called the p-value, which he intended as a guide for judging results.

Over the years, scientists have distorted that idea. They created an arbitrary threshold for the p-value, typically 0.05, and use it to declare whether a scientific result is significant or not.
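To make the threshold concrete, here is a minimal sketch – my illustration, not from the article – of how the 0.05 cutoff gets applied in practice, using a simple two-sided z-test with made-up numbers:

```python
import math

def z_test_p_value(mean_diff, std_err):
    """Two-sided p-value for a z statistic under the standard normal."""
    z = mean_diff / std_err
    # erfc(|z|/sqrt(2)) equals twice the upper-tail normal probability
    return math.erfc(abs(z) / math.sqrt(2))

ALPHA = 0.05  # the conventional, arbitrary cutoff the article describes

# Two hypothetical studies with nearly identical results...
p1 = z_test_p_value(mean_diff=0.5, std_err=0.24)
p2 = z_test_p_value(mean_diff=0.5, std_err=0.26)

# ...land on opposite sides of the threshold
print(f"p1 = {p1:.3f}, labeled significant:     {p1 < ALPHA}")
print(f"p2 = {p2:.3f}, labeled not significant: {p2 >= ALPHA}")
```

The two p-values differ only slightly, yet the binary label flips – exactly the dichotomization critics object to.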

This shortcut often determines whether studies get published, whether scientists get promoted and who receives grant money.

"The situation has really gotten out of proportion," said Ron Wasserstein, executive director of the American Statistical Association. He has advocated this change for years, and he is not alone.

"Failure to make these changes is really starting to have a lasting negative impact on the way science is conducted," he said. "It's time to start making the changes – it's time to get moving."

There are many downsides to the status quo, he says. One is that scientists manipulate their data so that results reach this magic threshold. Worse, scientists often perceive that they cannot publish their interesting results (even somewhat ambiguous ones) if they are not statistically significant. But that information is still useful, advocates say, and simply discarding it is wasteful.

Some figures in the world of statistics reject the call for the abolition of the term "statistical significance".

"Nature should invite someone to highlight the weaknesses and dangers of some of these recommendations," says Deborah Mayo, a philosopher of science at Virginia Tech.

Prohibiting the word "significance," she notes, "could well free researchers from any responsibility when they would minimize negative results" and otherwise manipulate their findings.

"We should be wary of dropping something that allows us to hold researchers accountable for their actions."

The desire to retain "statistical significance" is deeply rooted.

Scientists – like the rest of us – are much more likely to believe that a result is true if it is statistically significant. Yet Blake McShane, a statistician at Northwestern University's Kellogg School of Management, says we rely too heavily on the concept.

"Statistics all vary naturally from one study to the next," McShane explains. That's because there is a lot of variation from one group of people to another, and because subtle differences in approach can lead to different conclusions.

So, he says, we should not be surprised at all when a statistically significant result in one study fails to clear that threshold in the next.
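A quick simulation – again my illustration, not from the commentary – shows how often the significance verdict flips across identical studies of the very same true effect:

```python
import math
import random

random.seed(42)

def simulate_study(true_effect=0.3, n=50):
    """Draw one study's sample and return its two-sided z-test p-value."""
    sample = [random.gauss(true_effect, 1.0) for _ in range(n)]
    mean = sum(sample) / n
    std_err = 1.0 / math.sqrt(n)  # sigma assumed known (=1) for simplicity
    z = mean / std_err
    return math.erfc(abs(z) / math.sqrt(2))

# Run the "same" study 1,000 times: identical design, identical true effect
results = [simulate_study() < 0.05 for _ in range(1000)]
print(f"'significant' in {sum(results)} of 1000 identical studies")
```

With these illustrative numbers, roughly half the runs clear the 0.05 bar and half do not – even though every study measures exactly the same underlying effect. An apparent "conflict" between two such studies is just sampling noise.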

McShane, who co-wrote the Nature commentary, says this also explains, in part, why studies done in one laboratory often cannot be duplicated in other laboratories. This is sometimes called the "reproducibility crisis," whereas in reality the apparent conflict between studies can be an artifact of relying on the concept of statistical significance.

Despite these shortcomings, science still treats statistical significance as a shortcut that gives at least a rough sense of the strength of an observation.

Journals are reluctant to abandon the concept. "Nature is not seeking to change the way it considers statistical analysis in the evaluation of papers at this time," the journal noted in an editorial accompanying the commentary.

Véronique Kiermer, publisher and editor-in-chief of the PLOS journals, laments the excessive weight given to statistical significance, but says her journals cannot impose the change on their own.

"The problem is that the practice is so entrenched in the research community," she writes in an email, "that change needs to start as hypotheses are formulated, experiments designed and analyzed, and when researchers decide to write up and publish their work."

One question is what scientists would use instead of statistical significance. Advocates of change say the community can still use the p-value test, but as part of a more comprehensive approach to measuring uncertainty.

A little more humility would also be appropriate, say these advocates of change.

"Uncertainty is always present," says Wasserstein. "It's part of science, so rather than trying to dance around that, we [should] accept it."

That goes a bit against human nature. After all, we want answers, not more questions.

But McShane says a simple yes-or-no answer is too simplistic. Digging deeper lets us ask more important questions: How large is the risk? What is the probability that it is real? What are the costs and benefits for an individual?

Lazar takes an even more radical view. She says that when she hears about individual studies, like the one about eggs, her statistical intuition leads her to shrug: "I don't even pay attention to that anymore."

You can reach NPR science correspondent Richard Harris at [email protected]
