Everyone wants our data and promises that we will benefit greatly. You criticize the exaggerated expectations placed on Big Data. Don't you sometimes feel like a lonely spoilsport?
Sometimes, maybe. But I also see that I am having an effect. People sense that something is wrong. Big Data mostly gives rise to false promises and inflated expectations, especially when it comes to diagnosis and new therapies. Risks are rarely addressed, and neither are the costs. In Germany, we are already spending billions to collect personal medical data without knowing what use we can make of it.
At least since the Facebook data scandal, there has been a lot of talk about privacy and ethics. But you are mainly concerned with Big Data in the health sector. Where is the problem? Anyone who knows more can treat better.
I do not want to minimize privacy issues. They are real. But the difficulties go far beyond that. More data simply does not mean more knowledge. On the contrary. This runs counter to intuition and is not easy to explain.
Try it.
An example I use in my lectures: according to the statistics, per capita cheese consumption in the United States tracks almost exactly the number of people in the United States who die by becoming tangled in their own bedsheets. There is a purely numerical link between the two parallel developments. But it would be complete nonsense to conclude that you could reduce the risk of death by eating less cheese. Such false, random correlations turn up again and again when you comb through large amounts of data. The problem with large volumes of data is that the share of such spurious correlations grows faster than the share of real connections.
Why?
There is a complicated calculation behind this, but it can be illustrated. If I have studied the influence of a hundred factors and add one more, it can be related to each one of them. So, in the simplest case, I get a hundred more connections.
By a single additional factor?
Exactly. This means that the number of random, meaningless correlations increases faster than the amount of data itself. With large amounts of data, this effect becomes dramatic. In this context we speak of noise. The noise of false correlations drowns out the real ones. It is a problem that also affects the media. In data journalism, too, there is the danger that you come across a connection and make a story of it, when the correlation was in fact random and meaningless.
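The counting argument in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculation the interviewee refers to: the function names, the sample size of 10 observations, and the 0.8 threshold are all hypothetical choices. The factors are pure random noise, so every strong correlation the code finds is spurious by construction.

```python
import random
from itertools import combinations
from math import sqrt


def pair_count(n_factors):
    """Number of distinct pairs among n_factors variables: n(n-1)/2."""
    return n_factors * (n_factors - 1) // 2


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_spurious(n_factors, n_obs=10, threshold=0.8, seed=0):
    """Count pairs of independent random factors with |r| above threshold.

    Assumption: each factor is n_obs draws of Gaussian noise, so no pair
    has any real relationship and every hit is a chance correlation.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_factors)]
    return sum(
        1
        for a, b in combinations(data, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    for n in (10, 100, 101, 200):
        print(n, "factors:", pair_count(n), "pairs,",
              count_spurious(n), "strong chance correlations")
```

Going from 100 to 101 factors adds exactly 100 new pairs, which is the "hundred more connections" from the answer. Because the number of pairs grows roughly with the square of the number of factors, the number of opportunities for a chance correlation grows much faster than the data set itself.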
"More data does not mean more knowledge. On the contrary. "
Surely people who analyze large datasets can avoid such errors?
When it comes to such claims, the truth is often not taken very seriously. A lot of it is pure marketing. Many people also do not understand the basics. That is why there are no concrete applications, especially in medicine. Big Data is synonymous with the promise that everything gets better with access to all the data. I do not mind if I pay for a new suit at the checkout with my credit card and a matching tie then appears on my phone. It is different in medicine. If I decide to undergo a treatment based on a faulty data analysis, it can have disastrous consequences. Added to this is the economic damage. Data that was collected blindly must afterwards be cleaned before real connections can be discovered.
Sloppy data analysis is nothing new. It existed before Big Data.
There is a difference. Big Data supporters would like to get rid of the statistical achievements of the last 100 years. For example, they claim that beyond a certain amount of data, a correlation automatically becomes causation. The certainty of a prediction is then supposedly so high that it becomes reality. A widely noted article from 2008 was titled "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete". That is pure nonsense. It fits this attitude that Big Data analyses typically do not specify ranges of uncertainty, as is standard in statistical analysis.
Why?
Data scientists live in a different world and have their own ideology. For us, this lack of error awareness is irritating. We practice evidence-based medicine, and we deal daily with uncertainty caused by missing or inaccurate data. Among people who compute with large amounts of data, this awareness is largely absent. They prefer to speak of the end of chance. A book of the same name, for example, claims that thanks to the abundance of data, everything can be predicted and chance therefore no longer exists.
Is that completely absurd?
It is simply not thought through. Take the idea that if the police have enough data, they can predict burglaries. The models behind the forecasting programs built so far are downright embarrassing. The evidence for their benefit is purely anecdotal. The same goes for Big Data in general: there is no reliable evidence of an overall benefit.
The Big Data phenomenon is relatively new. Are there any examples of where it went wrong?
The best-known is Google Flu Trends, Google's flu forecast. At first there was a huge fuss. Normally, the spread of influenza can only be tracked rather laboriously on the basis of doctor visits. A decade ago, Google claimed that it could be done using users' clicks. It worked for two years. Then it all hit the wall. They simply had two years of luck.
But it is clear that search queries provide clues about the health status of users.
Yes, certainly. But you have no control over systematic errors. If the flu is far away but the media are talking about it, people will click on the relevant terms even if they are completely healthy. Using clicks to reflect infection rates does not work.
IBM's Watson system uses medical data to produce medical diagnoses with the help of Big Data.
The MD Anderson Cancer Center at the University of Texas is one of the largest and most respected in the world. They ended the project last year after investing $62 million. According to media reports, knowledge was entered into the computer manually rather than acquired through machine learning, as the advertising so often claims. That is, first, embarrassing and, second, fraud. And third, the results were worse than those of the doctors.
Would it ever be possible to make medical diagnoses using large amounts of data?
Yes, it will certainly get better. But we are still a long way from that. In Germany, the Ethics Council recently advocated less stringent data protection to allow the analysis of additional data. I say no to that. The supposed benefits are largely wishful thinking. Spending billions in the hope that solutions will then emerge by themselves is pointless. (Tages-Anzeiger)
Created: 22.07.2018, 19:20