Make scientific data FAIR




Scientific data are booming: thousands of petabytes were collected in 2018 alone. But these data are not used widely enough to realize their potential. Most researchers face obstacles when they try to get hold of datasets. Typically, only one-fifth of published articles deposit their supporting data in scientific repositories, as the example of PLoS ONE shows¹. Too much valuable and hard-won information gathers dust on computers, disks and tapes.

Scientists fail to share data for several reasons. Those who create data rarely receive credit, and when they do, recognition is often limited to citations. Too little support is available for data preservation. These problems span all disciplines, but the conversations about them are disconnected.

That is why more than 100 repositories, communities, corporations, institutions, infrastructures, individuals and publishers (including Springer Nature, the publisher of Nature) signed last November the Enabling FAIR Data Commitment Statement in the Earth, Space, and Environmental Sciences for depositing and sharing data (see go.nature.com/2wv2jxd). The principles state that research data must be "findable, accessible, interoperable and reusable" (FAIR)². The idea is not new, but aligning this broad community around common data guidelines is a radical step.

In practice, this means that the vast majority of Earth science journals will no longer accept data in separate supplements, which can be difficult to use. Editors will insist that key data be available in repositories that adhere to the FAIR principles. These changes in policy and practice mean that data are treated as a valuable part of research rather than as files added as an afterthought. They promise to open avenues for scientific discovery and to improve reproducibility.

The benefits of open and FAIR data are huge. Multi-billion-dollar sectors of society use geoscience data for their operations, products and services³. For example, weather forecasts are based on meteorological and other data from around the world. The Global Positioning System relies on real-time observations of solar activity, atmospheric dynamics and Earth's gravity. Earthquake risk mitigation and verification of the Comprehensive Nuclear-Test-Ban Treaty use seismic data from instruments around the world.

We now call on the entire scientific community to implement these practices, and here we explain how the geoscience community put them in place.

Building blocks

The FAIR principles are the culmination of more than 20 years of agreements and actions involving publishers, data repositories, funders of scientific research, researchers and others. The principles recommend that scientific data be: "findable" by anyone using common search tools; "accessible" so that data and metadata can be examined; "interoperable" so that comparable data can be analyzed and integrated using common vocabulary and formats; and "reusable" by other researchers or the public as a result of robust metadata, provenance information and clear usage licenses.

For example, searching in Google Dataset Search makes it easy to find datasets in the EarthChem library of the Interdisciplinary Earth Data Alliance. The data is easy to download from a landing page. Dataset formats are aligned with other geochemical, petrological and geochronological data. And they have a long useful life because of the richness of their provenance metadata.
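As a rough illustration of how these four properties might map onto a machine-readable dataset record, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DatasetRecord` fields, the `fair_gaps` helper, the placeholder DOI and the example URL are hypothetical and are not taken from any repository's actual schema or from the commitment statement itself.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""      # persistent identifier such as a DOI (findable)
    landing_page: str = ""    # page where data and metadata can be examined (accessible)
    data_format: str = ""     # community-standard format (interoperable)
    licence: str = ""         # clear usage licence (reusable)
    provenance: str = ""      # how the data were produced (reusable)
    keywords: list = field(default_factory=list)  # search terms (findable)


def fair_gaps(record: DatasetRecord) -> dict:
    """Return, for each FAIR property, the fields that are still empty."""
    required = {
        "findable": ["identifier", "keywords"],
        "accessible": ["landing_page"],
        "interoperable": ["data_format"],
        "reusable": ["licence", "provenance"],
    }
    gaps = {}
    for prop, names in required.items():
        missing = [name for name in names if not getattr(record, name)]
        if missing:
            gaps[prop] = missing
    return gaps


# Placeholder values only; this is not a real DOI or repository URL.
example = DatasetRecord(
    identifier="doi:10.0000/placeholder",
    landing_page="https://repository.example/dataset/1",
    data_format="CSV",
)
print(fair_gaps(example))
```

A check of this kind only confirms that fields are filled in. Whether a dataset is genuinely reusable still depends on the quality of the metadata and on curation by people who know the data.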

A member of the Hawaii National Guard measures sulfur dioxide levels at a lava flow near Pahoa. Credit: Terray Sylvestre / Reuters

Data should be as accessible as possible, but must sometimes be restricted for legal or other reasons: the exact locations of endangered-species observations, for example, are restricted, and an approval process must be followed to gain access to them. Any restriction on access should be stated explicitly in the data availability statement of the paper concerned.

It took only 18 months for the community to adopt, adapt and align with FAIR data practices. The effort began in 2017 and involved more than 300 stakeholders and six working groups. It was organized by the American Geophysical Union and supported by the US philanthropic organization Arnold Ventures (formerly the Laura and John Arnold Foundation).

This rapid pace was possible because many of the components of data sharing had already been developed. For example, data communities such as the Research Data Alliance and the Earth Science Information Partners had vetted and put into practice the data-sharing practices required of repositories, researchers and journals. And an alliance of publishers, journals and repositories, the Coalition for Publishing Data in the Earth and Space Sciences, had promoted common policies and procedures for the publication and citation of geoscience data.

By 2018, all these building blocks had been brought together in a unified framework that the community was willing to put in place⁴,⁵. The results are formalized in the Enabling FAIR Data Commitment Statement. It contains codes of practice for each stakeholder group (repositories, publishers, companies, communities, institutions, funding agencies and organizations, and researchers).

For example, publishers agree to adopt a common set of author guidelines for depositing and citing data. Journals are phasing out the archiving of data in supplementary information and are helping authors to deposit their data in repositories that conform to the FAIR principles. The repositories provide persistent identifiers, curation expertise, landing pages and support for citing data in papers. They offer consistent and clear information that is easy to find and consult, in human- and machine-readable formats, with links to related publications.

Change the culture

Three major shifts are essential to change research culture across all disciplines:

Make depositing open and FAIR data a priority for everyone. Universities, funders, repositories, publishers and companies around the world need to work together to harmonize data-sharing approaches and tools. All journals should require that data sources be cited and made accessible, which is essential to the integrity of published research. Funders should support leading practices in data management, including long-term archiving of data, especially from publicly funded research. Repositories should track study outputs so that claims can be linked to evidence.

Stronger mandates and guidelines are needed to align these actions. The first efforts are under way. The report⁶ of the European Commission's expert group on FAIR data has defined the necessary steps. The Australian Research Data Commons has published online training guides and help pages that show researchers how to cite data, samples and software correctly (see go.nature.com/2wtuwe8). US research agencies have issued guidelines on ways to increase access to scientific data to meet the requirements of a 2013 memorandum from the Office of Science and Technology Policy⁷. Although these initial efforts are encouraging, continued international recognition and coordination are needed to align stakeholders and to establish common expectations and priorities.

A researcher accesses sediment cores collected during a drilling program at sea. Credit: Marc Steinmetz / VISUM / eyevine

Recognize and encourage FAIR data practices. These should be codified in institutions' reward and tenure processes. The current measure of a publication's worth is strongly biased towards journal citations and poorly reflects the overall value of the research conducted by the authors. Yet data often have a scientific impact far greater than that of the article to which they relate⁸.

Researchers should be recognized and credited for the intellectual effort needed to provide well-researched, useful and preserved data, that is, for practising good science⁹. Societies and academies should explicitly include open sharing and FAIR data practices in the criteria for honors and awards given to exemplary scholars.

Fund a global infrastructure to support FAIR data and tools. The total cost of providing data that meet all the criteria is unknown. Initial estimates are steep, but remain low compared with the potential benefits. The cost will depend on the amount of scientific data involved and the effort required to comply with FAIR guidelines. The full costs of an international FAIR data infrastructure must also be determined. Even the parts that can be costed lack stable support. For example, most repositories struggle with long-term sustainability.

Researchers should not be required to assume the full cost of switching to FAIR data. International coordination of funding is needed. Technical solutions must persist during political and technological transitions and go beyond national borders to ensure equal access for researchers in low- and middle-income countries. Research teams must have access to data experts.

Changing culture takes time and persistence, but the problem is urgent. Progress in the geosciences is encouraging. Some technical issues still need to be resolved, but the biggest challenges are organizational and institutional. Other scientists should start addressing them now.

We invite researchers and organizations from all fields of science to sign up to the Enabling FAIR Data Commitment Statement, to assess the current state of their disciplines, to take action and to report on progress.

Competing financial interests

L.Y. is paid as director of community development for the US region of the Research Data Alliance (RDA), a community-driven international non-profit organization with more than 8,000 members, which aims to build the social and technical infrastructure for data sharing on the basis of principles such as balance, consensus and transparency. J.C.-G. is the founding CEO of WayMark Analytics, a dual-outcome organization that maps stakeholders in complex systems. K.L. is director of the Interdisciplinary Earth Data Alliance, the US National Science Foundation's database for solid Earth data. B.N. is executive director of the Center for Open Science, a non-profit technology and culture organization that provides services to improve the transparency, integrity and replicability of research. E.R. is paid as executive director of Earth Science Information Partners, a non-profit organization that provides services and strengthens the communities of its 120 member organizations to raise the profile of data and data managers in the Earth sciences.
