The Public Domain of Digital Research Data
OECD Follow-up Group on Issues of Access to Publicly Funded Research Data
OECD Follow-up group on issues of access to research data from public funding
REPORT ON PROGRESS AND COMING ACTIVITIES (March 2002)
Executive Summary
Since the 77th meeting of CSTP (18-19 October 2001), the working group has accomplished
the following:
- Completion of two "quick elementary studies" (NIWI - Paul Wouters,
to be published by end of March 2002) on:
- Current relevant regulation on data sharing
in the US, with the key observation that US regulation on this topic is dominated
by the Freedom of Information Act and the Bayh-Dole Act, showing the importance
of the availability and the constraints of suitable national legislation on Access
and Sharing of Research Data. Other aspects of regulation protect investments
by researchers, privacy of individuals, and national security.
- Mini Survey among ESF member organisations,
showing that policies on Access to and Sharing of Research Data are far from
being as common as in the US.
- Other work underway:
- Social Informatics Research by Geoffrey
Bowker and Kathleen Casey, which will add both a research and a policy dimension
to the working group, was funded by the National Science Foundation and will
be completed together with the working group report.
- Legal Framework: A comprehensive study on
the national legislation relevant to access to and sharing of research data,
its purposes and underlying legal principles, relevant international treaties,
additional regulation and relevant jurisprudence. The proposal was drafted
by Hans Franken, professor of Law at Leiden University (and chairman of the
third Global Research Village Conference). The project was accepted by the
Netherlands National Research Programme "Information Technology and the
Law" (ITER) and will be carried out during the Spring of 2002.
- Study on Trust: This study will be conducted
by NIWI (Anne Beaulieu), to understand the extent to which data are being
shared between researchers and non-expert audiences.
- Case Studies: to be undertaken by various
members of the group, covering "big" science projects as well as
broad community practices.
- Economics of data sharing: Workshop to be
held in September, organized by MERIT, to look at economics of digital research
data.
- Additional Members of the working group: Since
the meeting in October, several other countries have become involved in this project.
These include representatives from Japan, Canada, and Finland.
REPORT ON PROGRESS AND COMING ACTIVITIES
1. Working Group
At the 76th OECD/CSTP meeting of 13/14 March 2001 (ITEM 14, 32.), the CSTP
" ii) Agreed to the proposals presented by the Dutch and Polish Delegations
in Room Document No.6 concerning the follow-up to the Third Conference on the
Global Research Village, including
- the creation of a working group on access to research information, and
- the preparation of the Fourth Conference to be held in Poland in 2002."
1.1. Update
This document gives an update on DSTI/STP(2001)35,
26 September 2001, and Room Document No. 4 from the 77th meeting of the CSTP
on 18-19 October 2001, concerning the Working Group on research data issues. It
was finalised on 11 March 2002.
1.2. Contents
- Section 2 gives the rationale for the activities
of the Working Group: working out strategies in research policies and practices
to cope with the dramatically increased amounts of research data generated by
the Global Science System
- Section 3 summarises the preparatory activities undertaken since the 76th
CSTP meeting leading to agreement on "Points of departure" and program
of aims and means
- Section 4 sketches the Work Plan for the activities of the group for the period
from March 2002 until March 2003
- Section 5 contains summaries of the results of the preliminary studies commissioned
by the Working Group as well as the projects currently being carried out and
projects being planned.
- Appendix 1. gives a brief record of the origin of the Working Group
- Appendix 2. states the "Points of departure"
- Appendix 3. contains a list with names and URLs of the current members and
associates.
2. Rationale
Data as the basis of the value chain for Science
and Society
2.1. Preface
Why are access to and sharing of publicly funded
data an issue now?
As the twenty-first century opens, a set of different
national policies concerning information and communication technologies and scientific
research have come into play. Ranging from the European Union Database Directive
to the Freedom of Information Act, these policies demand attention, particularly
as they pertain to the role of science in society and the public's access to research
products. Similar to the legislation developed during the early 20th century that
sought to balance economic gains with societal good, contemporary changes in information
and communication technology require a re-examination of the role of government
in issues of balancing economic gains, science and society.
The changes in information and communication technology
(ICT) have been tremendous for society in general, and science in particular.
Science is being enhanced by the ICT revolution in its ability both to acquire
new data and to share it, at unprecedented scales of time (almost instantaneously)
and volume. The science system is facing a tidal wave of data that it must address
to ensure optimal return on the data-generation investments. In parallel, the
larger issues of society such as national security, individual privacy, and intellectual
property rights are all being re-examined in the light of the ICT revolution. Decisions
taken today in each of these realms, often without due consideration for the role of
science in society, or the mechanisms of the scientific enterprise, such as the
sharing of and access to data (the life blood of the scientific enterprise), will
have profound consequences.
Science is a driver of innovation and the economy as
well as a developer of knowledge. As access to data is an integral part of
the science process as well as a driver of the economy, it is appropriate to understand
the impact of the technologies that enable science, in the context of commercialisation,
privacy and security, and the impact of these on science, in order to ensure optimal benefit
from public investment in science.
We see on a daily basis the promise that data sharing can bring to science and
society. However, the lack of policies that address the access and sharing of
research data alongside the proliferation of competing policies have created a
situation that could threaten underlying principles associated with data sharing
and its benefits for science, the economy, and society.
2.2. A quantum leap in data use
The use of ICT has radically enlarged the scale
of research in the Global Science System of the 21st Century. Increased computing
capacity has led to the deployment of larger and more complex workforces of researchers
connected by ever larger and more complex computer networks using larger and more
complex instruments powered by complex software - to both produce and use ever
larger masses of digital data.
Sir Isaac Newton was able to revolutionise physics
working with pen and paper from a limited set of observations (not yet called
data) and exchanging letters with only a few European colleagues. For a next step
in contemporary physics, scientists are joining forces to build the Large Hadron
Collider at CERN that will accumulate 10 Petabytes of digital data (10 quadrillion
bytes, about 20 Million CD-ROMs) per year. In order to deliver the data to the 10,000
researchers from 1,000 research institutes in 50 different countries and to enable
them to use the data, the LHC Computing Grid is being developed: CPU power over
4 Million SPECint95, LAN throughput about a Terabit per second at dozens of sites,
WAN capacity many Gigabits per second to hundreds of sites.
2.3. Data policies and data management
CERN certainly ranks among the 'Big Science' institutes
that are best prepared to handle this quantum leap in data use. However, many
other research institutes in different fields, such as Structural Genomics,
Biodiversity, Atmospheric Research and Remote Earth Observation, that will generate
comparable amounts of data, do not always seem prepared for the data flood that
is coming their way.
To ensure optimum production and use of the growing masses of digital research
data for science and society, researchers will have to adapt the ways they handle
data to the changing circumstances. At the institutional level additional attention
from science policy and research management is called for.
2.4. Data driven science
By using and producing ever increasing masses of
digital data, science is becoming more and more "data driven". Every
day the availability of the right sets of digital data is becoming more important
to the success of research. This goes for natural sciences as well as social sciences
and the humanities. Costs for the collection and processing of data are taking
up a growing proportion of the research budgets.
2.5. Changing perspectives
Use of ICT has led to a spectacular increase in
the possibilities for the collection and use, the input and output, of digital
data in research. However, the traditional framework of policies and management
(and bookkeeping) pays attention to categories such as workforce, facilities
and publications, but usually does not yet take into account data use as a relevant
distinct component. Looking at the relevant figures, one could easily get the
impression that the Global Science System is not quite prepared to handle the
tsunami of research data that is coming its way.
2.6. A force of its own
Instead of looking at the supply of research data
as an integrated trajectory that needs systematic planning and budgeting starting
from the collection of data, via processing, storage and dissemination to end
with sustainable archiving, the parties involved tend to pay attention only to
parts of the chain. Instead of taking data as a distinct factor with a broad range
of application, data often seem to be considered as undifferentiated incidental
research costs. Instead of looking at data as resources for multiple uses in different
projects by different research institutes, data are often inaccessible to users
other than the ones who set up the original collection.
2.7. Additional policies and strategies
The amount of data used in research is growing
at a staggering rate and so is the expenditure on data and data-related activities.
The consequences of this increase will have a great impact on the future of the
Global Science System. Optimisation of access to research data requires additional
attention from researchers, their communities and learned societies as well as
from institutional research management and government policies.
As the quality and productivity of research are becoming more and more dependent
on access to digital data, these policies should promote the use of existing and
future data sources in a more systematic way.
An example of a principled as well as practical
policy approach to the issues sketched was formulated by the US National Institutes
of Health (NIH) in its "Draft Statement on Sharing Research Data" of
March 1, 2002; see the information at http://grants.nih.gov/grants/policy/data_sharing/index.htm.
The main page gives access to an excellent (draft) "Data Sharing Workbook"
and a document listing FAQs.
2.8. More quality and efficiency
To get more scientific quality out of data, data
should be accessible to be used by more researchers for differing purposes in
different projects, at different places over longer periods of time. To get a
higher return on investments, additional policies should also promote the efficiency
and productivity of research. As research data are largely publicly funded, better
access to data should heighten public return on public investments.
2.9. International co-ordination and co-operation
Intensive international co-ordination and co-operation
were needed for the establishment and operation of the larger research facilities
and the deployment of the larger human resources of the contemporary Global Science
System. Compared to the attention paid to investments in the infrastructures and
human capital, the interest in explicit policies for investment in and exploitation
of the extremely valuable 'floating capital' of scientific data has until now
often been limited. However, a fresh look at international co-ordination and co-operation
to optimise the production and use, the collection, processing, dissemination
and archiving of data seems even more challenging, being less complicated and holding
the promise of a better cost/benefit ratio.
3. The OECD/CSTP Working Group
3.1. Proposal
At the 76th OECD/CSTP meeting of 13/14 March
2001 (ITEM 14, 32.), the CSTP
" ii) Agreed to the proposals presented by the Dutch and Polish Delegations
in Room Document No.6 concerning the follow-up to the Third Conference on the
Global Research Village, including
- the creation of a working group on access to research information, and
- the preparation of the Fourth Conference to be held in Poland in 2002."
(See Appendix 1 for more information about the origins of the Working Group.)
During the summer of 2001, experts and policy
makers from the United States, The Netherlands, Denmark, Poland and the European
Science Foundation, along with researchers from the US and The Netherlands,
undertook various activities to prepare the establishment of the Working Group
including further expert consultation and commissioning preliminary studies.
3.2. Preparations
The preparations for the activities of the Working
Group were led by Peter Arzberger, Director, Life Sciences Initiatives at the
University of California, San Diego (UCSD), Hugo Von Linstow, CSTP delegate from
Denmark, and Peter Schröder, co-ordinator Information Policy, Ministry of
Education, Culture and Science in The Netherlands. Researcher Paul Wouters from
the Netherlands Institute for Scientific Information Services (NIWI) contributed
to the programming and execution of research. The CSTP nominated members from
Australia, Canada, Denmark, Finland, Japan, Poland and the United States; NSF,
ESF and CODATA participated in the activities. Discussions, consultations with experts
and preliminary results of desk research led to the conceptual "Points of
departure" that can be found in Appendix 2.
3.3. Focus on data
As a result of the talks with interested parties,
it was decided to focus on access to "data" instead of "research
information". It was felt that "Information" was too broad and/or
vague a concept in this context, that the issues of electronic publishing and
the "Serial Crisis" were treated satisfactorily elsewhere, and that
the important issues could be handled in a better way by focussing on research
data. The narrower focus was expected to make the process of reaching consensus
on principles, eventually to be used in a formal framework, easier. Another result
of the consultation was the decision to include not only a policy study on the
subject, but a study of the scientific aspects on access to and sharing of data
as well.
3.4. Purpose of the Working Group
The international working group was established
to promote Access to and Sharing of Research Data. To this end it will:
- Report on current practices concerning Access to and Sharing of Research data
and their underlying principles on the basis of case studies;
- Report on effects of selected current data sharing practices on the quality
of research and the progress of science;
- Suggest principles for making policy on data sharing within the relevant national
and international policies and regulatory frameworks.
3.5. Product
The Working Group will publish a Report on data
access and sharing that will include a science policy section:
- describing existing arrangements of data access and sharing from selected cases
in various research disciplines, organisations and countries;
- analysing the formal and informal rules applicable in the arrangements;
- formulating a set of commonly agreed Principles derived from best practices
in these arrangements as well as the underlying normative values;
- coming up with policy recommendations to improve conditions for access and sharing.
The Report will also include a scientific section
that will present the outcome of a separate research project on the practice and
trends in data sharing:
- presenting a social informatics perspective on the arrangements of access and
sharing;
- comparing various arrangements of access and sharing from this perspective;
- addressing issues of scientific standards, peer-review and quality control;
- coming up with conclusions on the positive and negative implications of data
access and sharing on the research process and the research system.
3.6. Scope
Anything imaginable can be used as data for scientific
research. The Working Group will limit its attention as much as possible to source
data (factual data; observations, measurements recorded in text as well as images
and sound) usable as input for research as distinct from bibliographical data.
As the project is directed primarily at governments, it will focus primarily on
data collected with public funds. These could be data collected for research,
but also data collected for other governmental purposes (census data, environmental
data from meteorological, geological and geographical surveys). As publicly funded
data can end up in data sets compiled by commercial firms, arrangements with market
parties are not to be excluded.
The Working Group will focus on arrangements of access and sharing according to
their current and future scientific and socio-economic significance.
3.7. Addressed parties
The Report of the Working Group will be addressed
to:
- (primarily) the governments of the OECD countries and other governments connected
in the global science system;
- organisations responsible for the funding of research;
- research institutes;
- professional scientific societies and the scientific community in general.
3.8. Results
In formulating a set of Principles, the report
of the Working Group will contribute to a better understanding of the importance
of access to and sharing of data to the research process, the science system and
science policy by:
- raising the awareness of the relevant parties on the subject where needed;
- putting the issues more firmly on the relevant agendas and
- supporting the establishment of the necessary specific policies and regulation.
3.9. Organisation, Membership
Peter Arzberger chairs the Working Group; Peter
Schröder will act as vice-chairman. Researcher Kathleen Casey is secretary
and heads the bureau of the Working Group in San Diego; Teri Simas is the administrative
co-ordinator (simast@sdsc.edu).
The Website for the Working Group will shortly be available at http://dataaccess.sdsc.edu/
A list of the current members and associates of the Working Group is included
in Appendix 3.
4. Preparations and Work Plan
4.1. Between April 2001 and March 2002 the following
activities were carried out:
* The consultation of experts and policy makers
from OECD delegations, the European Science Foundation, the European Commission,
the US National Science Foundation, the Steering Group of the 4th Global Research
Village Conference, CODATA and researchers at universities in the US and The Netherlands.
* Two quick elementary studies on the State of
Affairs by Paul Wouters from the Netherlands Institute for Scientific Information
Services (NIWI):
- a Quick Scan of the current relevant regulation on data sharing as formalised
and practised by a selection of research organisations in the United States (for
instance NSF, NIH, NRC and AAAS)
- a 'Mini Survey' (simple e-postal questionnaire) of the member organisations
of ESF (and similar organisations in Japan, Australia and Canada) to define the
issues in data sharing currently felt as most urgent in the other OECD countries.
Results to be published in March 2002.
* The drafting by US participants Peter Arzberger, Geof Bowker and Kathleen Casey
(University of California, San Diego) of a proposal, accepted by the US National
Science Foundation (NSF) for a project combining scientific research and policy
research into data-sharing. The project is meant to function as the organisational
backbone of the Working Group. It will treat Access to and Sharing of Research
Data from the viewpoint of science policy and data management as well as from
a social informatics perspective. The Report shall be finalised in Spring 2003.
* A constitutional meeting held in Paris on October 17, 2001. Peter Arzberger was
appointed chairman and, at the suggestion of CSTP delegates, the members were appointed
(see Appendix 3). Procedures, time schedule and Work Plan were discussed.
* Agreement on a longlist of relevant case studies covering the fields of:
- High-Energy Physics
- Astronomy
- Meteorological/Atmospheric research
- Structural Genomics
- Biodiversity and the Global Biodiversity Information Facility
- Epidemiology
- Social Sciences
- Cultural Heritage research
* Agreement on the following topics to be addressed in the case studies:
- the relevant parts of the different legal frameworks in the countries concerned,
- the business models currently employed and
- the status of (governmental) mass data producing laboratories and agencies (such as
meteorological, geological, topographical and environmental institutes, census
bureaus, health services, cultural heritage collections and libraries).
* The start of a comprehensive study on the national legislation relevant to access
to and sharing of research data, proposed by Hans Franken, professor of Law at
Leiden University (and chairman of the third Global Research Village Conference),
and accepted by the Netherlands National Research Programme "Information Technology
and the Law" (ITER), to be finalised in May of 2002.
* The start of two NIWI follow up studies:
- a study on the importance of Trust in the practice of small-scale data sharing
- a study on data policies and management at the international 'Big Science' organisations
CERN (European Organisation for Nuclear Research) and EMBL (European Molecular
Biology Laboratory).
Studies to be finished by the end of 2002.
* The preparation of an expert meeting on the economics and management of digital
research data for the Global Science System, probably to be organised by Luc Soete
and to be held at the Maastricht Research Institute on Innovation and Technology
(MERIT) in September 2002.
4.2. From March 2002 onwards the planning is
as follows:
- The activities of the Working group will take
place between October 2001 and the summer of 2003. The first progress report to
the CSTP will be made at the 78th meeting to be held in Paris on 19 and 20 March
2002.
- An interim Report will be finalised in September 2002 to be presented at the
fourth Global Research Village Conference to be held in Poland (10-11 October
2002).
- The interim Report, or parts of it, will also be presented at the 18th CODATA
Conference to be held in Montreal between 29 September and 3 October 2002.
- A progress report and the interim Report
will be presented to the CSTP at its 79th meeting to be held in Paris in October
2002.
- Various results of the activities of the
Working Group will be presented at the Society for Social Studies of Science Conference
in November 2002.
- The final version of the Report will be presented to CSTP at its 80th meeting
in March 2003 in Paris.
- Input for the Report will come from its members and their organisations. The
Bowker/Arzberger NSF project will structure the activities as researcher Casey
will act as secretary for the Working group as well and edit the report.
- The Bowker/Arzberger research project will continue after the publishing of
the final Report and will be concluded at the end of 2003.
The bulk of the activities of the Working group will be conducted by e-mail and tele-conferencing
supported by the secretariat to be located in San Diego. The progress of the activities
will be published on the Website of the Working Group.
The Working Group will meet in person:
- in Paris on 18 and 19 March (before the March meeting of OECD/CSTP)
- in June 2002 in San Diego to review the work and finalise the interim Report
(Meetings will be organised to coincide as much as possible with those of the
GRV 4 Steering Group)
Members of the Working Group will also meet
- at the 4th Global Research Village Conference
(10-11 October 2002)
5. Preliminary results and current research
and fact finding
I. NIWI Mini Survey
II. NIWI Quick Scan on US Regulation
III. San Diego study
IV. ITER study on legislative framework
V. NIWI study on trust and sharing
VI. NIWI report on CERN and EMBL practices
VII. MERIT expert meeting on data economics
5.1. The NIWI studies
I. 'Mini Survey' among members of the European Science
Foundation, Australia, Canada and Japan.
II. Quick Scan of the current relevant regulation in the US.
5.1.2. Two complementary studies
Two quick preliminary studies on the state of affairs
were undertaken (between July 15th and November 15th 2001) by NIWI (Netherlands
Institute for Scientific Information Services) under the supervision of Dr. Paul
Wouters. They comprised a Quick Scan of the current relevant regulation in the US
and a 'Mini Survey' to help to define the issues in data sharing currently felt
as most urgent in the other OECD countries.
The results will be published in March 2002.
5.1.3. US Regulation: legal framework
Summarising the conclusions from the Quick Scan
of US regulation, it is safe to state that all the relevant important research
organisations have enacted rather elaborate regulation on Access to and Sharing
of Research Data. This regulation is permeated by a sense of public accountability
and accessibility derived from the Freedom of Information Act and OMB circular
A-130 on Federal information policy (as codified by the Paperwork Reduction Act).
In this way, the existing regulation illustrates the importance of the availability
and the constraint of suitable national legislation on Access and Sharing of Research
Data.
Another aspect of the regulation seems to be a tendency to protect investment
by researchers as primary collectors of data against claims from outside the research
community, particularly through periods of exclusive use of data by Principal
Investigators prior to the publication of results. A systematic policy towards
the building of a common, public data infrastructure seems absent outside the
specialised "Big Science" organisations.
5.1.4. Other OECD countries: growing awareness
Preliminary findings from the Mini Survey among
ESF member organisations showed that policies on Access to and Sharing of Research
Data are far from being as common as in the US. Still, most of the respondents
expected that Access and Sharing would develop into an important policy issue
in all scientific fields in the near future. The main problems expected by the
respondents were technical problems of interoperability, descriptive standards
and institutional barriers. The personal attitudes of researchers and aspects of
ownership were also mentioned. In contrast to the content of the US regulation,
commercialisation of data did not seem to represent a major issue.
5.1.5. The start of something
At the international level, data-sharing still
seems to be in its infancy as a policy issue.
But providing access to scientific data is fast becoming a crucial aspect of science
policy at the national and international level (National Research Council 1997).
The need for increased levels of data processing is related to a number of developments:
the application of information and communication technologies (ICT) in research;
the development of new, often interdisciplinary, research questions; and the increased
social and economic role of science, social science and the humanities. At the
same time, a prudent use of state of the art information and communication technologies
may help create new methods of providing access to scientific data in a timely
and cost-effective way on a truly global scale.
5.2. The San Diego Study
III. Access to and Sharing of Data from Public
Funding
A proposal to NSF for a project combining research into data-sharing
from a social informatics perspective with policy research to support the
activities of the Working Group was drafted by Dr. Peter Arzberger and Prof. Geoffrey
Bowker from the University of California, San Diego. This has been accepted and
the work will continue during the course of 2002. From the proposal to NSF:
5.2.1. Policy issues
Our report will build on the results of the third
Global Research Village Conference as well as on the "Bits of Power"
report (NRC 1997), but expand the disciplinary range considered (to include the
social and medical sciences) in order to account for the new disciplinary and
interdisciplinary formations that are coming to the fore - each with their own
sets of data practices. Further it will address the issues at the levels of science
policy and research management and will have the advantage of six more years of
information in this digital age (beyond the Bits of Power study). It will have
concrete examples to motivate principles, more so than the CODATA report. In addition,
it will build on the principles for access to and sharing of publicly funded research,
laid out at the conclusion of the Global Research Village III conference, and
will consider implications of implementation of those principles. It will attempt
to capture, as a snap-shot, current practices and policies being employed across
disciplines, funding agencies and countries.
5.2.2. Social Informatics
Finally, it is our intent to put this work on a
strong (social informatics) research footing. A technical fix to the problem of
data sharing will not work without strong organizational support. Consider, for
example, the widespread non-adoption of the Worm Community System developed to
support data sharing in the community of researchers mapping C. elegans, and intended
to provide an easy interface with other genome mapping communities. The excellence
of the technology could not overcome the desire for privacy, confidentiality and
proprietary use amongst many of the researchers involved (especially postgraduates,
for whom their engagement with the mapping effort could make or break their careers)
and in general without an organizational understanding of the development of scientific
careers as they affected data sharing issues (Star and Ruhleder, 1996). In order
to develop robust long term information infrastructures, we must combine technical
developments and organizational innovation (Bowker and Star, 1999).
5.2.3. Data economies
In order to generate the clusters that will be
most useful for the committee, we will organize them along two specific dimensions,
each of which will be discussed in detail in the report: international data sharing
arrangements, and emerging "data economies" including issues of justice
and fair use. We discuss these briefly below in order to illustrate the core research
issues each dimension poses. We feel that this focusing is important to ensure
appropriate scientific rigor in the final report, but more importantly it will
provide a clear link to a wider community of researchers, who will be able to
continue on the work produced here. We hope to provide a platform for ongoing
dialog between the research community and the policy community as technology evolves
faster than policy. Furthermore, we will be reaching out to that research community
as a result of our effort, and doing so in national and international fora.
5.3. Legislative framework
IV. ITER study
A comprehensive study on the national legislation relevant to access to and sharing
of research data, its purposes and underlying legal principles, relevant international
treaties, additional regulation and relevant jurisprudence. The proposal was drafted
by Hans Franken, professor of Law at Leiden University (and chairman of the third
Global Research Village Conference). The project was accepted by the Netherlands
National Research Programme "Information Technology and the Law" (ITER)
and will be carried out during the Spring of 2002.
5.4. NIWI follow up studies
V. Study on the role of Trust (Anne Beaulieu)
VI. Study on practices at CERN and EMBL (Colin Reddy)
5.4.1. Formal regulation and practice
Given the relevance of clear policy principles,
the next question is how they compare with actual data-sharing practices. This
is the topic of a set of case studies which are now being undertaken. A number
of actors are crucial in the practice of data-sharing: funding agencies, data
repositories and archives, dedicated Web sites with data, and not least the researchers
themselves. Their interaction determines to what extent data are actually being
shared among researchers and between researchers and non-expert audiences.
5.4.2. Effects of informal behaviour
The case studies aim to draw lessons from present
data-sharing practices, illustrate the issues that are most pressing, locate best
practices and exemplary models, find out which additional policies or funding
mechanisms may be needed, and identify the main barriers and obstacles for heightened
data-sharing. Which types of tools and regulation are most conducive to data-sharing,
and which effects increased data-sharing may have on the research process, will
also be addressed in the case studies. One can expect that these effects will
vary by scientific field and probably also by the type of data involved. Data-sharing
is not always uncontroversial in the scientific community. In some specialties,
the duty to make research data publicly available seems to clash with established
traditions and routines (or lack thereof). This raises the additional question
of transaction costs.
5.4.3. The perspective of the researcher
Moreover, the application of general principles
for data-sharing in research contract conditions requires specialist knowledge
of the types of data involved and of the various stages in the research process.
This is usually acquired in some form of cooperation or communication with the
researchers in question. In other words, the application of the general principles
and guidelines is based on, and produces, configurations of trust relationships
and practical provisions. Data-sharing is not only a technical issue but a complex
social process in which researchers have to balance different pressures and tensions.
Basically, two different modes of data-sharing can be distinguished: peer-to-peer
forms of data-sharing and repository-based data-sharing.
5.4.4. Personal and formal relations
In the first mode, researchers communicate directly
with each other. In the second mode, there is a distance between the supplier
of data and the user in which the rules of the specific data repository determine
the conditions of data-sharing. In both modes, the existence or lack of trust
between the data supplier and the data user is crucial, though in different configurations.
One of the case studies focuses on the systematic study of these configurations
of trust relationships in data-sharing. The other case studies will result in
best practice models for data-sharing.
Together with the study of economic and legal aspects of data-sharing they will
hopefully provide us with more knowledge about the basic social mechanisms shaping
the access to and sharing of research data and help identify the most important
barriers to an increased level of use of existing scientific knowledge and data.
5.4.5. Data sharing in Big Science
In a more formal way, this study will analyse regulation
and practice at two international 'Big Science' organisations with a checklist
that includes the following aspects:
- Regulation on use of data in the relevant treaties and mission statements
- Formal responsibilities for the collection, processing, use and archiving of
data
- Ownership of data, rights of disposal
- Documentation, technical standards
- Quality control, review and data security
- Availability and dissemination
5.5. Expert meeting on the economic and managerial
context
VIII. MERIT (Luc Soete)
The growing importance of digital research data
as the floating capital of Global Science: Expert meeting on the economics and
management of digital research data for the Global Science System. The meeting
will address issues connected with the following questions:
5.5.1. Data as a category in economics and
bookkeeping
Use of ICT has led to a spectacular increase in
the collection and use, input and output of digital data in research. The traditional
framework of policies and management (and bookkeeping) pays attention to categories
such as workforce, facilities and output in research but does not yet
take into account data use as a relevant distinct component. Do the dramatic increases
in scale and scope of data supply require additional policies on the funding,
managing, dissemination and archiving of research data as a distinct category?
5.5.2. Making the use of digital research data
visible
Use of digital research data is growing spectacularly
and there are figures on the increase of data-related activities in (some disciplines
in) research. But it is hard to demonstrate an accompanying rise in expenditure
on data as long as costs (and benefits) are not visible in research budgets.
What empirical evidence is currently available on developments in data expenditure?
How should expenditure on data be accounted for in research budgets: as incidental
costs, as investments in data infrastructure, or as output?
5.5.3. Economic models for data management
Demand as well as supply of research data is usually
in the hands of different research institutes, public services and private firms.
Costs for appropriate data supply (from data collection to data archiving) should
be accounted for in the budgets of funding organisations and research institutes.
Costs could include value-adding and access services from private partners ([database]
publishers, software makers).
What business models are currently used, what alternatives are feasible?
5.5.4. Research data as Public Good / Research
data as proprietary information
Adequate policies on the use of research data require
international co-operation and co-ordination between different countries, institutes
and private firms to ensure conditions for a 'free flow' of relevant data and
information. These policies call for a (re)consideration of the current (inter)national
legal and regulatory frameworks relevant to openness, access and property rights.
What are the (dis)advantages (efficiency, return on investment) of public good
and proprietary regimes to ensure optimum use of research data? (efficiency of
public good regimes, monopolies, niche markets)
What part should there be for Intellectual Property Rights? (What use should researchers
make of IP rights?)
Appendix 1:
Origin of the Working Group
1. OECD Global Research Village Conferences
After conferences in Denmark (1996) and Portugal (1998), the third OECD Global
Research Village Conference was held on 6, 7 and 8 December in Amsterdam. The
conference was organised jointly by the OECD Committee for Scientific and Technological
Policy (CSTP) and the Netherlands Ministry of Education, Culture and Science.
The conference was opened by Minister Hermans of the Netherlands, OECD Secretary-General
Johnston and EU-Commissioner Busquin. Minister Wiszniewski of Poland closed the
conference and offered to host the next conference in Poland in 2002. 94 representatives
from governments, the European Union, research institutes and research organisations
from 20 countries attended the conference.
2. ICT and Access to the Global Science System
Like its predecessors, the conference addressed the policy implications of the
use of Information and Communication Technologies (ICT) for the science system.
This time the conference focussed on issues of "Access to publicly financed
research". To give two examples, the science policy aspects of developing
ICT infrastructures like Next Generation Internet and the GRID and the accessibility
of information from the databases of Human Genome projects were addressed.
Technical (ICT infrastructures) and regulatory (legislation on IPR, copyright
and privacy) aspects of accessing publications, data and other resources from
publicly financed research by scientists, industry and the public were discussed.
3. Practices and Principles
The Conference Recommendations invited the Conference
Steering Group to further elaborate on the questions discussed. The Steering Group
took up the suggestion and proposed that a Working Group of experts (from OECD countries,
with involvement from the European Union, the European Science Foundation (ESF)
and the US National Science Foundation (NSF)) be established with the purpose
to:
- report on current practices and their underlying principles and
- make policy suggestions about options of further implementation of these principles,
concerning Access to publicly financed research information to the CSTP at the
next Global Research Village Conference to be held in Poland in 2002.
By principles are meant the general normative (legal, ethical, political and economic)
fundamentals relevant to Access to and Sharing of Research Data. These principles
are to be used as the groundwork for more specific, practical regulation in guidelines.
(Examples of successful guidelines based on a systematic set of principles are
The OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal
Data (1980), or the Principles and Guidelines for the Sharing of Biomedical Research
Resources (1999) of the US National Institutes of Health (NIH)).
Appendix 2:
Points of departure
1. Generalities and specifics
Discussions, consultations with experts and preliminary
results of desk research have led to the following conceptual points of departure.
These points of departure represent statements on the wide ranging, heterogeneous
field of scientific research in general. Although the meaning of various aspects
of data management may differ depending on the disciplinary, institutional or
national context (e.g. the protection of personal data of citizens compared to
the protection of meteorological data for reasons of national security), similarities
will prevail. Comparing different practices from different fields can lead to
models that are successfully applicable elsewhere. On the level of international
science policy a general approach should be useful.
2. Sources for research
In scientific research, data on the world we live
in are processed in order to test hypotheses concerning that same world. Data-sets,
systematic collections of numerical scores, textual records, images and sounds,
are used as sources for scientific research. Data processing can be seen as the
administrative (virtual) paperwork of research. Data are the lifeless records
of phenomena concerning living and inanimate physical beings and objects around
us. Of course processing of identifiable personal data (ICT is pushing hard at
the limits of identification!) can have enormous consequences in real life and
should be subject to regulation in conformity with national legislation.
3. Data as distinct part of research
Generally speaking, data can be treated, conceptually
as well as practically, as relatively autonomous elements in the research process.
Data collection and data management usually can be treated as separate activities
and functions in research, potentially done by specialists. Data used in research
are not necessarily collected by researchers: extensive data sets from governmental
organisations (Census, health services, meteorological, geological institutes)
and commercial firms (market research, geographical information) are frequently
used for scientific research.
Costs for data constitute a distinct part of the research budget.
4. Ownership, rights of access and disposal
Ownership of data used in research is not always
clearly established in the relevant documents. Formally, most publicly funded
research data are in the public domain, accessible to all (data as such do not
qualify as intellectual property so once published, all data are in the public
domain). In everyday research life, researchers may tend to act as owners of the
data used in their projects. The responsibility for sustainable archiving of research
data is not always assigned to the relevant parties. Lack of regulation on these
aspects may hamper Access to and sharing of research data.
5. Multiple use
Use of the same data for multiple research purposes (researchers,
projects, institutes) should be considered beneficial to the quality and productivity
of research.
Sharing data will widen the scale and scope, and enhance the quality and quantity
of data resources.
Sharing data will improve the cost-benefit ratio of data collection and data management
and will lead to a higher return on investments in data.
6. ICT widens the scale and scope of data-sharing
Thanks to the use of ICT facilities, an ever-increasing
amount of data, ranging from very specific to general purpose data, should be
considered as potential sources for multiple research on a global scale. By the
ICT-mediated sharing of data, the same sources can be of use for different research
problems, projects, institutes and disciplines, at different places, at different
times.
7. Effects of data-sharing on the research
process
Increased ICT-mediated sharing of data for research
requires agreement on standards for quality control and scientific and technical
interoperability. This will mean a force towards further homogenisation of research
methods, techniques and paradigms in increasingly data-driven research. Diminishing
diversity in research perspectives and reinforcement of the Matthew Effect (Robert
K. Merton; the referencing effect by which attribution of scientific merit and
reputation tends to accumulate with ever fewer celebrities) can have negative
effects on the progress of science.
8. Co-ordination in data management required
In the current research practice, better use could
be made of the increased possibilities of national and international data sharing.
Existing data sets lend themselves to more extensive use by more researchers but
collaborative arrangements of exploitation and archiving are needed to realise
this potential. Co-ordination in the collection and exploitation of additional
data is needed to provide a supply that meets the future scientific demand in
better ways.
9. Co-ordination requires policies
More extensive and better use of research data
requires more explicit policies to promote global access to and sharing of data
from governments, research funding organisations, research institutes and professional
organisations. These national and international policies should create the conditions
of openness that make sharing of data attractive and valuable to the individuals
and organisations concerned.
10. Expensive facilities
In almost all scientific fields the frontiers have
been pushed by the use of (large) facilities (new 'computerised' instruments connected
by network infrastructures) taking care of the digitisation of the data supply.
These facilities can be so extremely expensive that international co-operation
is required to establish them. Once established, management, processing, distribution
and sustainable archiving of the data generated will often remain so expensive
as to require continuing international co-operation.
11. Interoperability
National and international policies for access
to and sharing of research data should address the relevant technical and scientific
standards concerning quality and interoperability required for co-operative arrangements.
They should take into account the positive as well as the negative effects of
interoperability on the diversity of paradigms in the global research system.
12. Economic, legal and regulatory aspects
National and international policies for access to and sharing of research data
should address the relevant issues of investment and ownership of data, rights
of disposal (intellectual property rights in databases and their (retrieval) software)
and, as far as relevant, protection of individual privacy and national security.
Appendix 3:
Members of the OECD follow-up group on issues
of access to data
Mr. Peter Arzberger (Chair)
Director, Life Sciences Initiatives at the University of California San Diego
(UCSD)
University of California, San Diego
parzberg@ucsd.edu
Bureau of the Working Group University of California, San Diego
Ms. Kathleen Casey, Secretary, kcasey@ucsd.edu
Ms. Teri Simas, Administrative Coordinator, simast@sdsc.edu
Mr. Geoffrey Bowker, Professor
Department of Communication
University of California, San Diego
bowker@ucsd.edu
Mr. Koji Kamitani
Office of IT Promotion Research Promotion Bureau
MEXT (Ministry of Education, Culture, Sports, Science and Technology)
kami@mext.go.jp
Mr. Leif Laaksonen
Network Information Services Group (NISG)
Center for Scientific Computing
Leif.Laaksonen@csc.fi
Ms. Gudrun Maass
Principal Administrator
Science and Technology Policy Division, DSTI
OECD
gudrun.maass@oecd.org
Mr. Doug McEachern
Director Social, Behavioural and Economic Sciences
Australian Research Council
doug.mceachern@arc.gov.au
Mr. David Moorman
Policy analyst, Policy and Liaison Branch
Social Sciences and Humanities Research Council
david.moorman@sshrc.ca
Mr. Masamitsu Negishi
Professor, Humanities and Social Science Information Research
Research Information Research Division / Director of Research Division
National Institute of Informatics
negishi@nii.ac.jp
Mr. Peter Schröder (vice-chair)
Co-ordinator Information Policy
Ministry of Education, Culture and Science
Directorate Research and Science Policy
p.schroeder@minocw.nl
Mr. Paul Uhlir
Director, International Scientific and Technical Information Programs
U.S. National Academy of Sciences/National Research Council
puhlir@nas.edu
Mr. Mitsutoshi Wada
Manager Office of Science and Technology Information
Japan Science and Technology Corporation (JST)
wada@tokyo.jst.go.jp
Mr. Andrzej P. Wierzbicki
Director,
National Institute of Telecommunications
a.wierzbicki@itl.waw.pl
Mr. Jan Windmueller
Head of Section
Ministry of Information Technology and Research
jwi@fsk.dk
Researchers, Experts, other Relations
Ms. Anne Beaulieu
Networked Research and Digital Information (Nerdi)
NIWI-KNAW
F 3120 6658013
http://www.niwi.knaw.nl/nerdi
anne.beaulieu@niwi.knaw.nl
Ms. Kathleen Casey
University of California, San Diego
Department of Communication
kcasey@ucsd.edu
Mr. Paul Wouters
NIWI Research
paul.wouters@niwi.knaw.nl
Mr. David Schindel
Head, Europe Office
National Science Foundation
dschinde@nsf.gov
Mr. Tony Mayer
Head, Secretary-General's Office
European Science Foundation
amayer@esf.org