The Public Domain of Digital
Research Data
OECD Follow-up Group on Issues
of Access to Publicly Funded Research Data
Planning Document
1. Background
1.1. OECD Global Research Village Conferences
After conferences in Denmark (1996) and Portugal (1998), the third OECD Global
Research Village Conference was held on 6, 7 and 8 December in Amsterdam. The
conference was organised jointly by the OECD Committee for Scientific and Technological
Policy (CSTP) and the Netherlands Ministry of Education, Culture and Science.
The conference was opened by Minister Hermans of the Netherlands, OECD Secretary-General
Johnston and EU-Commissioner Busquin. Minister Wiszniewski of Poland closed
the conference and offered to host the next conference in Poland in 2002.
94 representatives from governments, the European Union, research institutes
and research organisations from 20 countries attended the conference.
1.2. ICT and Access to the global science system
Like its predecessors, the conference addressed the policy implications of the
use of Information and Communication Technologies (ICT) for the science system.
This time the conference focussed on issues of "Access to publicly financed
research".
To give two examples, the science policy aspects of developing ICT infrastructures
like Next Generation Internet and the GRID and the accessibility of information
from the databases of Human Genome projects were addressed.
Technical (ICT infrastructures) and regulatory (legislation on IPR, copyright
and privacy) aspects of accessing publications, data and other resources from
publicly financed research by scientists, industry and the public were discussed.
1.3. Practices and Principles
The Conference Recommendations invited the Conference Steering Group to further
elaborate on the questions discussed. The Steering Group took up the suggestion
and proposed that a Working Group of experts (from OECD countries, with involvement
from the European Union, the European Science Foundation (ESF) and the US National
Science Foundation (NSF)) be established with the purpose to:
- report on current practices and their underlying principles and
- make policy suggestions about options of further implementation of these principles,
concerning Access to publicly financed research information to the CSTP at the
next Global Research Village Conference to be held in Poland in 2002.
By principles are meant the general normative (legal, ethical, political and
economical) fundamentals relevant to Access to and Sharing of Research Data.
These principles are to be used as the groundwork for more specific, practical
regulation in guidelines. (Examples of successful guidelines based on a systematic
set of principles are The OECD Guidelines on the Protection of Privacy and Transborder
Flows of Personal Data (1980), or the Principles and Guidelines for the Sharing
of Biomedical Research Resources (1999) of the US National Institutes of Health
(NIH)).
1.4. OECD/CSTP Working Group
At the 76th OECD/CSTP meeting of 13/14 March 2001 (ITEM 14, 32.), the CSTP
"ii) Agreed to the proposals presented by the Dutch and Polish Delegations
in Room Document No.6 concerning the follow-up to the Third Conference on the
Global Research Village, including
- the creation of a working group on access to research, and
- the preparation of the Fourth Conference to be held in Poland in 2002."
During the summer of 2001, experts and policy makers from the United States,
The Netherlands, Denmark, Poland and the European Science Foundation, along
with researchers from the US and The Netherlands, undertook various activities
to prepare the establishment of the Working Group including further expert consultation
and commissioning preliminary studies.
1.5. Preparations
As a result of the talks with interested parties, it was decided to focus on
access to "data" instead of "research information". It was felt that
"information" was too broad and/or vague a concept in this context, that the
issues of electronic publishing and the "Serial Crisis" were treated
satisfactorily elsewhere, and that the important issues could be handled
better by focussing on data. The narrower focus was expected
to make the process of reaching possible consensus on principles, eventually
to be used in regulation, easier. Another result of the consultation was the
decision to include not only a policy study on the subject, but a study of the
scientific aspects on access to and sharing of data as well.
1.6. Preliminary studies
- A quick preliminary study on the State of affairs was undertaken (between
July 15th and September 15th 2001) by NIWI (Netherlands Institute for Scientific
Information Services) under the supervision of Dr. Paul Wouters. It included
a Quick Scan of the current relevant regulation in the US and a 'Mini Survey'
to help define the issues in data sharing currently felt as most urgent in
the other OECD countries.
- A proposal to NSF for a project combining research into data-sharing from
a social informatics perspective with policy research to support the activities
of the Working Group was drafted (submitted August 12, 2001***) by
Prof. Peter Arzberger and Prof. Geof Bowker
from the University of California, San Diego.
- A proposal was made by Prof. Hans Franken
to the Netherlands research programme Information Technology and the
Law for a comprehensive national survey on the legislation and regulation relevant
to access to and sharing of research data. It was accepted and will be carried
out during the first months of 2002.
2. Points of departure
Discussions, consultations with experts and preliminary results of desk research
have led to the following conceptual points of departure. These points of departure
represent statements on the wide ranging, heterogeneous field of scientific
research in general.
Although the meaning of various aspects of data management may differ depending
on the disciplinary, institutional or national context (e.g. the protection
of personal data of citizens compared to the protection of meteorological data
for reasons of national security), similarities will prevail. Comparing different
practices from different fields can lead to models that are successfully applicable
elsewhere. On the level of international science policy a general approach should
be useful.
2.1. Sources for research
In scientific research, data on the world we live in are processed in order to
test hypotheses concerning that same world. Data-sets, systematic collections
of numerical scores, textual records, images and sounds, are used as sources
for scientific research.
Data processing can be seen as the administrative (virtual) paperwork of research.
Data are the lifeless records of phenomena concerning living and inanimate physical
beings and objects around us. Of course, the processing of identifiable personal
data (ICT is pushing hard at the limits of identification!) can have enormous
consequences in real life and should be subject to regulation in conformity
with national legislation.
2.2. Data as separate part of research
Generally speaking, data can be treated, conceptually as well as practically,
as relatively autonomous elements in the research process. Data collection and
data management usually can be treated as separate activities and functions
in research, potentially done by specialists.
Data used in research are not necessarily collected by researchers: extensive
data sets from governmental organisations (Census, health services, meteorological,
geological institutes) and commercial firms (market research, geographical information)
are frequently used for scientific research.
Costs for data constitute a distinct part of the research budget.
2.3. Ownership, rights of disposal
Ownership of data used in research is not always clearly established in the
relevant documents. Formally, most publicly funded research data are in
the public domain, accessible to all (data as such do not qualify as intellectual
property, so once published, all data are in the public domain). In everyday
research life, researchers may tend to act as owners of the data used in their
projects. The responsibility for sustainable archiving of research data is not
always assigned to the relevant parties. Lack of regulation on these aspects
may hamper Access to and sharing of research data.
2.4. Multiple use
Use of the same data for multiple research (researchers, projects, institutes)
should be considered beneficial to the quality and productivity of research.
Sharing data will widen the scale and scope, and enhance the quality and quantity
of data resources.
Sharing data will improve the cost-benefit ratio of data collection and data
management and will lead to a higher return on investments in data.
2.5. ICT widens the scale and scope of data-sharing
Thanks to the use of ICT facilities, an ever-increasing amount of data, ranging
from very specific to general purpose data, should be considered as potential
sources for multiple research on a global scale. By the ICT-mediated sharing
of data, the same sources can be of use for different research problems, projects,
institutes and disciplines, at different places, at different times.
2.6. Effects of data-sharing on the research process
Increased ICT-mediated sharing of data for research requires agreement on standards
for quality control and scientific and technical interoperability. This will
push towards further homogenisation of research methods, techniques
and paradigms in increasingly data-driven research. Diminishing diversity in
research perspectives and reinforcement of the Matthew Effect (Robert K. Merton)
can have negative effects on the progress of science.
2.7. Co-ordination in data management required
In current research practice, better use could be made of the increased
possibilities of national and international data sharing. Existing data sets lend
themselves to more extensive use by more researchers but collaborative arrangements
of exploitation and archiving are needed to realise this potential. Co-ordination
in the collection and exploitation of additional data is needed to provide a
supply that meets the future scientific demand in better ways.
2.8. Co-ordination requires policies
More extensive and better use of research data requires more explicit policies
to promote global access to and sharing of data from governments, research funding
organisations, research institutes and professional organisations. These national
and international policies should create the conditions of openness that make
sharing of data attractive and valuable to the individuals and organisations
concerned.
2.9. Expensive facilities
In almost all scientific fields the frontiers have been pushed by the use of
(large) facilities (new 'computerised' instruments connected by network infrastructures)
taking care of the digitisation of the data supply. These facilities can be
so extremely expensive that international co-operation is required to establish
them. Once established, management, processing, distribution and sustainable
archiving of the data generated will often remain so expensive as to require
continuing international co-operation.
2.10. Interoperability
National and international policies for access to and sharing of research data
should address the relevant technical and scientific standards concerning quality
and interoperability required for co-operative arrangements.
They should take into account the positive as well as the negative effects of
interoperability on the diversity of paradigms in the global research system.
2.11. Economic, legal and regulatory aspects
National and international policies for access to and sharing of research data
should address the relevant issues of investment in and ownership of data, rights
of disposal (intellectual property rights in databases and their (retrieval) software)
and, as far as relevant, protection of individual privacy and national security.
-----
3. The OECD/CSTP Working Group
3.1. Purpose
The international working group is established to promote Access to and Sharing
of Research Data. To this end it will:
- Report on current practices concerning Access
to and Sharing of Research data and their underlying principles on the basis
of case studies;
- Report on effects of selected current data
sharing practices on the quality of research and the progress of science;
- Suggest principles for making policy on data
sharing within the relevant national and international policies and regulatory
frameworks.
3.2. Participation
Prof. Peter Arzberger, executive director of the US National Partnership for
Advanced Computational Infrastructure participated in the preparatory activities
together with Geof Bowker, Professor of Communication and Science Studies at
UCSD. Arzberger will chair the Working Group. Arzberger and Bowker together
with Kathleen Casey (UCSD) have made a proposal to NSF *** for the Working Group
as a combined policy/research project. Dr. Paul Uhlir, from CODATA, Director
of International Scientific Information Programs at the US National Academy
of Sciences/National Research Council, will participate. Peter Schröder
(co-ordinator Information Policy, Ministry of Education, Culture and Science,
The Netherlands), Hugo Von Linstow (Advisor Analyses and Strategies, Ministry
of Research and Information Technology, Denmark) and Tony Mayer, Head ESF Secretary
General's Office, will continue their commitment.
From the GRV 4 Steering Group, Prof. Andrzej Wierzbicki, Director National Institute
of Telecommunication, Poland, will participate. Prof. Doug McEachern, Director
Social and Behavioural Science with the Australian Research Council, has accepted
the invitation to participate in the Working Group. Canada has expressed willingness
to participate. Other countries from Asia and Europe/EU will be asked to nominate
participants. Further collaboration with ESF and European is to be expected.
3.3. Product
The Working Group will publish a Report on data access and sharing that will
include a science policy section:
- describing existing arrangements of data access and sharing from selected
cases in various research disciplines, organisations and countries;
- analysing the formal and informal rules applicable in the arrangements;
- formulating a set of commonly agreed Principles derived from best practices
in these arrangements as well as the underlying normative values;
- coming up with policy recommendations to improve conditions for access and
sharing.
The Report will also include a scientific section
that will present the outcome of a separate research project on the practice
and trends in data sharing:
- presenting a social informatics perspective on the arrangements of access
and sharing;
- comparing various arrangements of access and sharing from this perspective;
- addressing issues of scientific standards, peer-review and quality control;
- coming up with conclusions on the positive and negative implications of data
access and sharing on the research process and the research system.
3.4. Scope
Anything imaginable can be used as data for scientific research. The Working
Group will limit its attention as much as possible to source data as distinct
from bibliographical data. As the project is directed primarily at governments,
it will focus primarily on data collected with public funds. These could be
data collected for research, but also data collected for other governmental
purposes (census data, data from meteorological, geological and
geographical surveys). As publicly funded data can end up in data sets compiled
by commercial firms, arrangements with market parties are not to be excluded.
The Working Group will focus on arrangements of access and sharing according
to their current and future scientific and socio-economic significance. The
outcome of the preliminary 'mini survey' should help to give indications on
this.
3.5. Addressed parties
The Report of the Working Group will be addressed to:
- (primarily) the governments of the OECD countries and other governments connected
in the global science system;
- organisations responsible for the funding of research;
- research institutes;
- professional scientific societies.
3.6. Results
In formulating a set of Principles, the report of the Working Group will contribute
to a better understanding of the importance of access to and sharing of data
to the research process, the science system and science policy by:
- raising the awareness of the relevant parties on the subject where needed;
- putting the issues more firmly on the relevant agendas and
- supporting the establishment of the necessary specific policies and regulation.
4. Working Plan
4.1. Preparations
Between April and October 2001 the following preparatory activities (see also
1.6.) were carried out:
* Consultation of experts and policy makers
from OECD delegations, the European Science Foundation, the European Commission,
the US National Science Foundation, the Steering Group of the 4th Global Research
Village Conference and researchers at universities in the US and The Netherlands. Preliminary
contacts were made with CODATA/ICSU, OECD and delegates from Japan and Canada.
* A quick elementary survey on the State of
affairs was held by Paul Wouters from The Netherlands Institute for Scientific
Information Services (NIWI). This study consisted of:
- a Quick Scan of the current relevant regulation on data sharing as formalised
and practised by a selection of research organisations in the United States
(for instance NSF, NIH, NRC and AAAS)
- a 'Mini Survey' (simple e-mail questionnaire) of the member organisations
of ESF (and similar organisations in Japan, Australia and Canada) to define
the issues in data sharing currently felt as most urgent in the other OECD countries.
At the time of writing, the first results of these studies are coming in. In
November 2001 the final findings will be available.
* US participants Peter Arzberger, Geof Bowker
and Kathleen Casey (University of California, San Diego) made a proposal to the
US National Science Foundation for a project combining scientific research and
policy research into data-sharing. The project could function as the organisational
backbone of the Working Group. It will treat Access to and Sharing of Research
Data from the viewpoint of science policy and data management as well as from
a social informatics perspective.
A decision from NSF is expected shortly.
* Hans Franken, professor of Law at Leiden
University (and chairman of the third Global Research Village Conference) made
a proposal for a comprehensive international survey on the legislation relevant
to access to and sharing of research data, its purposes and underlying legal
principles, the relevant international treaties, additional regulation and relevant
jurisprudence.
The project was accepted by the Netherlands National Research Programme "Information
Technology and the Law" and will be carried out during the first months
of 2002.
4.2. Working Plan: content
Quick Scan and Mini Survey
At the time of writing, the results from the two preparatory studies from NIWI
were not yet complete. Anticipating the conclusions from the Quick Scan of US
regulation, it is safe to state that all the relevant important organisations
have enacted rather elaborate regulation on Access to and Sharing of Research
Data. This regulation is permeated by a sense of public accountability and accessibility
that largely seems to be derived from the Freedom of Information Act and the
Bayh-Dole Act. In this way, the existing regulation illustrates how important
both the availability and the constraints of suitable national legislation are
for Access to and Sharing of Research Data.
Another aspect of the regulation seems to be a tendency to protect investment
by researchers as primary collectors of data against claims from outside the
research community. A systematic policy towards the building of a common, public
data infrastructure seems absent outside the specialised Big Science organisations.
Preliminary findings from the Mini Survey among ESF member organisations showed
that policies on Access to and Sharing of Research Data are far from being as
common as in the US. Still, most of the respondents expected that Access and Sharing
would develop into an important policy issue in all scientific fields in the
near future. The main problems expected by the respondents were technical problems
of interoperability, descriptive standards and institutional barriers. Personal
attitudes of researchers and aspects of ownership were also mentioned. In contrast
to the content of the US regulation, commercialisation of data did not seem
to represent a major issue.
The Working Group will direct its activities
at further elaboration of the most important leads from the survey and additional
studies in more detail on:
- the relevant parts of the different legal
frameworks in the countries concerned,
- the business models currently employed and
- the status of (governmental) mass data producing laboratories and agencies
(such as meteorological, geological, topographical and environmental institutes,
census bureaux, health services, cultural heritage collections and libraries).
4.3. Working Plan: procedure and time table
- The activities of the Working group will
take place between October 2001 and September 2002; the Report should be finalised
in September 2002 to be presented at the fourth Global Research Village Conference
to be held in Poland (10-11 October 2002).
- The Report, or parts of it, will be presented
also at the 18th CODATA Conference to be held in Montreal between 29 September
and 4 October 2002 and at the Society for Social Studies of Science Conference
in November 2002.
- Input for the Report will come from its members
and their organisations. The Bowker/Arzberger NSF project will structure the activities,
as researcher Casey will also act as secretary for the Working Group and
edit the Report.
- The Bowker/Arzberger research project will
continue after the publishing of the Report and will be concluded at the end
of 2002.
The bulk of the activities of the Working group
will be conducted by e-mail and tele-conferencing supported by the secretariat
to be located in San Diego. The progress of the activities will be published
on the Website of the Working Group.
The Working Group will meet in person:
- at a starting session on 17 October in Paris
- in June 2002 to review the work and finalise the Report
(Meetings will be organised to coincide as much as possible with those of the
GRV 4 Steering Group.)
An international workshop/expert meeting is
planned to be held in the Spring of 2002 (tentatively March 18 2002) in Europe.
Members of the Working Group will meet
- at the 4th Global Research Village Conference (10-11 October 2002)