The Public Domain of Digital Research Data

OECD Follow-up Group on Issues of Access to Publicly Funded Research Data


Planning Document

1. Background

1.1. OECD Global Research Village Conferences
After conferences in Denmark (1996) and Portugal (1998), the third OECD Global Research Village Conference was held on 6, 7 and 8 December 2000 in Amsterdam. The conference was organised jointly by the OECD Committee for Scientific and Technological Policy (CSTP) and the Netherlands Ministry of Education, Culture and Science.
The conference was opened by Minister Hermans of the Netherlands, OECD Secretary-General Johnston and EU-Commissioner Busquin. Minister Wiszniewski of Poland closed the conference and offered to host the next conference in Poland in 2002.
94 representatives from governments, the European Union, research institutes and research organisations from 20 countries attended the conference.

1.2. ICT and Access to the global science system
Like its predecessors, the conference addressed the policy implications of the use of Information and Communication Technologies (ICT) for the science system.
This time the conference focussed on issues of "Access to publicly financed research".
To give two examples, the science policy aspects of developing ICT infrastructures like Next Generation Internet and the GRID and the accessibility of information from the databases of Human Genome projects were addressed.
Technical (ICT infrastructures) and regulatory (legislation on IPR, copyright and privacy) aspects of accessing publications, data and other resources from publicly financed research by scientists, industry and the public were discussed.


1.3. Practices and Principles
The Conference Recommendations invited the Conference Steering Group to elaborate further on the questions discussed. The Steering Group took up the suggestion and proposed that a Working Group of experts (from OECD countries, with involvement from the European Union, the European Science Foundation (ESF) and the US National Science Foundation (NSF)) be established with the purpose to:
- report on current practices and their underlying principles and
- make policy suggestions about options of further implementation of these principles,
concerning Access to publicly financed research information to the CSTP at the next Global Research Village Conference to be held in Poland in 2002.
By principles are meant the general normative (legal, ethical, political and economic) fundamentals relevant to Access to and Sharing of Research Data. These principles are to be used as the groundwork for more specific, practical regulation in guidelines. (Examples of successful guidelines based on a systematic set of principles are the OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data (1980) and the Principles and Guidelines for the Sharing of Biomedical Research Resources (1999) of the US National Institutes of Health (NIH).)

1.4. OECD/CSTP Working Group
At the 76th OECD/CSTP meeting of 13/14 March 2001 (Item 14, 32.), the CSTP
" ii) Agreed to the proposals presented by the Dutch and Polish Delegations in Room Document No.6 concerning the follow-up to the Third Conference on the Global Research Village, including
- the creation of a working group on access to research, and
- the preparation of the Fourth Conference to be held in Poland in 2002."
During the summer of 2001, experts and policy makers from the United States, The Netherlands, Denmark, Poland and the European Science Foundation, along with researchers from the US and The Netherlands, undertook various activities to prepare the establishment of the Working Group including further expert consultation and commissioning preliminary studies.

1.5. Preparations
As a result of the talks with interested parties, it was decided to focus on access to "data" instead of "research information". It was felt that "information" was too broad and/or vague a concept in this context, that the issues of electronic publishing and the "Serial Crisis" were treated satisfactorily elsewhere, and that the important issues could be handled better by focussing on data. The narrower focus was expected to make it easier to reach consensus on principles that could eventually be used in regulation. Another result of the consultation was the decision to include not only a policy study on the subject, but also a study of the scientific aspects of access to and sharing of data.

1.6. Preliminary studies
- A quick preliminary study on the State of affairs was undertaken (between July 15th and September 15th 2001) by NIWI (Netherlands Institute for Scientific Information Services) under the supervision of Dr. Paul Wouters. It included a Quick Scan of the current relevant regulation in the US and a 'Mini Survey' to help define the issues in data sharing currently felt as most urgent in the other OECD countries.

- A proposal to NSF for a project combining research into data-sharing from a social informatics perspective with policy research to support the activities of the Working Group was drafted (submitted August 12, 2001***) by Prof. Peter Arzberger and Prof. Geof Bowker from the University of California, San Diego.

- A proposal was made by Prof. Hans Franken to the Netherlands research programme Information Technology and the Law for a comprehensive national survey on the legislation and regulation relevant to access to and sharing of research data. It was accepted and will be carried out during the first months of 2002.

2. Points of departure
Discussions, consultations with experts and preliminary results of desk research have led to the following conceptual points of departure. These points of departure represent statements on the wide ranging, heterogeneous field of scientific research in general.
Although the meaning of various aspects of data management may differ depending on the disciplinary, institutional or national context (e.g. the protection of personal data of citizens compared to the protection of meteorological data for reasons of national security), similarities will prevail. Comparing different practices from different fields can lead to models that are successfully applicable elsewhere. At the level of international science policy, a general approach should be useful.

2.1. Sources for research
In scientific research, data on the world we live in are processed in order to test hypotheses concerning that same world. Data-sets (systematic collections of numerical scores, textual records, images and sounds) are used as sources for scientific research.
Data processing can be seen as the administrative (virtual) paperwork of research. Data are the lifeless records of phenomena concerning living and inanimate physical beings and objects around us. Of course, processing of identifiable personal data (ICT is pushing hard at the limits of identification!) can have enormous consequences in real life and should be subject to regulation in conformity with national legislation.

2.2. Data as separate part of research
Generally speaking, data can be treated, conceptually as well as practically, as relatively autonomous elements in the research process. Data collection and data management usually can be treated as separate activities and functions in research, potentially done by specialists.
Data used in research are not necessarily collected by researchers: extensive data sets from governmental organisations (Census, health services, meteorological, geological institutes) and commercial firms (market research, geographical information) are frequently used for scientific research.
Costs for data constitute a distinct part of the research budget.

2.3. Ownership, rights of disposal
Ownership of data used in research is not always clearly established in the relevant documents. Formally, most publicly funded research data are in the public domain and accessible to all (data as such do not qualify as intellectual property, so once published, all data are in the public domain). In everyday research life, researchers may tend to act as owners of the data used in their projects. The responsibility for sustainable archiving of research data is not always assigned to the relevant parties. Lack of regulation on these aspects may hamper access to and sharing of research data.

2.4. Multiple use
Use of the same data for multiple research (researchers, projects, institutes) should be considered beneficial to the quality and productivity of research.
Sharing data will widen the scale and scope, and enhance the quality and quantity of data resources.
Sharing data will improve the cost-benefit ratio of data collection and data management and will lead to a higher return on investments in data.

2.5. ICT widens the scale and scope of data-sharing
Thanks to the use of ICT facilities, an ever-increasing amount of data, ranging from very specific to general purpose data, should be considered as potential sources for multiple research on a global scale. By the ICT-mediated sharing of data, the same sources can be of use for different research problems, projects, institutes and disciplines, at different places, at different times.

2.6. Effects of data-sharing on the research process
Increased ICT-mediated sharing of data for research requires agreement on standards for quality control and scientific and technical interoperability. This will push research methods, techniques and paradigms towards further homogenisation in increasingly data-driven research. Diminishing diversity in research perspectives and reinforcement of the Matthew Effect (Robert K. Merton) can have negative effects on the progress of science.

2.7. Co-ordination in data management required
In current research practice, better use could be made of the increased possibilities of national and international data sharing. Existing data sets lend themselves to more extensive use by more researchers, but collaborative arrangements for exploitation and archiving are needed to realise this potential. Co-ordination in the collection and exploitation of additional data is needed to provide a supply that better meets future scientific demand.

2.8. Co-ordination requires policies
More extensive and better use of research data requires more explicit policies to promote global access to and sharing of data from governments, research funding organisations, research institutes and professional organisations. These national and international policies should create the conditions of openness that make sharing of data attractive and valuable to the individuals and organisations concerned.

2.9. Expensive facilities
In almost all scientific fields the frontiers have been pushed by the use of (large) facilities (new 'computerised' instruments connected by network infrastructures) taking care of the digitisation of the data supply. These facilities can be so extremely expensive that international co-operation is required to establish them. Once established, management, processing, distribution and sustainable archiving of the data generated will often remain so expensive as to require continuing international co-operation.

2.10. Interoperability
National and international policies for access to and sharing of research data should address the relevant technical and scientific standards concerning quality and interoperability required for co-operative arrangements.
They should take into account the positive as well as the negative effects of interoperability on the diversity of paradigms in the global research system.

2.11. Economic, legal and regulatory aspects
National and international policies for access to and sharing of research data should address the relevant issues of investment and ownership of data, rights of disposal (intellectual property rights of databases and their (retrieval) software) and, as far as relevant, protection of individual privacy and national security.
-----

3. The OECD/CSTP Working Group

3.1. Purpose
The international working group is established to promote Access to and Sharing of Research Data. To this end it will:

- Report on current practices concerning Access to and Sharing of Research Data and their underlying principles on the basis of case studies;

- Report on effects of selected current data sharing practices on the quality of research and the progress of science;

- Suggest principles for making policy on data sharing within the relevant national and international policies and regulatory frameworks.

3.2. Participation
Prof. Peter Arzberger, executive director of the US National Partnership for Advanced Computational Infrastructure, participated in the preparatory activities together with Geof Bowker, Professor of Communication and Science Studies at UCSD. Arzberger will chair the Working Group. Arzberger and Bowker, together with Kathleen Casey (UCSD), have made a proposal to NSF *** for the Working Group as a combined policy/research project. Dr. Paul Uhlir, from CODATA, Director of International Scientific Information Programs at the US National Academy of Sciences/National Research Council, will participate. Peter Schröder (co-ordinator Information Policy, Ministry of Education, Culture and Science, The Netherlands), Hugo Von Linstow (Advisor Analyses and Strategies, Ministry of Research and Information Technology, Denmark) and Tony Mayer, Head ESF Secretary General's Office, will continue their commitment.
From the GRV 4 Steering Group, Prof. Andrzej Wierzbicki, Director National Institute of Telecommunication, Poland, will participate. Prof. Doug McEachern, Director Social and Behavioural Science with the Australian Research Council, has accepted the invitation to participate in the Working Group. Canada has expressed willingness to participate. Other countries from Asia and Europe/EU will be asked to nominate participants. Further collaboration with ESF and European is to be expected.

3.3. Product
The Working Group will publish a Report on data access and sharing that will include a science policy section:
- describing existing arrangements of data access and sharing from selected cases in various research disciplines, organisations and countries;
- analysing the formal and informal rules applicable in the arrangements;
- formulating a set of commonly agreed Principles derived from best practices in these arrangements as well as the underlying normative values;
- coming up with policy recommendations to improve conditions for access and sharing.

The Report will also include a scientific section that will present the outcome of a separate research project on the practice and trends in data sharing:
- presenting a social informatics perspective on the arrangements of access and sharing;
- comparing various arrangements of access and sharing from this perspective;
- addressing issues of scientific standards, peer-review and quality control;
- coming up with conclusions on the positive and negative implications of data access and sharing on the research process and the research system.

3.4. Scope
Anything imaginable can be used as data for scientific research. The Working Group will limit its attention as much as possible to source data as distinct from bibliographical data. As the project is directed primarily at governments, it will focus primarily on data collected with public funds. These could be data collected for research, but also data collected for other governmental purposes (census data, data from meteorological, geological and geographical surveys). As publicly funded data can end up in data sets compiled by commercial firms, arrangements with market parties are not to be excluded.
The Working Group will focus on arrangements of access and sharing according to their current and future scientific and socio-economic significance. The outcome of the preliminary 'mini survey' should help to give indications on this.

3.5. Addressed parties
The Report of the Working Group will be addressed to:
- (primarily) the governments of the OECD countries and other governments connected in the global science system;
- organisations responsible for the funding of research;
- research institutes;
- professional scientific societies.

3.6. Results
In formulating a set of Principles, the report of the Working Group will contribute to a better understanding of the importance of access to and sharing of data to the research process, the science system and science policy by:
- raising the awareness of the relevant parties on the subject where needed;
- putting the issues more firmly on the relevant agendas and
- supporting the establishment of the necessary specific policies and regulation.


4. Working Plan

4.1. Preparations
Between April and October 2001 the following preparatory activities (see also 1.6.) were carried out:

* Consultation of experts and policy makers from OECD delegations, the European Science Foundation, the European Commission, the US National Science Foundation, the Steering Group of the 4th Global Research Village Conference and researchers at universities in the US and The Netherlands. Preliminary contacts were made with CODATA/ICSU, OECD and delegates from Japan and Canada.

* A quick elementary survey on the State of affairs was held by Paul Wouters from The Netherlands Institute for Scientific Information Services (NIWI). This study consisted of:
- a Quick Scan of the current relevant regulation on data sharing as formalised and practised by a selection of research organisations in the United States (for instance NSF, NIH, NRC and AAAS)
- a 'Mini Survey' (simple e-postal questionnaire) of the member organisations of ESF and similar organisations in Japan, Australia and Canada, to define the issues in data sharing currently felt as most urgent in the other OECD countries.
At the time of writing, the first results of these studies are coming in. In November 2001 the final findings will be available.

* US participants Peter Arzberger, Geof Bowker and Kathleen Casey (University of California, San Diego) made a proposal to the US National Science Foundation for a project combining scientific research and policy research into data-sharing. The project could function as the organisational backbone of the Working Group. It will treat Access to and Sharing of Research Data from the viewpoint of science policy and data management as well as from a social informatics perspective.
A decision by NSF is expected shortly.

* Hans Franken, professor of Law at Leiden University (and chairman of the third Global Research Village Conference) made a proposal for a comprehensive international survey on the legislation relevant to access to and sharing of research data, its purposes and underlying legal principles, the relevant international treaties, additional regulation and relevant jurisprudence.
The project was accepted by the Netherlands National Research Programme "Information Technology and the Law" and will be carried out during the first months of 2002.

4.2. Working Plan: content
Quick Scan and Mini Survey
At the time of writing, the results from the two preparatory studies from NIWI are not yet complete. Anticipating the conclusions from the Quick Scan of US regulation, it is safe to state that all the relevant important organisations have enacted rather elaborate regulation on Access to and Sharing of Research Data. This regulation is permeated by a sense of public accountability and accessibility that seems to be derived largely from the Freedom of Information Act and the Bayh-Dole Act. In this way, the existing regulation illustrates how important the availability of suitable national legislation, and the constraints it imposes, are for Access to and Sharing of Research Data.
Another aspect of the regulation seems to be a tendency to protect investment by researchers as primary collectors of data against claims from outside the research community. A systematic policy towards the building of a common, public data infrastructure seems absent outside the specialised Big Science organisations.
Preliminary findings from the Mini Survey among ESF member organisations showed that policies on Access to and Sharing of Research Data are far from being as common as in the US. Still, most of the respondents expected that Access and Sharing would develop into an important policy issue in all scientific fields in the near future. The main problems expected by the respondents were technical problems of interoperability, descriptive standards and institutional barriers. The personal attitudes of researchers and aspects of ownership were also mentioned. In contrast to the content of the US regulation, commercialisation of data did not seem to represent a major issue.

The Working Group will direct its activities at further elaboration of the most important leads from the survey and additional studies in more detail on:

- the relevant parts of the different legal frameworks in the countries concerned,
- the business models currently employed and
- the status of (governmental) mass data producing laboratories and agencies (meteorological, geological, topographical and environmental institutes, census bureaux, health services, cultural heritage collections, libraries).


4.3. Working Plan: procedure and time table

- The activities of the Working Group will take place between October 2001 and September 2002; the Report should be finalised in September 2002 to be presented at the fourth Global Research Village Conference to be held in Poland (10-11 October 2002).

- The Report, or parts of it, will be presented also at the 18th CODATA Conference to be held in Montreal between 29 September and 4 October 2002 and at the Society for Social Studies of Science Conference in November 2002.

- Input for the Report will come from its members and their organisations. The Bowker/Arzberger NSF project will structure the activities; its researcher, Casey, will also act as secretary for the Working Group and edit the Report.

- The Bowker/Arzberger research project will continue after the publishing of the Report and will be concluded at the end of 2002.

The bulk of the activities of the Working Group will be conducted by e-mail and tele-conferencing, supported by the secretariat to be located in San Diego. The progress of the activities will be published on the Website of the Working Group.

The Working Group will meet in person:
- at a starting session on 17 October in Paris
- in June 2002 to review the work and finalise the Report
(Meetings will be organised to coincide as much as possible with those of the GRV 4 Steering Group)

An international workshop/expert meeting is planned to be held in the Spring of 2002 (tentatively March 18 2002) in Europe.

Members of the Working Group will meet
- at the 4th Global Research Village Conference (10-11 October 2002)