The Public Domain of Digital Research Data

OECD Follow-up Group on Issues of Access to Publicly Funded Research Data


OECD Follow up group on issues of access to research data from public funding

REPORT ON PROGRESS AND COMING ACTIVITIES (March 2002)


Executive Summary


Since the 77th meeting of CSTP (18-19 October 2001), the working group has accomplished the following:

- Completion of two "quick elementary studies" (NIWI - Paul Wouters, to be published by end of March 2002) on:
- Other work underway
- Additional members of the working group: since the October meeting, several other countries have become involved in this project, with representatives from Japan, Canada, and Finland.


REPORT ON PROGRESS AND COMING ACTIVITIES

1. Working Group

At the 76th OECD/CSTP meeting of 13/14 March 2001 (ITEM 14, 32.), the CSTP

" ii) Agreed to the proposals presented by the Dutch and Polish Delegations in Room Document No.6 concerning the follow-up to the Third Conference on the Global Research Village, including
- the creation of a working group on access to research information, and
- the preparation of the Fourth Conference to be held in Poland in 2002."

1.1. Update

This document gives an update on DSTI/STP(2001)35, 26 September 2001, and Room Document No. 4 from the 77th meeting of the CSTP on 18-19 October 2001, concerning the Working Group on research data issues. It was finalised on 11 March 2002.

1.2. Contents

- Section 2 gives the rationale for the activities of the Working Group: working out strategies in research policies and practices to cope with the dramatically increased amounts of research data generated by the Global Science System
- Section 3 summarises the preparatory activities undertaken since the 76th CSTP meeting leading to agreement on "Points of departure" and program of aims and means

- Section 4 sketches the Work Plan for the activities of the group for the period from March 2002 until March 2003

- Section 5 contains summaries of the results of the preliminary studies commissioned by the Working Group as well as the projects currently being carried out and projects being planned.

- Appendix 1. gives a brief record of the origin of the Working Group

- Appendix 2. states the "Points of departure"

- Appendix 3. contains a list with names and URLs of the current members and associates.

2. Rationale

Data as the basis of the value chain for Science and Society

2.1. Preface

Why is data access and sharing of publicly funded data an issue, now?

As the twenty-first century opens, a set of different national policies concerning information and communication technologies and scientific research have come into play. Ranging from the European Union Database Directive to the Freedom of Information Act, these policies demand attention, particularly as they pertain to the role of science in society and the public's access to research products. Similar to the legislation developed during the early 20th century that sought to balance economic gains with societal good, contemporary changes in information and communication technology require a re-examination of the role of government in issues of balancing economic gains, science and society.

The changes in information and communication technology (ICT) have been tremendous for society in general, and science in particular. The ICT revolution is enhancing the ability of science both to acquire new data and to share it, at unprecedented scales of time (almost instantaneously) and volume. The science system is facing a tidal wave of data that it must address to ensure optimal return on the data-generation investments. In parallel, the larger issues of society such as national security, individual privacy, and intellectual property rights are all being re-examined in the light of the ICT revolution. Decisions made today in each of these realms, often without due consideration for the role of science in society or for the mechanisms of the scientific enterprise, such as the sharing of and access to data (the life blood of the scientific enterprise), will have profound consequences.

Science is a driver of innovation and the economy as well as a developer of knowledge. As access to data is an integral part of the scientific process as well as a driver of the economy, it is appropriate to understand the impact of the technologies that enable science, in the context of commercialisation, privacy and security and their impact on science, in order to ensure optimal benefit from public investment in science.

We see on a daily basis the promise that data sharing can bring to science and society. However, the lack of policies that address access to and sharing of research data, together with the proliferation of competing policies, has created a situation that could threaten the underlying principles associated with data sharing and its benefits for science, the economy, and society.

2.2. A quantum leap in data use

The use of ICT has radically enlarged the scale of research in the Global Science System of the 21st Century. Increased computing capacity has led to the deployment of larger and more complex workforces of researchers connected by ever larger and more complex computer networks using larger and more complex instruments powered by complex software, both to produce and to use ever larger masses of digital data.

Sir Isaac Newton was able to revolutionise physics working with pen and paper from a limited set of observations (not yet called data) and exchanging letters with only a few European colleagues. For a next step in contemporary physics, scientists are joining forces to build the Large Hadron Collider at CERN that will accumulate 10 Petabytes of digital data (10 quadrillion bytes, about 20 Million CD-ROMs) per year. In order to deliver the data to the 10,000 researchers from 1,000 research institutes in 50 different countries and to enable them to use the data, the LHC Computing Grid is being developed: CPU power over 4 Million SPECint95, LAN throughput about a Terabit per second at dozens of sites, WAN capacity many Gigabits per second to hundreds of sites.
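The comparison between 10 Petabytes and about 20 Million CD-ROMs can be checked with a short calculation. The sketch below is only illustrative: the per-disc capacities (500 MB and 650 MB) are assumptions introduced here, since the report does not state which capacity it used, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the "10 Petabytes, about 20 Million CD-ROMs" figure.
# The per-disc capacities below are assumptions, not values given in the report.

PETABYTE = 10**15  # decimal petabyte, matching "10 quadrillion bytes"


def cdroms_needed(total_bytes: int, disc_bytes: int) -> int:
    """Return the number of discs needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // disc_bytes)  # ceiling division on integers


yearly_output = 10 * PETABYTE

for label, capacity in [("500 MB per disc", 500 * 10**6),
                        ("650 MB per disc", 650 * 10**6)]:
    print(f"{label}: {cdroms_needed(yearly_output, capacity):,} discs per year")
```

Under the 500 MB assumption the result is exactly 20 million discs; under the 650 MB assumption it is roughly 15.4 million, so the figure quoted in the text holds as an order-of-magnitude estimate.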

2.3. Data policies and data management

CERN certainly ranks among the 'Big Science' institutes that are best prepared to handle this quantum leap in data use. However, many other research institutes in different fields, such as Structural Genomics, Biodiversity, Atmospheric Research and Remote Earth Observation, that will generate comparable amounts of data do not always seem prepared for the data flood that is coming their way.

To ensure optimum production and use of the growing masses of digital research data for science and society, researchers will have to adapt the ways they handle data to the changing circumstances. At the institutional level additional attention from science policy and research management is called for.

2.4. Data driven science

By using and producing ever increasing masses of digital data, science is becoming more and more "data driven". Every day the availability of the right sets of digital data is becoming more important to the success of research. This goes for natural sciences as well as social sciences and the humanities. Costs for the collection and processing of data are taking up a growing proportion of the research budgets.

2.5. Changing perspectives

Use of ICT has led to a spectacular increase in the possibilities for the collection and use, the input and output, of digital data in research. However, the traditional framework of policies and management (and bookkeeping) pays attention to categories such as workforce, facilities and publications, but usually does not yet take into account data use as a relevant distinct component. Looking at the relevant figures, one could easily get the impression that the Global Science System is not quite prepared to handle the tsunami of research data that is coming its way.

2.6. A force of its own

Instead of looking at the supply of research data as an integrated trajectory that needs systematic planning and budgeting starting from the collection of data, via processing, storage and dissemination to end with sustainable archiving, the parties involved tend to pay attention only to parts of the chain. Instead of taking data as a distinct factor with a broad range of application, data often seem to be considered as undifferentiated incidental research costs. Instead of looking at data as resources for multiple uses in different projects by different research institutes, data are often inaccessible to users other than the ones who set up the original collection.

2.7. Additional policies and strategies

The amount of data used in research is growing at a staggering rate and so is the expenditure on data and data-related activities. The consequences of this increase will have a great impact on the future of the Global Science System. Optimisation of access to research data requires additional attention from researchers, their communities and learned societies as well as from institutional research management and government policies.
As the quality and productivity of research is getting more and more dependent on access to digital data, these policies should promote the use of existing and future data sources in a more systematic way.

An example of a principled as well as practical policy approach to the issues sketched was formulated by the US National Institutes of Health (NIH) in its "Draft Statement on Sharing Research Data" of March 1, 2002; see the information at http://grants.nih.gov/grants/policy/data_sharing/index.htm. The Main Page gives access to an excellent (draft) "Data Sharing Workbook" and a document listing FAQs.

2.8. More quality and efficiency

To get more scientific quality out of data, data should be accessible to be used by more researchers for differing purposes in different projects, at different places over longer periods of time. To get a higher return on investments, additional policies should also promote the efficiency and productivity of research. As research data are largely publicly funded, better access to data should heighten public return on public investments.

2.9. International co-ordination and co-operation

Intensive international co-ordination and co-operation were needed for the establishment and operation of the larger research facilities and the deployment of the larger human resources of the contemporary Global Science System. Compared to the attention paid to investments in the infrastructures and human capital, the interest in explicit policies for investment in and exploitation of the extremely valuable 'floating capital' of scientific data has until now often been limited. However, a fresh look at international co-ordination and co-operation to optimise the production and use, the collection, processing, dissemination and archiving of these data seems even more challenging, being less complicated and holding the promise of a better cost/benefit ratio.


3. The OECD/CSTP Working Group

3.1. Proposal

At the 76th OECD/CSTP meeting of 13/14 March 2001 (ITEM 14, 32.), the CSTP

" ii) Agreed to the proposals presented by the Dutch and Polish Delegations in Room Document No.6 concerning the follow-up to the Third Conference on the Global Research Village, including
- the creation of a working group on access to research information, and
- the preparation of the Fourth Conference to be held in Poland in 2002."
(See Appendix 1 for more information about the origins of the Working Group.)

During the summer of 2001, experts and policy makers from the United States, The Netherlands, Denmark, Poland and the European Science Foundation, along with researchers from the US and The Netherlands, undertook various activities to prepare the establishment of the Working Group including further expert consultation and commissioning preliminary studies.

3.2. Preparations

The preparations for the activities of the Working Group were led by Peter Arzberger, Director, Life Sciences Initiatives at the University of California, San Diego (UCSD); Hugo Von Linstow, CSTP delegate from Denmark; and Peter Schröder, co-ordinator Information Policy, Ministry of Education, Culture and Science in The Netherlands. Researcher Paul Wouters from the Netherlands Institute for Scientific Information Services (NIWI) contributed to the programming and execution of research. The CSTP-nominated members from Australia, Canada, Denmark, Finland, Japan, Poland and the United States, together with NSF, ESF and CODATA, participated in the activities. Discussions, consultations with experts and preliminary results of desk research led to the conceptual "Points of departure" that can be found in Appendix 2.

3.3. Focus on data

As a result of the talks with interested parties, it was decided to focus on access to "data" instead of "research information". It was felt that "Information" was too broad and/or vague a concept in this context, that the issues of electronic publishing and the "Serial Crisis" were treated satisfactorily elsewhere, and that the important issues could be handled in a better way by focussing on research data. The narrower focus was expected to make the process of reaching consensus on principles, eventually to be used in a formal framework, easier. Another result of the consultation was the decision to include not only a policy study on the subject, but a study of the scientific aspects of access to and sharing of data as well.

3.4. Purpose of the Working Group

The international working group was established to promote Access to and Sharing of Research Data. To this end it will:

- Report on current practices concerning Access to and Sharing of Research data and their underlying principles on the basis of case studies;

- Report on effects of selected current data sharing practices on the quality of research and the progress of science;

- Suggest principles for making policy on data sharing within the relevant national and international policies and regulatory frameworks.

3.5. Product

The Working Group will publish a Report on data access and sharing that will include a science policy section:

- describing existing arrangements of data access and sharing from selected cases in various research disciplines, organisations and countries;
- analysing the formal and informal rules applicable in the arrangements;
- formulating a set of commonly agreed Principles derived from best practices in these arrangements as well as the underlying normative values;
- coming up with policy recommendations to improve conditions to access and sharing


The Report will also include a scientific section that will present the outcome of a separate research project on the practice and trends in data sharing:

- presenting a social informatics perspective on the arrangements of access and sharing;
- comparing various arrangements of access and sharing from this perspective;
- addressing issues of scientific standards, peer-review and quality control;
- coming up with conclusions on the positive and negative implications of data access and sharing on the research process and the research system.

3.6. Scope

Anything imaginable can be used as data for scientific research. The Working Group will limit its attention as much as possible to source data (factual data; observations, measurements recorded in text as well as images and sound) usable as input for research as distinct from bibliographical data. As the project is directed primarily at governments, it will focus primarily on data collected with public funds. These could be data collected for research, but also data collected for other governmental purposes (census data, environmental data from meteorological, geological and geographical surveys). As publicly funded data can end up in data sets compiled by commercial firms, arrangements with market parties are not to be excluded.
The Working Group will focus on arrangements of access and sharing according to their current and future scientific and socio-economic significance.

3.7. Addressed parties

The Report of the Working Group will be addressed to:

- (primarily) the governments of the OECD countries and other governments connected in the global science system;
- organisations responsible for the funding of research
- research institutes
- professional scientific societies and the scientific community in general

3.8. Results

In formulating a set of Principles, the report of the Working Group will contribute to a better understanding of the importance of access to and sharing of data to the research process, the science system and science policy by:

- raising the awareness of the relevant parties on the subject where needed;
- putting the issues more firmly on the relevant agendas and
- supporting the establishment of the necessary specific policies and regulation.

3.9. Organisation, Membership

Peter Arzberger chairs the Working Group; Peter Schröder will act as vice-chairman. Researcher Kathleen Casey is secretary and heads the bureau of the Working Group in San Diego; Teri Simas is the administrative co-ordinator ( simast@sdsc.edu ).
The Website for the Working Group will shortly be available at http://dataaccess.sdsc.edu/
A list of the current members and associates of the Working Group is included in Appendix 3.

4. Preparations and Work Plan

4.1. Between April 2001 and March 2002 the following activities were carried out:

* The consultation of experts and policy makers from OECD delegations, the European Science Foundation, the European Commission, the US National Science Foundation, the Steering Group of the 4th Global Research Conference, CODATA and researchers at universities in the US and The Netherlands.

* Two quick elementary studies on the State of Affairs by Paul Wouters from the Netherlands Institute for Scientific Information Services (NIWI):

- a Quick Scan of the current relevant regulation on data sharing as formalised and practised by a selection of research organisations in the United States (for instance NSF, NIH, NRC and AAAS)

- a 'Mini Survey' (simple e-postal questionnaire) of the member organisations of ESF (and similar organisations in Japan, Australia and Canada) to define the issues in data sharing currently felt as most urgent in the other OECD countries.

Results to be published in March 2002.


* The drafting by US participants Peter Arzberger, Geof Bowker and Kathleen Casey (University of California, San Diego) of a proposal, accepted by the US National Science Foundation (NSF), for a project combining scientific research and policy research into data-sharing. The project is meant to function as the organisational backbone of the Working Group. It will treat Access to and Sharing of Research Data from the viewpoint of science policy and data management as well as from a social informatics perspective. The Report shall be finalised in Spring 2003.


* A constitutional meeting held in Paris on 17 October 2001. Peter Arzberger was appointed chairman and, at the suggestion of CSTP delegates, the members were appointed (see Appendix 3). Procedures, time schedule and Work Plan were discussed.


* Agreement on a longlist of relevant case studies covering the fields of:


- High-Energy Physics
- Astronomy
- Meteorological/Atmospheric research
- Structural Genomics
- Biodiversity and the Global Biodiversity Information Facility
- Epidemiology
- Social Sciences
- Cultural Heritage research


* Agreement on the following topics to be addressed in the case studies:

- the relevant parts of the different legal frameworks in the countries concerned,
- the business models currently employed and
- the status of (governmental) mass data producing laboratories and agencies (such as meteorological, geological, topographical and environmental institutes, census bureaus, health services, cultural heritage collections and libraries)


* The start of a comprehensive study on the national legislation relevant to access to and sharing of research data, proposed by Hans Franken, professor of Law at Leiden University (and chairman of the third Global Research Village Conference), and accepted by the Netherlands National Research Programme "Information Technology and the Law" (ITER), to be finalised in May 2002.

* The start of two NIWI follow up studies:

- a study on the importance of Trust in the practice of small-scale data sharing
- a study on data policies and management at the international 'Big Science' organisations CERN (European Organisation for Nuclear Research) and EMBL (European Molecular Biology Laboratory).
Studies to be finished by the end of 2002.


* The preparation of an expert meeting on the economics and management of digital research data for the Global Science System. The meeting will probably be organised by Luc Soete and held at the Maastricht Economic Research Institute on Innovation and Technology (MERIT) in September 2002.

4.2. From March 2002 onwards the planning is as follows:

- The activities of the Working Group will take place between October 2001 and the summer of 2003. The first progress report to the CSTP will be made at the 78th meeting to be held in Paris on 19 and 20 March 2002.

- An interim Report will be finalised in September 2002 to be presented at the fourth Global Research Village Conference to be held in Poland (10-11 October 2002).


- The interim Report, or parts of it, will also be presented at the 18th CODATA Conference to be held in Montreal between 29 September and 3 October 2002.

- A progress report and the interim Report will be presented to the CSTP at its 79th meeting to be held in Paris in October 2002.

- Various results of the activities of the Working Group will be presented at the Society for Social Studies of Science Conference in November 2002.

- The final version of the Report will be presented to CSTP at its 80th meeting in March 2003 in Paris.


- Input for the Report will come from its members and their organisations. The Bowker/Arzberger NSF project will structure the activities as researcher Casey will act as secretary for the Working group as well and edit the report.


- The Bowker/Arzberger research project will continue after the publishing of the final Report and will be concluded at the end of 2003.
The bulk of the activities of the Working group will be conducted by e-mail and tele-conferencing supported by the secretariat to be located in San Diego. The progress of the activities will be published on the Website of the Working Group.

The Working Group will meet in person:

- in Paris on 18 and 19 March (before the March meeting of OECD/CSTP)
- in June 2002 in San Diego to review the work and finalise the interim Report
(Meetings will be organised to coincide as much as possible with those of the GRV 4 Steering Group)

Members of the Working Group will also meet

- at the 4th Global Research Village Conference (10-11 October 2002)

5. Preliminary results and current research and fact finding

I. NIWI Mini Survey
II. NIWI Quick Scan on US Regulation
III. San Diego study
IV. ITER study on legislative framework
V. NIWI study on trust and sharing
VI. NIWI report on CERN and EMBL practices
VII. MERIT expert meeting on data economics

5.1. The NIWI studies

I. 'Mini Survey' among members of the European Science Foundation and similar organisations in Australia, Canada and Japan.
II. Quick Scan of the current relevant regulation in the US.

5.1.2. Two complementary studies

Two quick preliminary studies on the state of affairs were undertaken (between July 15th and November 15th 2001) by NIWI (Netherlands Institute for Scientific Information Services) under the supervision of Dr. Paul Wouters. They comprised a Quick Scan of the current relevant regulation in the US and a 'Mini Survey' to help define the issues in data sharing currently felt as most urgent in the other OECD countries.
The results will be published in March 2002.

5.1.3. US Regulation: legal framework

Summarising the conclusions from the Quick Scan of US regulation, it is safe to state that all the relevant important research organisations have enacted rather elaborate regulation on Access to and Sharing of Research Data. This regulation is permeated by a sense of public accountability and accessibility derived from the Freedom of Information Act and OMB circular A-130 on Federal information policy (as codified by the Paperwork Reduction Act). In this way, the existing regulation illustrates the importance of the availability and the constraint of suitable national legislation on Access and Sharing of Research Data.

Another aspect of the regulation seems to be a tendency to protect investment by researchers as primary collectors of data against claims from outside the research community, particularly through periods of exclusive use of data by Principal Investigators prior to the publication of results. A systematic policy towards the building of a common, public data infrastructure seems absent outside the specialised "Big Science" organisations.

5.1.4. Other OECD countries: growing awareness

Preliminary findings from the Mini Survey among ESF member organisations showed that policies on Access to and Sharing of Research Data are far from being as common as in the US. Still, most of the respondents expected that Access and Sharing would develop into an important policy issue in all scientific fields in the near future. The main problems expected by the respondents were technical problems of interoperability, descriptive standards and institutional barriers. The personal attitudes of researchers and aspects of ownership were also mentioned. In contrast to the content of the US regulation, commercialisation of data did not seem to represent a major issue.

5.1.5. The start of something

At the international level, data-sharing still seems to be in its infancy as a policy issue. But providing access to scientific data is fast becoming a crucial aspect of science policy at the national and international level (National Research Council 1997). The need for increased levels of data processing is related to a number of developments: the application of information and communication technologies (ICT) in research; the development of new, often interdisciplinary, research questions; and the increased social and economic role of science, social science and the humanities. At the same time, a prudent use of state of the art information and communication technologies may help create new methods of providing access to scientific data in a timely and cost-effective way on a truly global scale.

5.2. The San Diego Study

III. Access to and Sharing of Data from Public Funding
A proposal to NSF for a project combining research into data-sharing from a social informatics perspective with policy research to support the activities of the Working Group was drafted by Dr. Peter Arzberger and Prof. Geoffrey Bowker from the University of California, San Diego. The proposal has been accepted and the work will continue during the course of 2002. From the proposal to NSF:

5.2.1. Policy issues

Our report will build on the results of the third Global Research Village Conference as well as on the "Bits of Power" report (NRC 1997), but expand the disciplinary range considered (to include the social and medical sciences) in order to account for the new disciplinary and interdisciplinary formations that are coming to the fore - each with their own sets of data practices. Further it will address the issues at the levels of science policy and research management and will have the advantage of six more years of information in this digital age (beyond the Bits of Power study). It will have concrete examples to motivate principles, more so than the CODATA report. In addition, it will build on the principles for access to and sharing of publicly funded research, laid out at the conclusion of the Global Research Village III conference, and will consider implications of implementation of those principles. It will attempt to capture, as a snap-shot, current practices and policies being employed across disciplines, funding agencies and countries.

5.2.2. Social Informatics

Finally, it is our intent to put this work on a strong (social informatics) research footing. A technical fix to the problem of data sharing will not work without strong organizational support. Consider, for example, the widespread non-adoption of the Worm Community System developed to support data sharing in the community of researchers mapping C. elegans, and intended to provide an easy interface with other genome mapping communities. The excellence of the technology could not overcome the desire for privacy, confidentiality and proprietary use amongst many of the researchers involved (especially postgraduates, for whom their engagement with the mapping effort could make or break their careers) and in general without an organizational understanding of the development of scientific careers as they affected data sharing issues (Star and Ruhleder, 1996). In order to develop robust long term information infrastructures, we must combine technical developments and organizational innovation (Bowker and Star, 1999).

5.2.3. Data economies

In order to generate the clusters that will be most useful for the committee, we will organize them along two specific dimensions, each of which will be discussed in detail in the report: international data sharing arrangements, emerging "data economies" including issues of justice and fair use. We discuss these briefly below in order to illustrate the core research issues each dimension poses. We feel that this focusing is important to ensure appropriate scientific rigor in the final report, but more importantly it will provide a clear link to a wider community of researchers, who will be able to continue on the work produced here. We hope to provide a platform for ongoing dialog between the research community and the policy community as technology evolves faster than policy. Furthermore, we will be reaching out to that research community as a result of our effort, and doing so in national and international fora.

5.3. Legislative framework

IV. ITER study
A comprehensive study on the national legislation relevant to access to and sharing of research data, its purposes and underlying legal principles, relevant international treaties, additional regulation and relevant jurisprudence. The proposal was drafted by Hans Franken, professor of Law at Leiden University (and chairman of the third Global Research Village Conference). The project was accepted by the Netherlands National Research Programme "Information Technology and the Law" (ITER) and will be carried out during the Spring of 2002.

5.4. NIWI follow up studies

V. Study on the role of Trust (Anne Beaulieu)
VI. Study on practices at CERN and EMBL (Colin Reddy)

5.4.1. Formal regulation and practice

Given the relevance of clear policy principles, the next question is how they compare with actual data-sharing practices. This is the topic of a set of case studies which are now being undertaken. A number of actors are crucial in the practice of data-sharing: funding agencies, data repositories and archives, dedicated Web sites with data, and not least the researchers themselves. Their interaction determines to what extent data are actually being shared among researchers and between researchers and non-expert audiences.

5.4.2. Effects of informal behaviour

The case studies aim to draw lessons from present data-sharing practices, illustrate the issues that are most pressing, locate best practices and exemplary models, find out which additional policies or funding mechanisms may be needed, and identify the main barriers and obstacles for heightened data-sharing. Which types of tools and regulation are most conducive to data-sharing, and which effects increased data-sharing may have on the research process, will also be addressed in the case studies. One can expect that these effects will vary by scientific field and probably also by the type of data involved. Data-sharing is not always uncontroversial in the scientific community. In some specialties, the duty to make research data publicly available seems to clash with established traditions and routines (or lack thereof). This raises the additional question of the transaction costs.

5.4.3. The perspective of the researcher

Moreover, the application of general principles for data-sharing in research contract conditions requires specialist knowledge of the types of data involved and of the various stages in the research process. This is usually acquired in some form of cooperation or communication with the researchers in question. In other words, the application of the general principles and guidelines is based on, and produces, configurations of trust relationships and practical provisions. Data-sharing is not only a technical issue but a complex social process in which researchers have to balance different pressures and tensions. Basically, two different modes of data-sharing can be distinguished: peer-to-peer forms of data-sharing and repository-based data-sharing.

5.4.4. Personal and formal relations

In the first mode, researchers communicate directly with each other. In the second mode, there is a distance between the supplier of data and the user in which the rules of the specific data repository determine the conditions of data-sharing. In both modes, the existence or lack of trust between the data supplier and the data user is crucial, though in different configurations. One of the case studies focuses on the systematic study of these configurations of trust relationships in data-sharing. The other case studies will result in best practice models for data-sharing.
Together with the study of economic and legal aspects of data-sharing they will hopefully provide us with more knowledge about the basic social mechanisms shaping the access to and sharing of research data and help identify the most important barriers to an increased level of use of existing scientific knowledge and data.

5.4.5. Data sharing in Big Science

In a more formal way, this study will analyse regulation and practice at two international 'Big Science' organisations with a checklist that includes the following aspects:

- Regulation on use of data in the relevant treaties and mission statements
- Formal responsibilities for the collection, processing, use and archiving of data
- Ownership of data, rights of disposal
- Documentation, technical standards
- Quality control, review and data security
- Availability and dissemination

5.5. Expert meeting on the economic and managerial context

VIII. MERIT (Luc Soete)

The growing importance of digital research data as the floating capital of Global Science: Expert meeting on the economics and management of digital research data for the Global Science System. The meeting will address issues connected with the following questions:

5.5.1. Data as a category in economics and bookkeeping

Use of ICT has led to a spectacular increase in the collection and use, input and output of digital data in research. The traditional framework of policies and management (and bookkeeping) pays attention to categories such as workforce, facilities and output in research but does not yet take into account data use as a relevant distinct component. Do the dramatic increases in scale and scope of data supply require additional policies on the funding, managing, dissemination and archiving of research data as a distinct category?

5.5.2. Making the use of digital research data visible

Use of digital research data is growing spectacularly, and there are figures on the increase of data-related activities in (some disciplines in) research. But it is hard to demonstrate an accompanying rise in expenditure on data as long as costs (and benefits) are not visible in research budgets.
What empirical evidence is currently available on developments in data expenditure?
How should expenditure on data be accounted for in research budgets: as incidental costs, as investments in data infrastructure, or as output?

5.5.3. Economic models for data management

Demand as well as supply of research data is usually in the hands of different research institutes, public services and private firms. Costs for appropriate data supply (from data collection to data archiving) should be accounted for in the budgets of funding organisations and research institutes. Costs could include value-adding and access services from private partners ([database] publishers, software makers).
What business models are currently used, what alternatives are feasible?

5.5.4. Research data as Public Good / Research data as proprietary information

Adequate policies on the use of research data require international co-operation and co-ordination between different countries, institutes and private firms to ensure conditions for a 'free flow' of relevant data and information. These policies call for a (re)consideration of the current (inter)national legal and regulatory frameworks relevant to openness, access and property rights. What are the (dis)advantages (efficiency, return on investment) of public good and proprietary regimes to ensure optimum use of research data? (efficiency of public good regimes, monopolies, niche markets)
What part should there be for Intellectual Property Rights? (What use should researchers make of IP rights?)

Appendix 1:

Origin of the Working Group

1. OECD Global Research Village Conferences

After conferences in Denmark (1996) and Portugal (1998), the third OECD Global Research Village Conference was held on 6, 7 and 8 December in Amsterdam. The conference was organised jointly by the OECD Committee for Scientific and Technological Policy (CSTP) and the Netherlands Ministry of Education, Culture and Science. The conference was opened by Minister Hermans of the Netherlands, OECD Secretary-General Johnston and EU-Commissioner Busquin. Minister Wiszniewski of Poland closed the conference and offered to host the next conference in Poland in 2002. 94 representatives from governments, the European Union, research institutes and research organisations from 20 countries attended the conference.


2. ICT and Access to the Global Science System

Like its predecessors, the conference addressed the policy implications of the use of Information and Communication Technologies (ICT) for the science system.
This time the conference focussed on issues of "Access to publicly financed research". To give two examples, the science policy aspects of developing ICT infrastructures like Next Generation Internet and the GRID and the accessibility of information from the databases of Human Genome projects were addressed.
Technical (ICT infrastructures) and regulatory (legislation on IPR, copyright and privacy) aspects of accessing publications, data and other resources from publicly financed research by scientists, industry and the public were discussed.

3. Practices and Principles

The Conference Recommendations invited the Conference Steering Group to further elaborate on the questions discussed. The Steering Group took up the suggestion and proposed a Working Group of experts (from OECD countries, with involvement from the European Union, the European Science Foundation (ESF), and the US National Science Foundation (NSF)) to be established in order to:

- report on current practices and their underlying principles and
- make policy suggestions about options of further implementation of these principles,
concerning Access to publicly financed research information to the CSTP at the next Global Research Village Conference to be held in Poland in 2002.

By principles are meant the general normative (legal, ethical, political and economic) fundamentals relevant to Access to and Sharing of Research Data. These principles are to be used as the groundwork for more specific, practical regulation in guidelines. (Examples of successful guidelines based on a systematic set of principles are the OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data (1980), or the Principles and Guidelines for the Sharing of Biomedical Research Resources (1999) of the US National Institutes of Health (NIH).)

Appendix 2:

Points of departure

1. Generalities and specifics

Discussions, consultations with experts and preliminary results of desk research have led to the following conceptual points of departure. These points of departure represent statements on the wide-ranging, heterogeneous field of scientific research in general. Although the meaning of various aspects of data management may differ depending on the disciplinary, institutional or national context (e.g. the protection of personal data of citizens compared to the protection of meteorological data for reasons of national security), similarities will prevail. Comparing different practices from different fields can lead to models that are successfully applicable elsewhere. On the level of international science policy a general approach should be useful.

2. Sources for research

In scientific research, data on the world we live in are processed in order to test hypotheses concerning that same world. Data-sets, systematic collections of numerical scores, textual records, images and sounds, are used as sources for scientific research. Data processing can be seen as the administrative (virtual) paperwork of research. Data are the lifeless records of phenomena concerning living and inanimate physical beings and objects around us. Of course processing of identifiable personal data (ICT is pushing hard at the limits of identification!) can have enormous consequences in real life and should be subject to regulation in conformity with national legislation.

3. Data as distinct part of research

Generally speaking, data can be treated, conceptually as well as practically, as relatively autonomous elements in the research process. Data collection and data management usually can be treated as separate activities and functions in research, potentially done by specialists. Data used in research are not necessarily collected by researchers: extensive data sets from governmental organisations (Census, health services, meteorological, geological institutes) and commercial firms (market research, geographical information) are frequently used for scientific research.
Costs for data constitute a distinct part of the research budget.

4. Ownership, rights of access and disposal

Ownership of data used in research is not always clearly established in the relevant documents. Formally, most of the publicly funded research data are in the public domain, accessible to all (data as such do not qualify as intellectual property so once published, all data are in the public domain). In everyday research life, researchers may tend to act as owners of the data used in their projects. The responsibility for sustainable archiving of research data is not always assigned to the relevant parties. Lack of regulation on these aspects may hamper access to and sharing of research data.

5. Multiple use

Use of the same data for multiple research (researchers, projects, institutes) should be considered beneficial to the quality and productivity of research.
Sharing data will widen the scale and scope, and enhance the quality and quantity of data resources.
Sharing data will improve the cost-benefit ratio of data collection and data management and will lead to a higher return on investments in data.

6. ICT widens the scale and scope of data-sharing

Thanks to the use of ICT facilities, an ever-increasing amount of data, ranging from very specific to general purpose data, should be considered as potential sources for multiple research on a global scale. By the ICT-mediated sharing of data, the same sources can be of use for different research problems, projects, institutes and disciplines, at different places, at different times.

7. Effects of data-sharing on the research process

Increased ICT-mediated sharing of data for research requires agreement on standards for quality control and scientific and technical interoperability. This will mean a force towards further homogenisation of research methods, techniques and paradigms in increasingly data-driven research. Diminishing diversity in research perspectives and reinforcement of the Matthew Effect (Robert K. Merton; the referencing effect by which attribution of scientific merit and reputation tends to accumulate with ever fewer celebrities) can have negative effects on the progress of science.

8. Co-ordination in data management required

In current research practice, better use could be made of the increased possibilities of national and international data sharing. Existing data sets lend themselves to more extensive use by more researchers but collaborative arrangements of exploitation and archiving are needed to realise this potential. Co-ordination in the collection and exploitation of additional data is needed to provide a supply that meets the future scientific demand in better ways.

9. Co-ordination requires policies

More extensive and better use of research data requires more explicit policies to promote global access to and sharing of data from governments, research funding organisations, research institutes and professional organisations. These national and international policies should create the conditions of openness that make sharing of data attractive and valuable to the individuals and organisations concerned.

10. Expensive facilities

In almost all scientific fields the frontiers have been pushed by the use of (large) facilities (new 'computerised' instruments connected by network infrastructures) taking care of the digitisation of the data supply. These facilities can be so extremely expensive that international co-operation is required to establish them. Once established, management, processing, distribution and sustainable archiving of the data generated will often remain so expensive as to require continuing international co-operation.

11. Interoperability

National and international policies for access to and sharing of research data should address the relevant technical and scientific standards concerning quality and interoperability required for co-operative arrangements. They should take into account the positive as well as the negative effects of interoperability on the diversity of paradigms in the global research system.

12. Economic, legal and regulatory aspects

National and international policies for access to and sharing of research data should address the relevant issues of investment and ownership of data, rights of disposal (intellectual property rights of databases and their (retrieval) software) and, as far as relevant, protection of individual privacy and national security.

Appendix 3:

Members of the OECD follow-up group on issues of access to data

Mr. Peter Arzberger (Chair)
Director, Life Sciences Initiatives at the University of California San Diego (UCSD)
University of California, San Diego
parzberg@ucsd.edu


Bureau of the Working Group
University of California, San Diego
Ms. Kathleen Casey, Secretary, kcasey@ucsd.edu
Ms. Teri Simas, Administrative Coordinator, simast@sdsc.edu


Mr. Geoffrey Bowker, Professor
Department of Communications
University of California, San Diego
bowker@ucsd.edu


Mr. Koji Kamitani
Office of IT Promotion Research Promotion Bureau
MEXT (Ministry of Education, Culture, Sports, Science and Technology)
kami@mext.go.jp


Mr. Leif Laaksonen
Network Information Services Group (NISG)
Center for Scientific Computing
Leif.Laaksonen@csc.fi


Ms. Gudrun Maass
Principal Administrator
Science and Technology Policy Division, DSTI
OECD
gudrun.maass@oecd.org


Mr. Doug McEachern
Director Social, Behavioural and Economic Sciences
Australian Research Council
doug.mceachern@arc.gov.au


Mr. David Moorman
Policy analyst, Policy and Liaison Branch
Social Sciences and Humanities Research Council
david.moorman@sshrc.ca

Mr. Masamitsu Negishi
Professor, Humanities and Social Science Information Research
Research Information Research Division / Director of Research Division
National Institute of Informatics
negishi@nii.ac.jp


Mr. Peter Schröder (vice-chair)
Co-ordinator Information Policy
Ministry of Education, Culture and Science
Directorate Research and Science Policy
p.schroeder@minocw.nl


Mr. Paul Uhlir
Director, International Scientific and Technical Information Programs
U.S. National Academy of Sciences/National Research Council
puhlir@nas.edu


Mr. Mitsutoshi Wada
Manager Office of Science and Technology Information
Japan Science and Technology Corporation (JST)
wada@tokyo.jst.go.jp


Mr. Andrzej P. Wierzbicki
Director
National Institute of Telecommunications
a.wierzbicki@itl.waw.pl


Mr. Jan Windmueller
Head of Section
Ministry of Information Technology and Research
jwi@fsk.dk


Researchers, Experts, other Relations

Ms. Anne Beaulieu
Networked Research and Digital Information (Nerdi)
NIWI-KNAW
F 3120 6658013
http://www.niwi.knaw.nl/nerdi
anne.beaulieu@niwi.knaw.nl


Ms. Kathleen Casey
University of California, San Diego
Department of Communication
kcasey@ucsd.edu

Mr. Paul Wouters
NIWI Research
paul.wouters@niwi.knaw.nl


Mr. David Schindel
Head, Europe Office
National Science Foundation
dschinde@nsf.gov


Mr. Tony Mayer
Head, Secretary-General's Office
European Science Foundation
amayer@esf.org