The
importance of access to scientific data as one of the key tools in modern
research cannot be over-emphasised. New
ICT methodologies now allow both the construction and storage of very large
amounts of data and their interrogation and use, often in real time and
remotely. The European Science
Foundation considers that the development of policies concerned with access to
scientific databases must be based on international comparison and cooperation
and the adoption of best practice. The
acquisition of data, their storage and accessibility have become a very
significant cost in research. At the
same time, issues of trust in the science system have also become of concern.
The ESF welcomed the OECD initiative to address the
issue of access to publicly-financed data within the context of the "Global
Research Village" and was pleased to be able to act as a partner and thus
provide a conduit for its Member Organisations (the national research agencies
in 27 European countries) to be involved in this activity.
This report by Paul Wouters is a very important review
of current policies and provides the essential building block for the
refinement of international best practice for improving access to data and for
encouraging and intensifying international research collaboration.
Enric Banda
Secretary General, ESF
Providing
access to scientific data is fast becoming a crucial aspect of science policy
at the national and international level (National Research Council 1997). The
need for increased levels of data processing is related to a number of
developments: the application of information and communication technologies
(ICT) in research; the development of new, often interdisciplinary, research
questions; and the increased social and economic role of science, social
science and the humanities. At the same time, prudent use of state-of-the-art
information and communication technologies may help create new methods of
providing access to scientific data in a timely and cost-effective way on a
truly global scale.
The
application of ICTs to promote access to publicly financed research was the
main topic of the Third Global Research Village Conference (GRV III), held in
December 2000. The GRV III conference in Amsterdam concluded that governments and
research organisations should
pay more attention to the conditions for access to data, information and
knowledge (Franken 2000). The sharing of information was seen as one of the
key conditions for the development of scientific knowledge. A special session
of the conference was devoted to policy issues related to the promotion of data
sharing among researchers. This session concluded that governments and funding
agencies should demand, in dealing with proposals for the funding of research
infrastructures, that applications include an ICT paragraph addressing the
question of the sharing of data and tools, including the software, and the
sharing of instruments. The OECD/CSTP was asked to produce a short report and
a Web resource on the best practices of international sharing of data, tools
(including software) and instruments. Moreover, the conference concluded that
it would be useful to develop a set of principles for the (international)
access to and dissemination of data, information and knowledge. One of the key
recommendations from the conference was to form a Working Group on current
practices and underlying principles for gaining access to research data
(Franken 2000).
The two
studies in this report aim to contribute to the work of this group of experts
by providing an assessment of the present state of affairs with respect to the
access to, and sharing of, research
data. The first study zooms in on non-US countries, on the basis of an email
survey among members of the European Science Foundation and national research organisations in Australia, Canada and Japan.
This email survey is complementary to the second study. This is a Web scan,
which provides an overview of the policy principles with respect to the access
to, and sharing of, research data in the United States.
At
the international level, data sharing is still in its infancy as a policy
issue. However, most
research organisations expect that the access to, and
sharing of, research data will become a pertinent issue in the next few years.
This is the main outcome of the email survey of members of the European Science
Foundation and national research organisations in Australia, Canada and Japan. The contrast
with the results of the second study is striking. Public availability and
accessibility of research data is a basic policy principle of the US organisations in this Web scan. This includes
the availability of research data for sharing among researchers.
The
existence of the federal laws governing the data handling processes (Privacy
Act, Freedom of Information Act and the Bayh-Dole Act) is the principal cause
of the difference between the US and Europe. These laws can be understood in
the framework of a political tradition in the US in which public access to
government data is seen as crucial. They have created a regulatory context to
which research organisations
seem to have adapted by developing explicit principles, policies and
guidelines. Outside of the US this is not (yet) the case.
Given this
relevance of clear policy principles, the next question is how they compare
with actual data-sharing practices. This is the topic of a set of case studies, which are
now being undertaken within the framework of the Working Group on current
practices and underlying principles for gaining access to research data. A number of players are crucial in the practice of data-sharing: funding
agencies, data repositories and archives, dedicated Web sites with data, and
not least the researchers themselves. Their interaction determines to what
extent data are actually being shared among researchers and between researchers
and non-expert audiences. The case studies aim to draw lessons from present
data-sharing practices, illustrate the issues that are most pressing, locate
best practices and exemplary models, find out which additional policies or
funding mechanisms may be needed, and identify the main barriers and obstacles
to heightened data-sharing. Which types of tools and regulation are most conducive
to data sharing, and which effects increased data sharing may have on the
research process, will also be addressed in the case studies. One can expect that these effects
will vary by scientific field and probably also by the type of data involved. Data sharing is not
always uncontroversial in the scientific community. In some specialties, the duty to
make research data publicly available seems to clash with established
traditions and routines (or lack thereof).
This
raises the additional question of the transaction costs of rules set by funding
agencies. Moreover,
the application of general principles of data sharing in research contract
conditions requires specialist knowledge of the types of data involved and of
the various stages in the research process. This is usually acquired in some form of cooperation
or communication with the researchers in question. In other words, the
application of the general principles and guidelines is based on, and produces,
configurations of trust relationships and practical provisions. Data sharing is not only
a technical issue,
but also a complex social process in which researchers have to balance different pressures and tensions. Basically,
two different modes of data sharing can be distinguished: peer-to-peer forms of
data sharing and repository-based data sharing. In the first mode, researchers communicate directly
with each other. In the second mode, there is a distance between the supplier of data and
the user in which the rules of the specific data repository determine the
conditions of data sharing. In both modes, the existence, or lack, of trust between the data supplier and the data user
is crucial, though in different configurations. One of the case studies focuses
on the systematic study of these configurations of trust relationships in
data-sharing. The other case studies will result in best practice models for
data-sharing. Together with the study of economic and legal aspects of data-sharing
they will hopefully provide us with more knowledge about the basic social
mechanisms shaping the access to and sharing of research data and help identify
the most important barriers to an increased level of use of existing scientific
knowledge and data.
Part I: Policies on data-sharing: a preliminary assessment of the current state of the art by an email survey
Background
Increasingly,
cutting edge research is becoming data-driven in a larger number of disciplines
than in the recent past. The creation of new scientific knowledge needs more and more data as input for novel research. At the same
time, science is also producing an
exponentially rising amount of data. These data are often not only relevant for
the data-producing communities but also for researchers in other fields, for
industry, and for non-profit organisations and institutions.
This
"tidal wave" of data threatens to engulf the existing data
infrastructure in science. No longer can the acquisition, generation,
production, and archiving of data be organised on a case by case basis.
Economically as well as organisationally, guaranteeing access to the relevant
data will become a major concern in science policy.
In the near
future, the challenge posed by the production of data will clearly exceed the
level of the individual researcher or research group. The issues relating to
the gaining of access to public
research data are moving to centre stage in science policy making. This raises
the question of to what extent these issues have been addressed in the science
policy area. What is the current state of the art in the access to, and sharing
of, data in science policy in non-US countries? To what extent have research
organisations and institutions developed explicit principles, guidelines and
regulations to actively promote the access to, and sharing of, publicly funded
research data? This is the topic of the present study.
By
conducting an email survey on data-sharing among member organisations of the
European Science Foundation (ESF) and of relevant research organisations in
Australia, Canada, and Japan we have tried to acquire an overview of the
current policies and practice among these research organisations. As will
become clear in this report, the results of this mini-survey give a clear indication
that policies relating to the access to, and sharing of, research data are
still a relatively unexplored domain for many important organisations. The
survey has also produced a snapshot of the expectations that are currently held
by experts of these organisations.
Questions
The
questions posed aimed at acquiring a quick overview of the current state of
affairs with respect to data issues and identifying those issues that were
deemed most important (see Appendix 1 for the full questionnaire and the
accompanying letters). Firstly, the organisations were asked to indicate
whether the access to, and sharing of, data was addressed by government
regulation, and if so by what level of policy making (under discussion, topic
in policy documents, or addressed in legislation). Secondly, they were asked
whether the organisation itself had
developed explicit policies on data issues. Thirdly, they were asked whether they
expected that data sharing would become an important issue in the next three
years. The remainder of the questionnaire was aimed at filling in the details.
Amongst other topics, we wanted to know in which fields the participants
expected data sharing issues to be the most relevant (both now and in the future), as well as what sort of problems they
expected (technical, legal, economic or standard-setting issues).
A draft
version of the questionnaire was developed in cooperation with the Dutch
Ministry of Education and discussed with the ESF. A questionnaire with 10
questions was then posted on the Web site of NIWI-KNAW
(data-sharing.niwi.knaw.nl). In total, 53
institutional addresses obtained from the European Science Foundation were
approached by both email and regular mail with an accompanying letter, a letter
from the Dutch Ministry of Education and Sciences explaining the survey, and a
letter from the ESF asking for cooperation. The organisations were asked to
fill in the Web form. Additionally, the three national research organisations
in Japan, Australia and Canada were approached. Responses were obtained through
the Web site, via regular mail, and by email. Non-respondents were reminded of
the survey and asked to participate. The Web forms were automatically processed
with Perseus Survey Solutions software. The documents received by email and
regular mail were processed manually.
Results - general overview
In total 31
answers were obtained from 29 different institutions[3] (50 % of the addressees, which is less
than expected). The responses are from 21 different countries. This is 78 % of
the countries involved in the survey. We have not been able to obtain answers
from 6 countries (see Table 1).
| Response | Non-response |
| Australia, Austria, Belgium, Canada, Czech Republic, Denmark, Estonia, France, Germany, Hungary, Iceland, Ireland, Italy, the Netherlands, Norway, Poland, Slovenia, Spain, Sweden, Turkey, United Kingdom | Finland, Japan, Greece, Portugal, Slovakia, Switzerland |
Table 1: Overview of response by country
The institutions
represent different types. Four categories can be distinguished:
· national research organisations and funding agencies
· scientific academies and societies
· research institutions
· governmental bodies
The boundaries
between the different categories are not always clear-cut. For example, the
relationships between research organisations and ministries may vary from
country to country. The same holds for the other types. Scientific academies do
not always have the same functions. In Eastern Europe, they tend to combine the
role of learned society with that of national research organisation running a
network of research institutes. This is different from academies of science for
which the learned society is the main role.
The national
research organisations had an above-average response rate, whereas the reverse holds
for the academies and societies. As a result, the national research
organisations and funding agencies are overrepresented in the survey response
and the academies are underrepresented (see Table 2).
| | Total | Responding |
| Funding Agencies / Research Councils | 27 (50%) | 19 (66%) |
| Academies/Societies | 18 (33%) | 6 (21%) |
| Research Institutions | 8 (15%) | 3 (10%) |
| Ministries | 1 (2%) | 1 (3%) |
| TOTAL | 54 (100%) | 29 (100%) |
Table 2: Response by type of institution
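The percentages in Table 2 are column shares, not response rates. A minimal sketch of the response rate per category, using only the counts in the table (the variable and function names are illustrative and do not come from the survey itself), makes the over- and underrepresentation explicit:

```python
# Counts copied from Table 2: (institutions approached, institutions responding).
table2 = {
    "Funding agencies / research councils": (27, 19),
    "Academies / societies": (18, 6),
    "Research institutions": (8, 3),
    "Ministries": (1, 1),
}


def response_rate(approached, responding):
    """Share of approached institutions that responded."""
    return responding / approached


total_approached = sum(a for a, _ in table2.values())
total_responding = sum(r for _, r in table2.values())
overall = response_rate(total_approached, total_responding)

for name, (approached, responding) in table2.items():
    rate = response_rate(approached, responding)
    label = "above" if rate > overall else "below"
    print(f"{name}: {responding}/{approached} = {rate:.0%} ({label} the overall rate)")

print(f"Overall: {total_responding}/{total_approached} = {overall:.0%}")
```

With these counts, the funding agencies and research councils lie above the overall rate and the academies and societies below it, which matches the over- and underrepresentation described in the text.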
In slightly
more than half of the countries (12 out of 21) from which we derived answers,
data-sharing is becoming an issue of science policy. In these countries,
data-sharing is presently under discussion, the subject of policy documents, or
part of the national legislation according to respondents from these countries.
In 9 countries, this is not the case. Only in 2 countries, France and Poland,
is data-sharing the subject of national legislation. In 6 countries data-sharing is
part of policy documents but not of legislation. This is the case in:
Australia, Canada, Hungary, Iceland, Netherlands and Norway. In 4 countries the
issues are under discussion: Estonia, Germany, Italy and Slovenia. The
remaining 9 countries are not developing policies on access to and sharing of
research data according to the respondents (see Table 3).
| Current state of affairs | Countries |
| Legislation | France, Poland |
| Part of policy documents | Australia, Canada, Hungary, Iceland, Netherlands, Norway |
| In discussion | Estonia, Germany, Italy and Slovenia |
| No policy in development | Austria, Belgium, Denmark, Ireland, Spain, Sweden, Turkey, UK and Czech Republic |
Table 3: Overview of current state of affairs in national data policies
It should
be noted that in all countries some form of legislation pertaining to data does
exist, for example in the form of privacy protection, rules on the use of
clinical data, and protection of intellectual property rights (which may affect
embedded data). However, the state of
affairs with respect to more advanced science policy focussing on the promotion of access to and
sharing of research data varies considerably between countries. For example, in Iceland, national
GIS-based databases on Icelandic nature are being developed which run up against
some major institutional and standard-setting problems. In most countries, this
type of initiative is not even under consideration according to the
respondents. The historical development of the political system is sometimes an
important factor. In Hungary, for example, researchers were obliged by law to
supply data on any research topic. Since the political transition, research
institutes have largely ignored this law, resulting in the creation of a new
national data and technological information centre in Hungary. Within one
country, the situation may be different in different institutions and fields.
In Norway, all data from publicly funded research projects in the social
sciences are stored and distributed through the Norwegian Social Science Dataservice,
a branch of the Research Council. These data are freely available to students
and researchers. However, no such system exists for the natural sciences and
technology in Norway.
Ten institutions
have developed some form of policy on issues of the access to, and sharing of, research data. Most institutions
(17) have not (see Table 4)[4].
| | Data policy developed | No data policy |
| Funding agencies / Research councils | Australia, Canada, Iceland, Netherlands, Norway | Belgium, Denmark, Germany, Italy, Spain, Slovenia, Sweden, Turkey, UK |
| Academies / societies | Hungary, Norway, Slovenia | Austria, Estonia, Ireland, Slovenia (Med.), Czech Republic |
| Research institutions | France, Italy | France |
Table 4: Organisational data policy by type of institution
Although a
majority of the respondents have not developed data-sharing policies so far, a
small majority does expect to develop policies on data-sharing in the near
future: 9 out of 17. Seven organisations do not have this expectation: the
Austrian Academy of Science, the Royal Irish Academy, Information and
Innovation Systems at INRA (France), FWO (Belgium), the research councils EPSRC
and NERC (UK), the Slovenian Research Council, the Swedish Research Council and
the Czech network of universities and the academy CESNET.
The specific forms
of data-sharing may vary by scientific discipline or field. It is therefore
relevant to know in which fields the research organisations and academies
expect that issues of data-sharing will become most pressing. According to the
respondents in this survey, the access to, and sharing of, research data will
be an issue in all scientific and scholarly disciplines. The respondents did,
however, identify a field in which data-sharing is most urgent: the life
sciences. The humanities, on the other hand, are least expected to be
confronted with issues of data-sharing. In the classical experimental sciences
such as chemistry and physics, some respondents indicated that data-sharing
might not be such an urgent problem because existing practices and databases
may usually be sufficient to provide for the data needed. This may, however, be
quite different in new, multidisciplinary, fields (such as materials science
and nano-technology) and in fields which use large data generating instruments
(such as high energy physics and astronomy).
We also inquired
about the type of activities which were undertaken by organisations with a
policy on data-sharing, broken down by field. The answers show no relationship
between the type of policy action (from non-binding recommendations to
legislation) and the scientific field. This means that if organisations are
involved in, for example, the formulation of recommendations, they tend to
develop this for all fields for which they bear responsibility. Asked about the
type of policy action they expected for the future, "development and
implementation of regulation" was the most frequently mentioned, followed
by the formulation of "non-binding recommendations". Legislation in
countries where it does not yet exist was only expected by two respondents.
An important issue
in data-sharing is also the identification of the nature of barriers and
problems that may prevent the further development of data-sharing practices in
the sciences and humanities. The respondents were asked to identify which type
of problem they expected to encounter in the future development of their
policies on the access to, and sharing of, research data. This resulted in the
following rank order (see Table 5).
| Type of problem | Number of responses |
| legal problems (among others privacy) | 9 |
| technical problems | 9 |
| standards | 8 |
| institutional barriers | 3 |
| prohibitive cost | 3 |
Table 5: Types of problems expected in data-sharing policies
Lastly, we
inquired about the nature of the activities developed under the guidance of the
research councils and academies. This should give some insight into the type of
expertise that is, and will be, developed by the respondents. Selling data is
definitely not popular among the respondents: only 3 organisations are active
in this respect. The funding and/or management of data archives and
depositories is presently, and probably also in the near future, the most
practised type of activity that is included in the policies of the respondents
(see Table 6).
| Type of activity | Number of respondents |
| Funding/managing data archives | 12 |
| Co-operation with governmental data collecting agencies | 11 |
| Co-operation with national archives | 9 (plus 1 which is itself an archive) |
| Selling/buying data from commercial firms | 3 |
Table 6: Type of activities in data-sharing policies
Is there a relation between national and organisational data policies?
The survey
results give a clear indication that there is a statistically significant
relationship between the existence of organisational policies on data-sharing and
the existence of national policies on these issues.
On the
basis of the questionnaire, it is possible to construct four different types of
data-sharing configurations. These are:
· Type A: respondents which have a policy on data-sharing in a country where data-sharing is an issue at the national level
· Type B: respondents which have a policy on data-sharing in a country where data-sharing is not an issue at the national level
· Type C: respondents which do not have a policy on data-sharing in a country where data-sharing is an issue at the national level
· Type D: respondents which do not have a policy on data-sharing in a country where data-sharing is not an issue at the national level
This
typology is basically a table showing two dimensions: national policies and
organisational policies (see Table 7).
| | Nat. pol.: yes | Nat. pol.: no |
| Org. pol.: yes | 10 | 0 |
| Org. pol.: no | 3 | 14 |
Table 7: Correlation between national and organisational policies on data sharing
This
relationship is statistically significant at the 0.1% (one per mille) level, which
means that an association at least this strong would arise by chance less than
once in a thousand times if the two were unrelated[5]. The total number of observations is
small, but this also holds for the whole population of institutions and
countries. The level of non-response does not affect the correlation between
data-sharing policies at the level of the nation and the level of the
institution[6].
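The significance claim can be checked against the four counts in Table 7. The test behind the reported result is given only in a footnote that is not reproduced here, so the sketch below assumes a two-sided Fisher exact test, a common choice for small 2x2 tables. The function name is illustrative, and the calculation uses only the Python standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b          # organisations with a policy
    col1 = a + c          # organisations in countries with a national policy
    n = a + b + c + d     # all organisations in the table

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    observed = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: organisational policy yes/no; columns: national policy yes/no (Table 7).
p_value = fisher_exact_two_sided(10, 0, 3, 14)
print(f"p = {p_value:.2e}")
print("significant at the 0.1% level:", p_value < 0.001)
```

Under this assumed test the p-value falls well below 0.001, which is consistent with the significance level reported in the text. A chi-square test would be less reliable here, because one cell of the table is zero and the expected counts are small.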
The
correlation is also substantively significant because it is not self-evident
that initiatives in science policy at the national (or international) level
lead to related changes in research organisations and funding agencies. Science
policy is a political domain and hence relatively independent of the domain of
scientific institutions. If novel themes like data-sharing do indeed
"carry over" from the political domain to the institutional (which is
suggested by the correlation), it may underline the practical relevance of formulating
policy principles and guidelines at the national and international level in
policy documents.
Conclusions and discussion
Data-sharing is
still in its infancy as a policy issue in non-US countries. Most respondents
have not yet developed explicit policies and guidelines on data-sharing. This
is confirmed by the interest of respondents in being informed about activities
of the OECD/CSTP Working Group on Data Sharing in the future. Only 16 of the 29
respondents wish to be kept informed. Nevertheless, the majority of research
councils and academies expect that the access to, and sharing of, research data
will become an important issue in the next three years. This is underlined by
the fact that the respondents to this email survey tend to prioritise more
consequential forms of policy initiatives (such as the formulation of
regulation) above less consequential forms (such as non-binding
recommendations).
The respondents
expect that data-sharing will become an issue in all scientific and scholarly
fields. The life sciences have, however, been identified as the field in which
guidelines on data-sharing may be most urgent. The main problems respondents
expect with respect to data-sharing are technical difficulties and descriptive
standards, legal restrictions and institutional barriers. Considerations of
financial costs are not deemed so important.
Selling and buying
data is not a major activity of the respondents. This may point to an
intriguing paradox in the future. Although the life sciences are mentioned as
the area where data-sharing is most urgent, the respondents do not expect to be
very active in selling data to, or buying data from, commercial firms. As is
well-known, the life sciences have become commercialised in many ways, also
with respect to data-handling. This may become a matter for further
consideration if the trend of commercialisation affects access to research
data.
Given the spread
of existing national policies and policy documents on data-sharing over
different countries, it seems worthwhile to study the nature of these policies
more in-depth and compare them in more detail with existing regulation in other
countries. This may be of more relevance as those research organisations which
expect to undertake future action tend to emphasize binding regulation as their
priority. Identifying key problems in the development of this type of
regulation may therefore be useful.
There is a clear
relationship between the national and organisational level of policies with
respect to the access to, and sharing of, research data. This is first
indicated by the statistical correlation found in this survey between the
existence of a policy on data-sharing at the national level and the existence
of these policies at the institutional level. This may point to the intimate
relationship between national science policy and national research
organisations. It may also be related to the relative novelty of the issues of
data-sharing. New themes may perhaps "carry over" relatively easily,
which would point to an agenda-setting role of national science policy.
Secondly, the relationship is indicated by the difference between the results of
this email survey and the findings of the Web survey of data-sharing policies
and principles in the US (Wouters 2002). In the US, there exists both a
political tradition in which public access to data is seen as crucial and a set
of federal laws that regulate how research organisations and institutions
should provide access to research data and facilitate the sharing of research
data. This has created a regulatory context to which research organisations
seem to have adapted by developing explicit principles, policies and
guidelines. Outside of the US this is not (yet) the case.
Part II - Access to and sharing of research data: the policy context. A Web scan of principles and regulations in the US
The
United States is probably the largest data producer in the world. Government
agencies, scientific institutions, and commercial companies generate enormous
amounts of data on a daily basis. Due to digitization, data producing
capabilities are also increasing exponentially. There is barely a sector of
the economy that is not significantly engaged in the creation and exploitation
of digital databases, and there are many, such as insurance, banking, or
direct marketing, that are completely database dependent (National Research
Council 1999). Scientific and scholarly research is no exception to this
general trend. Increasingly, the creation of new knowledge is dependent upon gaining
instant access to research data as well as the capacity to store massive
amounts of generated data in a fast and reliable way. Scientific databases are
proving to be non-linear accelerators of research (Cerf 1999). In some
scientific fields a tradition of data-sharing has evolved through the daily
operation of large scientific instruments, e.g. high energy physics (CERN), or
networks of observatories, e.g. radio astronomy (Schillizzi 2000). In other
fields, however, large-scale data-sharing has been confronted with technical
and social barriers, e.g. brain research (Jennings 2000; OHBM 2001) and
genetics (Stokstad 2002).
This
has led research funding agencies and scientific societies to start developing
explicit policies and regulations to promote the economic use of large-scale
research instruments or networks of instruments. US institutions seem to be at
the forefront of this new domain of science policy. This is partly due to the
dominant role of American researchers in a number of fields, especially in
natural and life sciences (it is less so in the social sciences and
humanities). It is also related to the political tradition in the US in which
open access to government data for all citizens is seen as one of the
cornerstones of democracy and the constitutional state. As a consequence, data
generated with public money (including scientific data) were freely available
to all. However, in the last five years the status quo has been challenged by
new economic, technological, and legal developments concerning (digital)
databases. Digital technologies play a paradoxical role in this development.
They may enable a radically heightened scale of data-sharing as well as
allow for an increased level of control over data by its owner or provider.
Since shared access to data seems to have become more important than ever for
the creation of scientific knowledge,
analysis of the contradictory tensions surrounding practices of data-sharing
seems pertinent for policy. As will become clear from this study and its
comparison with the state of affairs in European data-sharing policies, the
political and legal context does affect the ways in which institutions organise
access to and sharing of research data. The question of whether clear policy principles and guidelines
have been formulated at the international and national level does matter.
However, this does not mean that the relationship between policies and rules
and the practice of data-sharing amongst scientists is straightforward
(Hilgartner 1998). For the individual researcher or research group, the policy
and regulatory context provides a set of
additional pressures which must be reconciled with other pressures in
research practice, such as the complexity of the research tasks themselves,
pressure from peers and local institutional structures. Shaping the
institutional contexts of research practices is probably one of the most
effective ways of influencing the way research is being executed, for example
by the creation of legal boundaries for research, the imposition of conditions
under which research is being funded, and the creation of infrastructures
which can be used by researchers. In the United States all three dimensions
have been implicated in attempts to promote access to, and sharing of, research
data.
The policy and technological context of access to research data
The Web documents providing the policies and
regulations on shared access to data reflect these pressures on the ways that research
is being performed. The following organisations have been included in this
study (see Appendix I for this study's methodology):
·
National Research Council (NRC) www.nas.edu/nrc
·
National Science Foundation (NSF) www.nsf.gov
·
National Institutes of Health (NIH) www.nih.gov
·
National Aeronautics and Space Administration (NASA) www.nasa.gov
·
American Association for the Advancement of Science (AAAS) www.aaas.org
·
National Archives (NARA) www.nara.gov
·
National Endowment for the Humanities (NEH) www.neh.gov
·
Inter-University Consortium for Political and Social Research (ICPSR) www.icpsr.umich.edu
·
Organisation for Human Brain Mapping (OHBM) www.humanbrainmapping.org
·
Global Change Data and Information System (GCDIS) www.globalchange.gov
·
Committee on Data for Science and Technology (CODATA) www.codata.org
The results were also compared with documents from the European Science Foundation
(ESF) www.esf.org. As most documents referred to ongoing debates about legal
initiatives and (partly conflicting) legislation, additional documentation on
these debates was also collected and included in the analysis.
The present state of regulation with respect to the access to
and sharing of research data has mainly been shaped by two different federal
laws in the US: the Freedom of Information Act, and the Bayh-Dole Act (see
Appendix II):
·
In 1999, the Freedom of Information Act (FOIA) was extended to
explicitly include research data. A provision was inserted in the Omnibus
Appropriations Bill (Public Law 105-277) to change federal regulations in order
to allow broader access to federally funded research data. The provision meant
that all federally funded research data could be accessed through the
mechanisms laid out in the Freedom of Information Act. The scientific community
was opposed to the proposal, arguing that it threatened to undermine the
integrity of the research process. Nevertheless, Congress adopted the extension
of the FOIA, although the White House Office of Management and Budget limited
the scope of the amendment in implementing its provisions in regulations.
Scientific institutions which are also federal agencies (such as the National
Institutes of Health) have since developed principles and policies to deal with
requests for information under the FOIA.
·
The Bayh-Dole Act of 1980 is aimed at the commercialization of
research results by granting patent rights to universities for inventions
developed with federal funds. This includes exclusive licensing. Its reach has
since been broadened, and the Act seems to have led to a substantial increase
in the number of patents filed by universities, research institutes and
individual researchers. The Bayh-Dole Act may have impeded the sharing of data
involved in the preparation of patent applications. A patent, on the other
hand, is a form of publication and does not itself limit the use of the
underlying data.
Other legal frameworks shaping shared access to
research data in the US are:
·
The Privacy Act of 1974, which provides certain safeguards for the use of
information about individuals that is maintained in a database. These safeguards
include the right of individuals to determine what personal information is
maintained in Federal agencies' files (hard copy or electronic) and how it is
used, to have access to such records, and to correct, amend, or request
deletion of information in their records that is inaccurate, irrelevant, or
outdated.
·
The fair use exception in copyright law, which enables scientists
to use copyrighted material freely in many cases and under certain conditions.
The exception is rooted in the constitutional right of free speech under the
First Amendment. It enables the use of all factual data in a copyright
protected database as long as the creative elements in the database are not
being reproduced. However, exemption from copyright under the fair use exception
may be threatened by new forms of database protection.
·
Software protection under patent law, which has been in place since a court case in
1986. The US Patent Office changed its policy in the 1990s and it is now
possible to patent algorithms. As a consequence, software falls both under
patent law and under copyright protection. The algorithm and related advances
in software technology are protected by patent law (as the idea). The final
product is protected by copyright (the expression of the idea).
·
Anticircumvention
rules in the new US copyright law (the Digital Millennium Copyright Act, DMCA) may, in
the near future, threaten the possibilities for scientists to use digital data
that are protected by encryption or other technical means (Samuelson 2001). The
DMCA specifically forbids the bypassing of technical measures imposed by
copyright owners to limit access to their works. It also outlaws the
manufacture or distribution of technologies designed to circumvent such
technical measures. Finally, it makes the removal of copyright management
information, such as digital watermarks, illegal. Since all digital data can be
protected with this type of encoding, the anticircumvention rules may have an
impact on access to research data in more areas than computer science alone.
The combination of detailed technological control over the use of data and
information, together with the DMCA, may have severe downstream consequences
for the reuse and redistribution of research data. However, the extent to which
this will happen is unclear.
The
regulation of shared access to data is not only shaped by legal frameworks and
federal laws, but also by the technological and economic context of the
information and data. Scientific data have predominantly become digital data
distributed through the internet and stored in digital media. Hence data have
the same economic characteristics as information goods in general (Varian
1998). Data generation is very expensive, but its distribution or copying is
cheap. Moreover, due to digitisation, the costs of data handling and storage
keep falling. Many scientific data are generated by a sole source, or in a
unique situation, which creates a natural monopoly for the data producer. Data
are now usually stored in digital databases, often with (protected) access
interfaces over the internet. The digitisation has led to a blurring of the
boundaries between data and more aggregated forms of information. This may
already happen at the level of scientific instruments when some form of
processing of the raw data takes place even before the researcher sees them. As
a consequence it is often difficult, or even impossible, to isolate data from
their informational context. Often this does not even make sense for the user.
Processed data are generally easier to interpret and use than raw data,
which may be completely meaningless outside of the context of their generation.
This may lead to a paradox with respect to data-sharing if data processing was
based on certain field-specific assumptions and discipline-specific standards.
In those situations, the processing of data may make them less easy to use
outside of their original disciplinary context whilst at the same time making
them easier to interpret. This is one of the reasons why setting standards for
data formats in order to promote the re-use and sharing of data can be such a
daunting task, especially in interdisciplinary or hybrid contexts. In these
contexts, the economic mechanisms and institutional incentives favouring
data-sharing are often also lacking.
These
economic and technical characteristics of scientific data have been the subject
of different, and often conflicting, legal regimes and initiatives.
Traditionally, data have been free and not subject to copyright rules or
exclusive property rights. The increased economic role of data through
digitisation triggered attempts to introduce new forms of data protection, some
of which may significantly influence data-sharing in scientific research. This
relates to key characteristics of digital information and data:
·
Digitisation
has greatly enhanced the ease of copying and distribution of large amounts of
data. This has been perceived by parts of the database industry as threatening,
especially by the music industry and producers of various directories. As a
result, in the 1990s, a lobby emerged to increase the legal protection of
databases (see below). It should be noted, however, that most databases are
protected by the copyright covering creative elements of a database. The facts
themselves are not protected (even if collecting them was labour-intensive) but
the organisation of the databases, the arrangement of the information, and the
coordination of the database are. Some elements of a database may also be
protected by patents. Most databases that are used by scientists are either in
the public domain (like all databases of the US federal government) or are
covered by copyright law.
·
The
digital environment has greatly enhanced the possibilities to prevent
unauthorised use of data with technological and legal means. Encryption
technologies enable a database producer to limit access to the database. As
most digital databases are highly dynamic entities, and their value depends on
the frequency with which databases are being updated, the nature of the
economics of databases has been transformed. Users no longer buy the database
itself, but increasingly license the rights of access to it.
This has important downstream consequences because the license (or a private
contract) may impose important constraints on the use of even the most factual
of data. This is especially important for science since much scientific
research involves the merging of data from a large number of different sources
and their redistribution in a new compilation and transformed format. Contracts,
coupled with technological constraints, can put severe limits on this type of
data use. Whether database owners will have an interest in impeding scientific
research in this way, and if so to what extent, is presently an open question.
The
discussion in the U.S. on data-sharing has also been influenced by European
legislation, which was adopted in response to pressure from parts of the
database industry. This debate has not (yet) led to new rules with respect to
access to research data. It has, however, stimulated representatives of both
the scientific community and the federal government to restate their basic
principles on access to scientific information and data (see below). This debate
hinges upon the economic impact of digitisation:
·
In
1996, the European Union adopted a strong form of legal protection of databases
in its Directive on the Legal Protection of Databases. Since then, the
directive has been incorporated into national law in the member states and in a
number of affiliated states. The main difference from copyright law is that
the "sweat of the brow" of the database producer is protected, not just the
creative elements. If the investment of the database producer is substantial,
the producer has the right to prevent the extraction or reuse of any
substantial part of the database. This right pertains to downloading, copying,
printing or reproduction in any form (Hugenholtz 2001). The right holds for 15
years from the date of completion of the database. A substantial update also
renews the right. This means that dynamic databases enjoy a virtually unlimited
protection under the new database law. Even a mere substantial verification
of the database might give the producer an extension of this right. The exceptions
are also far more limited than is the case in copyright law. Most of the traditional
exceptions that permit the use of copyrighted materials are absent from the new database law, such
as journalistic freedom, quotation rights, privileges for libraries, and the free
use of government information. This also holds for data. The right to use data
is far more limited than under copyright law and centres around the notion of
illustration in teaching and research. It is not yet clear to what extent the
implementation in national laws will lead to a strict or more liberal interpretation
of the law by courts. The strongest impact of the database laws on scientific
research is expected in those cases where the publication of merged and
transformed data is crucial and where researchers form the sole market for the
database. European database law does not contain provisions mandating
compulsory licensing at marginal costs to individual researchers or research
institutes (David 2001-003).
·
In
the US, a comparable debate started in 1991 after the Supreme Court ruled that
databases were not protected under "sweat of the brow" terms and that copyright protection was limited to their creative
elements. The European directive subsequently fuelled this debate. Successive
Congresses have considered the
introduction of comparable regulation (draft database bills HR 3531, HR 2652,
HR 354 and HR 1858). One reason for this is that the European directive
contains a reciprocity provision which limits the legal protection to database
producers from those countries that have similarly tight database laws. The
scientific community, in common with many other interest groups, has strongly
opposed attempts to emulate the European directive in the US and elsewhere,
since it would severely limit access to and use of data for research. A key
point in the debate is whether database producers should enjoy a novel property
right (as is the case in Europe) or rather protection against unfair
competition comparable to already existing laws against misappropriation. The
precise formulation of exceptions for scientific research is also a key point
in the debate.
·
The
debate on databases may be especially important because the role of the federal
government in the production and funding of scientific databases seems to be
changing. Private-public partnerships now play a more important role. Private
companies are becoming more important in the dissemination of government data, and a number of
data-producing activities have been outsourced by federal agencies, partly to
cut the budget. This development may lead to new database legislation having a bigger impact on the sharing of
data in scientific research.
Under U.S. federal government law and policy, publicly
funded information, including research data, should be in the public domain.
This is the basic principle informing most data-sharing rules included in this
study. It is laid down in the guidelines published by the National Institutes
of Health (NIH 2001): Most grant-related information submitted to NIH by the
applicant or grantee in the application or in the post award phase is
considered public information and is subject to possible release to individuals
or organizations outside NIH. The statutes and policies that require this
information to be made public are intended to foster an open system of
Government and accountability for governmental programs and expenditures, and,
in the case of research, to provide information about federally funded
activities. Only certain types of information that may be considered
proprietary or private information may be withheld from the public. This means
that NIH will generally release the following types of records in response to
an FOIA request:
·
Funded applications;
·
Pending and funded non-competing continuations;
·
Grant progress reports;
·
Final reports of any audit, survey, review, or
evaluation of grantee performance that have been transmitted to the grantee.
Other types of information will generally be kept
confidential. These include, amongst others, pending competing grant applications; unfunded
new and competing applications; financial information regarding a person;
information pertaining to an individual; pre-decisional opinions; evaluative
portions of site visit reports and peer review summary statements; trade
secrets; information which, if released, would adversely affect the competitive
position of the person or organization; and patent or other valuable commercial
rights. As will be clear, the exceptions are mostly based on the Privacy Act
and on the Bayh-Dole Act.
Research data may be included in either category of
research information. In the NIH Grants Policy Statement "data" is
defined as "recorded information, regardless of the form or media on which
it may be recorded, and includes writings, films, sound recordings, pictorial
reproductions, drawings, designs, or other graphic representations, procedural
manuals, forms, diagrams, work flow charts, equipment descriptions, data files,
data processing or computer programs (software), statistical records, and other
research data." NIH has developed project/programme specific guidelines
for access to research data. Whenever possible, data should be deposited in
public databases and materials in public repositories. Where appropriate
repositories do not exist or are unable to accept the data or materials,
investigators should accommodate requests to the extent possible.
Recently, NIH announced the
further extension of its policy regarding sharing research resources through a
new draft statement on data-sharing (NIH announcement 1 March 2002). The new
statement will expect and support the "timely release and sharing of final
research data from NIH-supported studies for use by other researchers".
Investigators submitting an NIH application will be required to include a plan
for data-sharing or to state why data-sharing is not possible. The statement
focuses on "final research data". NIH defines this as follows:
"recorded factual material commonly accepted in the scientific community
as necessary to validate research findings". Final research data will,
therefore, not include: "laboratory notebooks, partial data sets,
preliminary analyses, drafts of scientific papers, plans for future research,
peer review reports, communications with colleagues, or physical objects such
as gels or laboratory specimens" (NIH FAQ on Data Sharing, March 1, 2002).
Public access to research data is also the basic
principle of the National Science Foundation. NSF advocates and encourages
open scientific communication (NSF Grant Proposal Guide, V, H, 1-1-2002). NSF
expects significant findings from supported research and educational activities
to be promptly submitted for publication with authorship that accurately
reflects the contributions of those involved. It expects PIs to share with
other researchers, at no more than incremental cost and within a reasonable
time, the data, samples, physical collections and other supporting materials
created or gathered in the course of the work. It also encourages grantees to
share software and inventions, once appropriate protection for them has been
secured, and otherwise act to make the innovations they embody widely useful
and usable.
NASA "shall provide for the widest practicable and
appropriate dissemination" of the scientific and technical information (STI)
resulting from its research effort, while precluding the inappropriate
dissemination of sensitive information.
NASA disseminates scientific information in a manner
consistent with U.S. laws and regulations, Federal information policy,
intellectual property rights, technology transfer protection requirements, and
budgetary and technological limitations. In this, NASA follows the principle
of non-discriminatory access so that all users within the same data use
category will be treated equally. Preferential treatment for U.S. government
users and affiliates will be allowed by NASA only where expressly permitted by
law. Archiving is seen as part of NASA's responsibility. NASA has developed an
elaborate set of rules covering the publication of technical reports and technical
manuals in its Guidelines for Documentation, Approval, and Dissemination of NASA STI (valid until
September 2002). Technical publications usually include extensive data or
theoretical analysis, but they may also be compilations of significant
scientific and technical data.
In 1999, the US
government stated its basic policy principles before the House of
Representatives (Pincus 1999) discussing the Collections of Information
Antipiracy Act (H.R. 354). These include:
·
databases generated with Government funding generally
should not be placed under exclusive control, de jure or de facto, of
private parties;
·
any database misappropriation regime should provide
exceptions analogous to fair use principles of copyright law; in particular,
any effects on non-commercial research should be de minimis.
These principles are based on weighing the need to
protect database creators against the potential impact on scientific research
in particular, and the dissemination of information within the society
generally. Therefore, database protection should leave room for transformative
use of data. Facts should also be excluded from protection: the
Copyright Clause and the Copyright Act permit protection only of an author's
expression, and do not authorize protection of facts. This comports with the
First Amendment principles. Government information should be publicly
available because it is a valuable national resource. It provides the public
with knowledge of the government, society and economy, past, present and future.
It is a means to ensure the accountability of government, to manage the
government's operations, to maintain the healthy performance of the economy,
and is itself a commodity in the marketplace. Pincus explicitly included
universities in the governmental domain: "We believe that public universities
should fall within a broad definition of government institutions which generate
collections of information. Instead of trying to draw a distinction between
public universities and other government institutions, it might be more
appropriate to concentrate on the distinction between public research and
privately funded research at public institutions." The US government also believes that databases produced with substantial
government funding should be treated like databases of government-generated
data (unless a contrary provision has been included in the contract or grant).
The National Academy of Sciences, the National Academy
of Engineering, the Institute of Medicine and the American Association for the
Advancement of Science gave a joint statement in the same House discussion on database
protection (Lederberg 1999), arguing that "freedom of inquiry, the open availability
of scientific data, and the open publication of results are cornerstones of our
research system that US law and tradition have long upheld". Hence, full and
open access to data is the basic principle for
many scientific institutions in the U.S. Lederberg, citing the Bits of
Power report (NAS 1997), defined "full and open" as follows: "by full and open we
mean that data and information derived from publicly funded research are made
available with as few restrictions as possible, on a non-discriminatory basis,
for no more than the cost of reproduction and dissemination".
Data from the private sector should be made available
on a fair and equitable basis. This means that, if commercial content
providers receive enhanced protections in their databases,
preferential terms of access to and use of those data by researchers, educators, libraries, and
other public-interest entities, firmly rooted in our Constitution and legal
tradition, are retained and, when necessary, adapted to the digital and online
environment.
In November 2000, CODATA formulated six principles
for science in the internet era to support full and open access to data
needed for research and education. These principles are:
·
science is an investment in the public interest
·
scientific advances rely on full and open access to
data
·
a market model for access to data is unsuitable for
research and education[7]
·
publication of data is essential to scientific
research and the dissemination of knowledge
·
the interests of database owners must be balanced with
society's need for open exchange of ideas
·
legislators should take into account the impact
intellectual property laws may have on research and education.
The US Global Change Research Program initiated a Data
and Information Working Group to develop interagency data management in 1987
(DWIG 2001). The program has had full and open access as policy guidance for
federally obtained data since its inception (DWIG 1999). This means that data
and information should be available without restriction, on a
non-discriminatory basis, for no more than the cost of reproduction and
distribution (DWIG 1998). Where possible, access to data should be provided
through the World Wide Web to keep the costs as low as possible and to allow
distribution to be as wide as possible.
The National Endowment for the Humanities has been
encouraging and supporting humanities research and scholarship involving
computer technologies since the early 1970s. Although the term data-sharing as
such is not used often, a large number of NEH funded projects are in fact forms of
data-sharing, e.g. the creation of large repositories and databases of
digitised information. The same holds for projects in the area of preserving
human and cultural heritage. NEH also addresses data-sharing by funding
projects aimed at developing standards for creating and preserving digital data
for research.
The National Archives, for which making
data accessible is the very reason for its existence, lists increased
data-sharing as one of the goals for the improvement of its data
administration. The Inter-University Consortium for Political and Social Research (ICPSR)
is an organisation of member institutions working
together to acquire and preserve social science data, to provide open and
equitable access to these data, and to promote effective data use. The ICPSR
promotes and facilitates research and instruction in the social sciences and
related areas by acquiring, developing, archiving, and disseminating data and
documentation for instruction and research and by conducting related
instructional programs.
Two different categories of motivation for promoting data-sharing emerge in
this study: first, public policy considerations; secondly, the needs of
scientific research itself.
In the first category, the following motivations can
be distinguished:
·
the principle that the various forms of data collected
with public funds belong in the public domain
·
researchers have a special obligation to scientific
openness and accountability when the research is publicly funded
·
the obligation to abide by the law, especially the
Freedom of Information Act
·
to improve U.S. competitiveness.
In the second category, motivations are:
·
the advancement of science
·
the widespread and timely distribution of tools for further discovery
·
verification and refinement of research findings
·
the replication and secondary analyses of valuable
(and costly) data sets to address new, and quite possibly unforeseen, research
questions
·
to reduce unnecessary duplication of research
·
reduction of the need for new data collection and
social surveys
·
economies of scale
·
to improve the productivity and cost-effectiveness of
research
·
the need for large data sets to answer research
questions that cannot otherwise be addressed
·
the application of cutting-edge technologies to data
sets by multidisciplinary research teams
·
when research tools are used only within one or a
small number of institutions, there is a great risk that fruitful avenues of
research will be neglected
·
providing access to data for new but talented
researchers
·
to improve training for graduate and undergraduate
students.
All organisations used motivations from both
categories, although
the emphasis does vary. The US government, NIH and NASA tend to emphasize first
of all the public policy considerations. The NSF, AAAS, and the NAS/NRC tend to
start with stressing the importance of science for society and the role of
shared access to research data in the creation of new scientific knowledge. All
organisations explicitly acknowledge the political and legal paradigms in the US
which have full and open access to data and information as a basic tenet.
All organisations try to balance the need for sharing
data with the recognition of intellectual property rights on inventions (data
themselves are not protected under copyright or patent laws). In the US, this
means that research
organisations need to satisfy the conditions of both the Freedom of Information
Act and the Bayh-Dole Act. The NSF allows grantees to retain principal legal
rights to intellectual property developed under NSF grants to provide
incentives for development and dissemination of inventions, software and
publications that can enhance their usefulness, accessibility and upkeep. Such incentives do not, however, reduce the
responsibility that investigators and organizations have as members of the
scientific and engineering community to make results, data and collections
available to other researchers.
The NIH expects recipients of funds to maximize the
use of their research findings by making them available to the research
community and the public, and through their timely transfer to industry for
commercialization. The right of researchers to retain title to inventions made
with NIH funds comes with the corresponding obligations to promote utilization,
commercialization, and public availability of these inventions. The Bayh-Dole
Act encourages researchers to patent and license subject inventions as one
means of fulfilling these obligations. However, the NIH states that the use of
patents and exclusive licenses is not the only, nor in some cases the most
appropriate, means of implementing the Act. Where the subject invention is
useful primarily as a research tool, inappropriate licensing practices are
likely to thwart rather than promote utilization, commercialization and public
availability of the invention. The NIH stipulates that researchers should
analyse whether further research, development and private investment are needed
to realize this primary usefulness. If they are not, the goals of the Bayh-Dole
Act can be met through publication, deposit in an appropriate databank or
repository, widespread non-exclusive licensing or any number of other
dissemination techniques. Restrictive licensing of such an invention, such as
to a for-profit sponsor for exclusive internal use, is antithetical to the
goals of the Bayh-Dole Act. On the other hand, where private sector
involvement is desirable to assist with maintenance, reproduction, and/or
distribution of the tool, or because further research and development is needed
to realise the invention's usefulness as a research tool, licenses should be
crafted to fit the circumstances, with the goal of ensuring widespread and
appropriate distribution of the final tool product. The NIH explicitly
includes the possibility of exclusive licensing. The NIH also considers the
burden of patenting and licensing. Researchers are asked to take every
reasonable step to streamline the process of transferring their own research
tools freely to other academic research institutions using either no formal
agreement, a cover letter, the Simple Letter Agreement of the Uniform
Biological Materials Transfer Agreement (UBMTA), or the UBMTA itself.
The funding organisations covered in this Web scan
increasingly require explicit data-sharing plans as a condition for research
funding. These plans should cover how and where these materials will be stored at reasonable cost,
and how access will be provided to other researchers, generally at their cost. Since
2001, NSF has asked researchers to explicitly include, if appropriate, plans
for preservation, documentation, and sharing of data, samples, physical
collections and other related research products (NSF 2001).
In the case of x-ray crystallographers the NIH has a
policy that requires the placement of coordinate data into a data bank at the
time of publication. The NIH and DOE genome programs require all applicants
expecting to generate significant amounts of genome data and materials to
describe in their application how and when they plan to make such data and
materials available to the community. These plans in each application will be
reviewed in the course of peer review and by staff to assure they are
reasonable and in conformity with program philosophy. If a grant is made, the
applicant's sharing plans will become a condition of the award, and
compliance will be reviewed before continuation is provided. Progress reports
will be asked to address the issue. NASA also stipulates that data-sharing
plans should be part of research plans. For example, all of NASA's Earth System
Enterprise missions, projects and grant proposals shall include data management
plans. For each cooperative activity with industry, domestic or foreign, NASA
shall seek agreement on all major data management and distribution issues
during the project definition phase.
Generally, the researcher or research institution
obtaining the funding is held responsible for providing access to data. This
means that the costs for providing access to data can be included in the
research budget. The NSF has the rule that the budget may request funds for
the costs of documenting, preparing, publishing or otherwise making available
to others the findings and products of the work conducted under the grant. The
NIH prefers data sets to be put into data archives, and objects into
repositories. If this is not possible, the researchers should provide access
as much as possible. For NIH grants, the awardee is not the individual investigator but the
institution. The NSF has the same position as NIH, with the exception of some
post-doctoral fellowships. The NIH notes that this may create problems under
the Freedom of Information Act since a request to the NIH to produce data may
go to a university that no longer has an employer-employee relationship with
the investigator. Within NASA the departments are responsible. The organisation
also assumes responsibility for archiving. In general, however, long term
archiving will not be guaranteed by research groups or organisations. For this
reason, the ESF is of the opinion that national or regional discipline-based
archives should be considered where there are practical or other problems in
storing data at the institution where the research was conducted.
Different types of data may create various specific
problems if they are to be shared with other researchers or made available to
the public at large. The following relevant issues have been identified in the
documents:
·
the sharing of data as research results may meet different obstacles from
those met by the sharing of data that have been used as a research
resource. In a number of cases, data used as input for research may
not as easily be shared as data resulting from research. This reluctance may be
motivated, for example, by the fear that the release of raw input data could
unblind clinical trials, lead to erroneous conclusions, undermine
investigators' investments, and jeopardize their intellectual property rights,
especially in regard to non-US patents (NIH Response 1999).
·
different types of data may require different storage
facilities and access requirements. Examples are archaeological data, specimens
from physical anthropology, large-scale survey data, oral interviews with
scientists and other subjects, data generated by experimental research, and
field records of tribal ceremonies.
·
mathematical and computer models are both tools and
data. Sharing these often means that investigators must prepare fully
documented and robust versions of these models.
·
objects of research such as archaeological specimens
or fossil remains pose specific problems. In these instances data consist not
only of the objects themselves, but also of contextual information and
quantitative and qualitative descriptions of the materials. As these
physical objects do not always become the property of the investigator but
often belong to a host nation or cultural group, scientists may not control
access to them.
·
qualitative information ranging from microfilms and
other copies of very old documents, to oral interviews and video tapes,
ethnographic or linguistic field notes or recordings or transcriptions, or
handwritten records of open-ended interviews, needs special arrangements including privacy
protection and specification of the time at which they will be made available.
·
quantitative social and economic data sets generally
need to be placed in specialised data archives.
·
in experimental research, individuals, be they people,
animals, or objects, are subjected to preplanned conditions and their responses
tabulated in some fashion. For these data, complete information on how an
experiment was conducted and any unusual stimulus materials is important, so
that failures to replicate will not turn out to depend on one scientist's
incomplete understanding of another's procedure. In these cases, placing such
data in a formal archive may be a solution.
·
on the other hand, in experimental science, the data
are the result of experiments. Here the need, as perceived by a number of scientific
communities, is not to make the original data available, but to make available
the methods used to obtain the results. If others challenge those results, they
would try to replicate the experiment and would then publish their findings.
·
longitudinal data sets present a special problem as the release of data early in a
long term study could affect later waves of data collection and could risk
identification of subjects (for example in medical research).
At the GRV III conference, the issue of reasonable
limits to data-sharing was raised. In this scan considerations of privacy
protection seem to dominate. A second important limitation mentioned is the
protection of the research process. The NIH states that access to research
data must occur in the context of strong protections for research
participants, protection of proprietary interests, freedom from harassment of
researchers, and confidence that the process will further research, not harm
it.
The following limitations are mentioned in the Web
documents:
·
safeguard the rights of individuals and subjects
·
the rights of individuals to determine what
information about them is maintained
·
legitimate interest of investigators, for example materials deemed
to be confidential by a researcher until publication in a peer-reviewed journal
·
the time needed to check the validity of results
·
the integrity of collections
·
data released to the public that could lead to the identification of historically
and scientifically valuable archeological sites could invite looting and
destruction
·
data enabling the identification of the location of rare botanical species outside
the United States could lead to unwanted bioprospecting and could damage the
relationship between researchers and the host community
·
differences between fields
·
information related to law enforcement investigations
·
national security information.
The following data and research resources are
generally excluded from the duty to provide access to them under the Freedom of
Information Act:
·
draft materials such as preliminary analyses, drafts
of scientific papers and plans for future research
·
peer reviews
·
communications among colleagues
·
physical objects (e.g., laboratory samples, audio or
video tapes)
·
pending competing grant applications
·
unfunded new and competing continuations and competing
supplemental applications
·
financial information regarding a person, such as
salary information pertaining to project personnel
·
information pertaining to an individual, the disclosure
of which would constitute a clearly unwarranted invasion of personal privacy
·
evaluative portions of site visit reports and peer
review summary statements, including priority scores
·
trade secrets and commercial, financial, and otherwise
intrinsically valuable items of information that are obtained from a person or
organization and are privileged or confidential
·
unpublished data: Premature access to data could
unblind clinical trials, lead to erroneous conclusions, undermine
investigators' investments, and jeopardize their intellectual property rights,
especially in regard to non-US patents.
Public availability and accessibility of research data
is a basic policy principle of the US organisations in this Web scan. The need
for scientific organisations to abide by the law has necessitated an explicit
and transparent set of rules and policies. This includes the availability of
research data for sharing among researchers. An important motivation for making
research data available is the principle that publicly funded research data
(both data used as resource and data resulting from research) should be
publicly available. The second set of motivations for explicit guidelines on
data-sharing results from changes in the conduct of scientific research. The
application of information and communication technologies and new imaging
technologies has accelerated the process in which sharing data and resources is
becoming crucial for research in a variety of fields. More complex
multidisciplinary research questions are also important factors driving the
process of increasing data sets and creating new types of large distributed
data sets. Researchers themselves are becoming more dependent on the increased
possibilities for data-sharing. The need to give new researchers
access to data, and the need to increase the quality of
research training, give added impetus to improved regulation
of access to research data.
As a result, plans for data-sharing are a condition
for research funding from the funding agencies in this study. Those plans are
subjected to quality control and peer review, taking into account both the
rules of the funding organisation and discipline-specific quality criteria. The
research organisation or individual investigator is responsible for enabling
access to research data. Long term archiving is an exception to this rule. This
should be the responsibility of specialised data archives and repositories.
The contrast with the outcome of the email survey of
ESF members and related organisations in Australia, Canada and Japan is
striking. The email survey showed that data-sharing is an emerging issue in
science policy. Most organisations expect to develop policies on the access to, and
sharing of, research data in the next few years. In the US this is
already firmly established. The Web documents in this study show that
the existence of federal laws governing the data handling processes (Privacy
Act, Freedom of Information Act and the Bayh-Dole Act) is the principal cause
of the difference between the US and Europe.
This study did not cover all of US academic research.
Nor can the extent to which the data are made available in digital form be
determined from these policy documents. Given the wide variety of data types
involved, regulation seems to be relevant for digital data, as well as analogue
data and objects. The increased digitisation of research information will no
doubt lead to a sustained increase of digital research data. The variety of
data types necessitates not only the availability of various technical tools
and standards for data-sharing but also the development of adequate institutional
arrangements.
The policy documents indicate that research contracts
do indeed stipulate detailed agreements on data-sharing taking the specific
characteristics of the research data into account. An interesting question is
what experience has been gained with these data-sharing plans and what
types of tools and arrangements have proved effective.
The limits to public
accessibility of data are explicitly stated in the guidelines studied. The most
important limits which are deemed reasonable arise from:
·
protection of the rights of persons and research
subjects (including privacy protection);
·
protection of intellectual property rights;
·
concerns over the integrity of the research process;
and
·
considerations of national economic and security
interests.
The precise consequences of these limits and the ways
they are addressed relate to the type of data involved. The documents give the
impression that the type of organisation (funding agency; research
organisation; scientific society; archive) also determines the balance which is
struck between conflicting needs and the way that limits to data sharing and accessibility are being
imposed. This includes the exact definition of terms (e.g. what are data), the
materials that are excluded from public scrutiny (e.g. under the Freedom of
Information Act) and the extent to which exclusive licensing is permitted. It
should be noted here that the legitimate interests of the researcher producing
the data are generally seen as part of the need to protect the integrity of the
research process. No organisation claims a semi-permanent privileged access to
the data for the data producing investigator, given that it concerns publicly
funded research.
It is nevertheless clear that the investigator is an
important party in the application of the rules on data management and the
development of data-sharing practices. The types of tools and regulations that are most conducive
to data-sharing, as well as the effects that increased data-sharing may have on
the research process itself can only be determined in case studies and
comparative studies of data-sharing practices. This is also necessary to
determine how the guidelines and principles covered in this Web scan are
actually being applied and which experiences and best
practices have been collected. Data-sharing is not always uncontroversial in
the scientific community. In some specialties, the duty to make research data
publicly available seems to clash with established traditions and routines (or
lack thereof).
This raises the additional question of the transaction
costs of rules set by funding agencies in these cases. Moreover, the
application of general principles of data-sharing in research contract
conditions requires specialist knowledge of the types of data involved and of
the various stages in the research process. This is usually acquired in some
form of cooperation or communication with the researchers in question. In other
words, the application of the general principles and guidelines is based on,
and produces, configurations of trust relationships and practical provisions.
One of the speakers at a Council meeting of the National Institute of Mental
Health touched upon this in response to the controversy in brain research on
data-sharing: Incentives for data-sharing need to be offered that offset
investigators' loss of control over their data-bases. Usually, this is some
form of added scientific value. By sharing data, an investigator gains access
to more data or other tools. Ultimately, there has to be a procedural framework
that makes sharing sensible, efficient, thorough, and value-added. If all of
those pieces are in place, fewer external or coercive forces are needed to
convince investigators to share. Best practice cases and the study of
data-sharing practices are both needed to shed more light on the nature of the international framework
needed for data-sharing as well as the consequences of such a framework for the
production of, and access to, scientific information.
Questionnaire (please cross the right entry)
1. Are Access to and Sharing of Research Data currently a subject of
governmental science policy in your country?
- being discussed
- formulated in policy documents
- established in legislation
2. Does your organisation have a policy on Access to and Sharing of
Research Data?
No
Yes (go to question 5)
3. Do you expect Access to and Sharing of Research Data to become a policy
issue for your organisation within the next 3 years?
No (questionnaire completed)
Yes (go to question 4)
4. In what fields of research do you expect access to research data to
become a policy issue on the agenda of your organisation?
Natural Sciences (incl. Earth Sciences, Atmospheric Research): Yes / No
Engineering & Technology: Yes / No
Life Sciences (incl. Environmental Research, Biodiversity): Yes / No
Social Sciences (incl. Behavioural Sciences): Yes / No
Humanities (incl. Archaeology and Linguistics): Yes / No
(questionnaire completed)
5. Does access to research data pose problems of
- technical difficulties
- descriptive standards
- institutional barriers
- prohibitive cost
- legal restrictions (privacy, IP, Nat. Security)
6. Is Access to and Sharing of Research Data the subject of
a. non-binding recommendations from your organisation
b. formal regulation (guidelines, funding terms, professional codes) from your
organisation
c. national legislation?
Natural Sciences (incl. Earth Sciences, Atmospheric Research): recommendations / regulation / legislation
Engineering & Technology: recommendations / regulation / legislation
Life Sciences (incl. Environmental Research, Biodiversity): recommendations / regulation / legislation
Social Sciences (incl. Behavioural Sciences): recommendations / regulation / legislation
Humanities (incl. Archaeology and Linguistics): recommendations / regulation / legislation
7. Could you list the names and references of the policy documents
concerned and/or the Website(s) where they can be found? (If possible, please
attach an electronic version of the document(s) to your answer)
8. Does the policy of your organisation on Access to and Sharing of Research Data
include
a. Funding and/or managing of data archives/depositories
b. Co-operation with national (governmental) archives
c. Co-operation with governmental data collecting agencies/institutes
d. Selling data to and/or buying data from commercial firms
9. Would your organisation be interested in the (follow-up) activities
(being informed, participate in a policy workshop, participate in further
consultation) of the Working Group?
No
Yes
10. If so, could you please give the co-ordinates of the person to contact?
Full name
Appendix 2: Response rate to the mini-survey
ORGANISATIONS THAT REPLIED:
Australian Research Council
Austrian Research Council
Biotechnology and Biological Sciences Research Council, UK
CEA/DSM (Physics Department), France
CESNET Association, network of universities and academies, Czech Republic
Consiglio Nazionale delle Ricerche, Italy
Consejo Superior de Investigaciones Cientificas (CSIC), Spain (twice)
Danish Research Agency
Deutsche Forschungsgemeinschaft, Germany
Engineering and Physical Sciences Research Council, UK
Estonian Academy of Sciences
Fonds voor Wetenschappelijk Onderzoek, Vlaanderen, Belgium
Hungarian Academy of Sciences
Icelandic Research Council
INFM, Italy
Information and Innovation Systems INRA, France
Irish Research Council for the Humanities and Social Sciences
Medical Research Council, UK
National Research Council of Canada
Nederlandse Organisatie voor Wetenschappelijk Onderzoek, Netherlands
Natural Environment Research Council, UK
Norwegian Academy of Science and Letters
Research Council of Norway, PBS/STR
Royal Irish Academy
Scientific and Technical Research Council (TÜBITAK), Turkey
Slovenian Academy of Sciences and Arts
Slovenian Academy of Sciences and Arts, Section Medical Sciences
Slovenian Science Foundation
Swedish Research Council
The following Web sites have been searched:
·
National Research Council NRC www.nas.edu/nrc
·
National Science Foundation NSF www.nsf.gov
·
National Institutes of Health NIH www.nih.gov
·
National Aeronautics and Space Administration NASA www.nasa.gov
·
American Association for the Advancement of Science AAAS www.aaas.org
·
National Archives NARA www.nara.gov
·
National Endowment for the Humanities NEH www.neh.gov
·
Inter-University Consortium for Political and Social Research ICPSR www.icpsr.umich.edu
·
European Science Foundation ESF www.esf.org
·
Library of Congress www.loc.gov
Every Web site has been searched twice. First, the Web
site was searched on the keywords "data", "sharing" and "policy". The
documents retrieved were then studied for their relevance and, if
relevant, downloaded for detailed study. After document
analysis, the Web sites were visited again for a follow-up search tailored to the
particularities of the scientific fields at hand and/or of the Web site of the
organisation.
This turned out to be especially useful where the
practice of data-sharing was referred to in other terms than data-sharing, or
where policy statements regarding data-sharing were part of documents on other
topics.
The searches were restricted to policy documents. This means that this Web scan did not aim to
capture Web documents on the practice of data-sharing. Some documents seemed to be
midway between policy and practice. For example, pilot
projects were being discussed or research proposals aimed at both a scientific
and a policy audience. If the emphasis was on policy, these documents were
included in this Web scan.
Based on the policy principles discussed at the GRV
III conference, the retrieved Web documents were studied to answer the
following questions:
·
Is public access to data stated as a basic policy
principle?
·
What is the motivation for data-sharing rules?
·
Is data-sharing a condition for research funding?
·
Who is responsible for providing access to data?
·
Are different types of data distinguished?
·
How are issues of property rights treated?
·
Which limits to data-sharing are recognised as
reasonable?
Appendix II Excerpts from the FOIA and the Bayh-Dole Act
The Freedom of Information Act regulates the
accessibility of information in the US. In 1998, a provision was inserted in
the Omnibus Appropriations Bill (Public Law 105-277) to change a federal
regulation in order to allow broader access to federally funded research data.
The provision, as inserted by Senator Richard Shelby (R-AL), tasked the Office
of Management and Budget (OMB) with changing OMB Circular A-110 so that all
federally funded research data could
be accessed through the mechanisms set forth in the Freedom of Information Act.
OMB subsequently filed a proposed revision in the Federal Register on 4 February
1999 and allowed for a 60-day public comment period before any further actions
would be taken. OMB's proposed revision reads:
"The Federal Government has the right to
(1) obtain, reproduce, publish, or otherwise use the data first produced under
an award, and (2) authorize others to receive, reproduce, publish, or otherwise
use such data for Federal purposes. In addition, in response to a Freedom of
Information Act (FOIA) request for data relating to published research findings
produced under an award that were used by the Federal Government in developing
policy or rules, the Federal awarding agency shall, within a reasonable time,
obtain the requested data so that they can be made available to the public
through the procedures established under the FOIA. If the Federal awarding
agency obtains the data solely in response to a FOIA request, the agency may
charge the requester a reasonable fee equaling the full incremental cost of
obtaining the data."
OMB received over 9,000 responses to its proposed revision
with 55 percent of the respondents favoring the changes. Representatives of
scientific organisations generally argued that the proposed amendment was
anathema to the character of the research process and was not the most
appropriate way to regulate access to research data. While several efforts were
made in the 106th Congress to prevent any changes to OMB Circular A-110, none
were successful. OMB released its second proposal on August 11, 1999, in the
Federal Register. The proposal took into consideration the comments received
on the February 4 proposal and greatly narrowed the scope of the Shelby
amendment. The final revision was filed in the Federal Register on October 8,
1999.
The Bayh-Dole Act was enacted in 1980 to spur the commercialization
of research results by granting patent rights to universities for inventions
developed with federal funds. This includes exclusive licensing. The principles
of the Bayh-Dole Act were the result of years of intense and emotional debate.
The debate included questions whether exclusive licenses would lead to
monopolies and higher prices; whether taxpayers would get their fair share;
whether foreign industry would benefit unduly; and whether ownership of
inventions by a contractor is anti-competitive. Economic interests rather than
academic science interests were the driving forces for the change in US
government policy. Until the Bayh-Dole Act became effective on July 1, 1981,
the federal agencies kept tight control over intellectual property rights
resulting from funded research, premised largely on traditional expectations
rooted in the procurement process. After the passage of the Bayh-Dole Act, as
the success of the Act became quickly apparent, subsequent legislative
initiatives broadened its reach further.
Bibliography
AAAS Letter on the Notice of Proposed Rule Making (NPRM; Feb. 4, 1999, Vol. 64, No. 23, pp. 5684-5685 of the Federal Register) to amend OMB Circular A-110 (April 1999)
Anon. (2000), Prospect of data sharing gives brain mappers a headache, Nature, 406, p. 445
David d'Arcy (2001), Data hosts are vital to the Internet's future, Nua Internet Surveys, 2001, 3 December
Bruce Alberts (July 15, 1999), Statement of Dr. Bruce Alberts, President National Academy of Sciences before the Subcommittee on Government Management, Information, and Technology, Committee on Government Reform, U.S. House of Representatives, http://www.nas.edu/nrc
Duncan M. Brown (1997), Understanding Urban Interactions: Summary of a Research Workshop, http://www.nsf.gov/pubs/1998/sbe981/sbe981.htm, September 30, 1997
Eric G. Campbell and others (2002), Data Withholding in Academic Genetics. Evidence from a national survey, Journal of the American Medical Association, 287, no. 4, pp. 473-480
John W. Carlin (October 20, 1999) Statement by John W. Carlin, Archivist of the United States, to the Subcommittee on Government Management, Information, and Technology of the Committee on Government Reform, House of Representatives, Congress of the United States, http://www.nara.gov/nara/vision/testimon.html
CIRCULAR A-110 (REVISED) Grants and Agreements with Institutions of Higher Education, Hospitals, and Other Non-Profit Organizations (1999)
Council on Governmental Relations, THE BAYH-DOLE ACT - A GUIDE TO THE LAW AND IMPLEMENTING REGULATIONS (1999)
Robin Cowan and Elad Harison (2001), Protecting the Digital Endeavour: Prospects For Intellectual Property Rights In The Information Society, MERIT - Maastricht Economic Research Institute on Innovation and Technology, MERIT-Infonomics Research Memorandum series 2001-028
Robin Cowan and Elad Harison (2001), Intellectual Property Rights In A Knowledge-Based Economy, MERIT - Maastricht Economic Research Institute on Innovation and Technology, MERIT-Infonomics Research Memorandum series 2001-027
Paul A. David (2001), Digital Technologies, Research Collaborations and the Extension of Protection for Intellectual Property in Science: Will Building 'Good Fences' Really Make 'Good Neighbors'?, MERIT - Maastricht Economic Research Institute on Innovation and Technology, MERIT-Infonomics Research Memorandum series 2001-004
Paul A. David (2001), Tragedy of the Public Knowledge 'Commons'? Global Science, Intellectual Property and the Digital Technology Boomerang, MERIT - Maastricht Economic Research Institute on Innovation and Technology, MERIT-Infonomics Research Memorandum series 2001-003
Ed. (2000), A debate over fMRI data sharing, Nature Neuroscience, 3, pp. 845-846
ESF (1999), The European Social Survey (ESS) - a research instrument for the social sciences in Europe. Report
H. Franken (2000), "Conference Conclusions" in: Access to Publicly Financed Research, The Global Research Village III Conference, Conference Report (P. Schröder, ed.), NIWI-KNAW, Amsterdam
The Freedom of Information Act 5 U.S.C. § 552, As Amended By Public Law No. 104-231, 110 Stat. 3048 (1996)
David M. Hart (2002), The "Corporatization" of Science, Science, 295, no. 5554, p. 439
Stephen Hilgartner (1998), Data Access Policy in Genome Research, in: Arnold Thackray (ed.) Private Science. Biotechnology and the rise of the molecular sciences, pp. 202-218, University of Pennsylvania Press, Philadelphia
P. Bernt Hugenholtz (2001), The New Database Right: Early Case Law from Europe, Paper presented at Ninth Annual Conference on International IP Law and Policy, Fordham University School of Law, New York, 19-20 April 2001
ICSU/CODATA Ad Hoc Group on Data and Information (November 30, 2000), Scientific Data Policy Statements, http://www.codata.org/data_access/index.html
ICSU/CODATA Ad Hoc Group on Data and Information (November 30, 2000), Access to Databases. Principles for Science in the Internet Era, http://www.codata.org/data_access/index.html
Charles Jennings and Peter Aldhous (2000), Web discussion: Should neuroscientists share their raw data?, Nature, 406, 25 August 2000, http://www.nature.com/neuro/debate/
Donald Kennedy (2001), Enclosing the Research Commons, Science, 294, no. 5550, p. 2249
Joshua Lederberg (18 March 1999), Hearing on the "Collections of Information Antipiracy Act". Statement of Joshua Lederberg, President-emeritus Rockefeller University on behalf of the National Academy of Sciences, National Academy of Engineering, Institute of Medicine, and American Association for the Advancement of Science before the Committee on the Judiciary U.S. House of Representatives
Anne Linn (2000), History of Database Protection: Legal Issues of Concern to the Scientific Community, http://www.codata.org/data_access/linn.html, March 3, 2000
Stephen M. Maurer and Suzanne Scotchmer (1999), Database Protection: Is It Broken and Should We Fix It?, Science, 284, no. 5417, pp. 1129-1130
Stephen M. Maurer and P. Bernt Hugenholtz and Harlan J. Onsrud (2001), Europe's Database Experiment, Science, 294, no. 5543, pp. 789-790
NARA,
Strategic Plan of the National Archives and Records Administration 1997-2007
NARA
Access to Records in the National Archives and Records Administration
NARA
Regulations to its Holdings
NASA,
Guidelines for Documentation, Approval, and Dissemination of NASA STI (valid
until September 2002)
NASA,
External Release Of NASA Software (NPD 2210.1)
NASA,
Management of NASA Scientific and Technical Information (STI) (NPD 2220.5E)
NASA
Procedures and Guidelines (NPG) 2200.2A, Guidelines for Documentation,
Approval, and Dissemination of NASA Scientific and Technical Information
NASA
(2001), NASA Earth Science Enterprise Statement on Data Management,
http://www.earth.nasa.gov/visions/data-policy.html}, 10 July 2001
LBA
Science Steering Committee (1998), LBA Data and Publication Policies,
http://lba-hydromet.gsfc.nasa.gov/policies/lba_data_policies.htm
NEH
(2000), Report of the Humanities, Science and Technology Working Group,
National Endowment for the Humanities
NIH
Response to Notice of Proposed Rule Making (NPRM; Feb. 4, 1999, Vol. 64, No.
23, pp. 5684-5685 of the Federal Register) to amend OMB Circular A-110
Office of
Extramural Research, National Institutes of Health (2001), NIH Grants Policy
Statement 03/01, http://grants.nih.gov/grants/policy/nihgps_2001/
Office of
Extramural Research (2002), National Institutes of Health. Frequently Asked
Questions on Data Sharing,
http://grants1.nih.gov/grants/policy/data_sharing/data_sharing_faqs.htm, March
1
NIH
Principles and Guidelines for Sharing of Biomedical Research Resources
(December 1999)
NIH-DOE
Guidelines for access to mapping and sequencing data and material resources
Working
Group on Research Tools, National Institutes of Health (1998), Report of the
National Institutes of Health (NIH) Working Group on Research Tools, National
Institutes of Health
NIH
(1999), Principles and Guidelines for recipients of NIH research grants and
contracts on obtaining and disseminating biomedical research resources: final
notice, 23 December 1999
The National
Human Genome Research Institute (2001), NIH-DOE Guidelines for Access to
Mapping and Sequencing Data and Material Resources,
http://www.nhgri.nih.gov/Grant_Info/Funding/Statements/data_release.html
National
Advisory Mental Health Council (2000), Minutes of the 196th NAMHC Meeting, http://www.nimh.nih.gov/council/min900.cfm,
15 September 2000
National
Advisory Mental Health Council (1998), Minutes of the 188th NAMHC Meeting,
http://www.nimh.nih.gov/council/min900.cfm, February 4, 1998
NRC
Committee on National Statistics (1985), Sharing
Research Data, National Academy Press, Washington DC
Committee
on Applied and Theoretical Statistics, National Academy of Sciences/National
Research Council (1995), Massive Data Sets. Proceedings of a workshop, July 7--8,
1995,
http://books.nap.edu/html/massdata/, 7--8 July 1995
NAS (1999), Global Ocean Science - Toward an
Integrated Approach, http://www.nap.edu
Mapping
Science Committee, Board on Earth Sciences and Resources, Commission on
Geosciences, Environment, and Resources, National Research Council (1997), The
future of spatial data and society: summary of a workshop, National Academy
Press, http://books.nap.edu/html/spa/
National
Research Council (1997), Bits of Power. Issues in Global Access to Scientific Data,
National Academy Press, Washington DC
National
Research Council (1999), A Question of Balance. Private Rights and the Public
Interest in Scientific and Technical Databases, National Academy Press,
Washington DC
Commission
on Physical Sciences, Mathematics, and Applications, National Research Council
(2000), The Digital Dilemma. Intellectual Property in the Information Age,
National Academy Press, Washington DC
NSF Grant
Policy Manual (1995), NSF
Addendum
to the NSF Grant Proposal Guide (June 2001), NSF
NSF Social
and Economic Sciences (1995), Connecting and Collaborating: Issues for the
Sciences. Report of a workshop sponsored by the NSF and held at the Walter and
Judith Munk Laboratory of the Scripps Institution of Oceanography, University of
California, San Diego, http://www.nsf.gov, June 22-24, 1995
The
Division of Behavioral and Cognitive Sciences (2001), Data Archiving Policy,
http://www.nsf.gov
NSF
(1999), Realizing the Potential of Plant Genomics: From Model Systems to the
Understanding of Diversity, http://www.nsf.gov/pubs/2001/bio011/start.htm
The
Governing Council of the Organization for Human Brain Mapping (2001),
Neuroimaging Databases, Science, 292, no. 5522, pp. 1673--1676
Jason
Owen-Smith (2002), Intellectual Property: Between the ivory tower and the
market, Science, 295, no. 5561, p. 1840
Andrew J.
Pincus (18 March 1999), Statement of Andrew J. Pincus, General Counsel, United
States Department of Commerce, before the Subcommittee on Courts and
Intellectual Property, Committee on the Judiciary U.S. House of Representatives
Pamela
Samuelson (2001), Anticircumvention Rules: Threat to Science, Science, 293, no.
5537, pp. 2028--2031
Mark
Sincell (1999), Physicists and Astronomers Prepare for a Data Flood,
Science, 286, no. 5446, pp. 1840--1841
Erik
Stokstad (2002), Data Hoarding Blocks Progress in Genetics, Science, 295, no. 5555,
p. 599
U.S.
Environmental Protection Agency (July 24, 1995), Information Resources Management (IRM) Policy
Manual, http://www.epa.gov/irmpoli8/, EPA Directive
Number 2100
United
States Geological Survey (USGS) (15 August 2001), U.S. Geological Survey
Manual, http://www.usgs.gov/usgs-manual/
Data and
Information Working Group, U.S. Global Change Research Program (2001),
http://www.globalchange.gov, 17 December
Thomas H.
Mace (10 March 1999), DMWG Response to OMB about Suggested FOIA Changes to
A-110, http://www.globalchange.gov
Subcommittee
on Global Change Research (June 26, 1998), Data Management for Global Change
Research, http://www.globalchange.gov
R. Corell
(October 6, 1997), DMWG "Full and Open" Definition,
http://www.globalchange.gov
Thomas H.
Mace (August 20, 1997), DMWG Policy on Data from Federal Grants,
http://www.globalchange.gov
R. Corell
(October 30, 1996), DMWG Position on Proposed World Intellectual Property
Organization Action, http://www.globalchange.gov
D. Allan
Bromley (July 2, 1991), DMWG Global Change Data Policy Statements, http://www.globalchange.gov
Wouters, P. and P.
Schroeder, Eds. (2000). Access to
Publicly Financed Research: The Global Research Village III, NIWI-KNAW,
Amsterdam
[1] This research project has been
funded by the Ministry of Education, Culture and Sciences (OC&W). I would
like to thank Peter Schröder, Jacky Bax and Emiel Broesterhuizen (Ministry OC&W), Paul Uhlir
(NAS), Tony Mayer (ESF), Peter Arzberger (UCSD), Lisette Bros, Helga van
Gelder, Gaspard de Jong (NIWI-KNAW), and Anne Beaulieu and Andrea Scharnhorst
(Nerdi) for their comments on earlier drafts. Helga van Gelder helped collect
the data. Repke de Vries (now at the Royal Library of the Netherlands)
installed the software. I am indebted to Colin Reddy for his editorial
assistance.
[2] NIWI-KNAW, Joan Muyskenweg 25, PO Box 95110, 1090 HC Amsterdam, NL; Email paul.wouters@niwi.knaw.nl
[3] The Slovenian Academy of Sciences sent in two forms, one filled in by the medical section, the other by the central bureau. The Spanish CSIC also sent in two forms but since these were substantially identical, we have treated these as one form. The Norwegian Research Council and Academy of Sciences responded together in one form.
[4] Two institutions did not fill in this question.
[5] Chi-square = 17.04 with 1 degree of freedom, hence p << 0.001.
[6] The effect of the non-response has been calculated on the basis of the known distribution of non-responding institutions over countries. In each possible configuration, the correlation turned out to be statistically significant.
[7] Although the text of this principle seems to
include all forms of research (including private research), the context of the
document indicates that what is meant here is first and foremost public
research.