Challenges and opportunities in research data acquisition

July 5th, 2024

Cindy Elliott

Collection Management Unit Lead

University of Arizona

ORCID: https://orcid.org/0000-0003-2767-0218


Jim Martin

Associate Librarian

University of Arizona

ORCID: https://orcid.org/0000-0002-3891-0347

Niamh Wallace

Associate Librarian

University of Arizona

ORCID: https://orcid.org/0000-0001-5501-5259

Introduction

Academic librarians have long assisted researchers in the discovery and reuse of openly available data. Recently, as the landscape of commercially available data evolves, academic libraries are encountering new opportunities and challenges in dataset acquisition and licensing (see Foster, Rinehart, & Springs, 2019; Hogenboom & Hayslett, 2017; and Sheehan & Hogenboom, 2017). To meet emerging research needs, the University of Arizona Libraries offers a data grant program that funds the purchase of, or subscription to, datasets (numerical, geospatial, textual, visual, or audio) at researcher request. For the past three years, we’ve set aside money in our information resources budget for this program, through which researchers can apply for funding to purchase or subscribe to a dataset that supports their research project. Our goal was to build a data collection that supports broad use by our campus research community, and like other libraries involved in similar initiatives (see list of institutions below), we included commercially available datasets, repackaged public data, and data collected by researchers and made available for a fee.

The program has yielded useful insights into researcher needs, if not a slate of purchasable datasets. We’ll outline below why it can be so challenging to facilitate access to commercial datasets and share some recommendations based on our experience, in anticipation of a growing interest in these types of data in the future.

Our criteria for acquisition

In defining our criteria for which types of commercially available data to fund, we wanted to hew to the principles that guide library resource acquisition in general. This meant that we prioritized datasets that could be licensed for campus-wide use, deposited in our campus research data repository, and made discoverable there. We also required that the dataset not include restricted data, such as personally identifiable information (PII), data regulated under HIPAA or ITAR, or other sensitive data.

Who was interested in the data grant program?

Over the course of three years, we received 14 requests in total. Most of the requests we received were from graduate students working on long-term research projects, though we did hear from faculty members interested in datasets for teaching. The home disciplines of the requesters encompassed a diverse range, including public health, sociology, retail and consumer science, economics, marketing, geography, political science, and medicine.

Barriers to acquisition

Over the course of the program, the most common barriers to data purchase or licensing we encountered were:

  • Lack of an academic or institutional licensing model. Many of the vendors we spoke with did not have access or licensing models for campus-wide use. Datasets were often marketed for individual research or laboratory use.
  • Price. Campus-wide licenses, if available, were often priced well above the spending threshold we set.
  • Restricted data. Several datasets raised concerns about personally identifiable information and about the labor that would be necessary to de-identify the data before making it available for use.

Recommendations for the data acquisition workflow

The challenges of acquiring commercial data touch every part of the acquisitions workflow, from selection, acquisition, and licensing to access, use, reuse, and preservation. Once the vendor is identified, we found it important to ascertain upfront how the data can be accessed or authorized for academic or institutional use. We recommend asking the following questions:

  • What are the access/authentication options available for this dataset?
  • Does it have to be used on the supplier platform or can it be downloaded for manipulation in other tools?
  • What are the access options for campus-wide use?
  • Can the dataset be put in a campus data repository?
  • Are there limits on use of the dataset, such as caps on users, downloads, or page views?

If the answers to those questions are reasonable, the next set of questions concerns permissions. Permission controls for data vary from vendor to vendor and can be fairly restrictive from a researcher standpoint.

  • Can the data be downloaded to an individual computer?
  • What are the use/reuse restrictions?
  • Can the researcher reuse the data for publication and describe it in the publication?
  • Is the user required to dispose of the data within a set timeframe?

These types of restrictions are often prohibitive for libraries that are trying to provide the broadest possible access to users.

Licensing requirements differ across vendors and institutions, so it is a good idea to ask to see the terms and conditions before a data purchase or subscription, and then to build in extra time to discuss those terms with the vendor and find out whether changes are possible. Often, the person who negotiates the terms and the person who does the close reading of the license are not in the same department, so extra time is needed. One common challenge is that data licenses link to master terms posted on public websites, which can change at any time and are not specific to the dataset under acquisition. Some vendors wanted to add non-disclosure clauses to the license, or restrictive clauses about non-commercial use and how the data is cited. Some vendors are vague about where the data comes from and include language about monitoring usage patterns, which may not reflect the values that the library wants to uphold. Different states have requirements and addendums that need to be included in licenses, and some required clauses, such as those about accessibility and privacy, stem from university initiatives. Because of all of these considerations, most dataset requests don’t make it past basic library requirements for access and authentication, use, or reuse. Of the 14 requests for data purchase we received, we were able to successfully acquire two products: a lobbying registration dataset and a market research report with associated data. Both were deposited in our research data repository.

One thing we can do as librarians is to understand these challenges and set expectations with researchers and graduate students when discussing commercial data requests. The process for acquiring data can be a long one, while the request itself is often urgent. We also recommend looking for opportunities to share costs with campus partners. One strategy we employed was to collaborate with our university’s research office to find out whether individual researchers or departments on campus already held grant-funded licenses for datasets.

Looking forward

As the University of Arizona Libraries continue to highlight and promote our efforts to support open scholarship and open science, the data grant program provides a necessary service by recognizing that members of the campus community continue to need commercial datasets for research purposes. The request process allows for a conversation that addresses potential open alternatives and emphasizes that a license must allow for campus-wide access and have reasonable terms. If this conversation establishes that no other resource meets the requester’s need, and the requested data is available at a reasonable cost with an acceptable license, the library may be able to meet that specialized need. In the spirit of open scholarship, we ask recipients whose requests are approved to present or otherwise describe the results of the research that made use of the purchased or licensed dataset or datasets.

As funders place further emphasis on requirements that data emerging from funded research be FAIR, more datasets will be open and available for widespread access and reuse, potentially reducing the demand for licensed products. However, this environment also presents more open data sources for companies and individuals to harvest and repackage, potentially for commercial purposes. Going forward, academic libraries can continue to play critical roles in helping to educate their communities on the complexities of dataset acquisition, copyright, and licensing by providing similar programs designed to prompt these types of conversations and interactions. Libraries also benefit by gaining a better understanding of their constituents’ educational and research needs for datasets.

Other academic libraries that currently offer similar data grant programs are listed below, based on an informal web search. We included libraries that had a public webpage describing their program, a set of criteria for purchase, and a stated process for their campus communities to follow in order to submit a dataset purchase request.

References

Foster, A. K., Rinehart, A. K., & Springs, G. R. (2019). Piloting the Purchase of Research Data Sets as Collections: Navigating the Unknowns. portal: Libraries and the Academy, 19(2). https://doi.org/10.1353/pla.2019.0018

Hogenboom, K., & Hayslett, M. (2017). Pioneers in the Wild West: Managing Data Collections. portal: Libraries and the Academy, 17(2), 295-319. https://doi.org/10.1353/pla.2017.0018

Sheehan, B., & Hogenboom, K. (2017). Assessing a Patron-Driven, Library-Funded Data Purchase Program. Journal of Academic Librarianship, 43(1), 49-56. https://doi.org/10.1016/j.acalib.2016.10.001




