Reviewing Applications to Access Secure Social Science Data

June 17th, 2024

Matthew Hutchinson

Data Curation Specialist

GSB Library, Stanford University

The FAIR principles for scientific data management are often considered the go-to standard for the stewardship of research data assets. While there is little dispute about the ideals the principles espouse, they were created with a focus on scientific data. When applied to certain social science resources, the FAIR standard often proves unattainable. In the academic study of business, many datasets are purchased or licensed from commercial sources or acquired through a personal connection between the researcher and the industry being studied. One of the most basic conditions of this type of licensing is that the data not be shared beyond the personnel covered by the terms of the agreement; commercial data vendors wish to sell access to their data and have little interest in the ideas of Findability, Accessibility, Interoperability, or Reusability. In addition to limiting access to the institution that has paid for the license, the vendor may also require that only a certain school, institute, or faculty member be permitted to use the data in research. Some datasets cannot be used by students, while others may be used only by PhD students with faculty sponsorship. Data stewards must also be mindful of privacy concerns for data subjects, the security of data storage technologies, and any regulatory restrictions governing how and where the data can be used. If an ideal FAIR dataset is publicly available with high-quality metadata and a clear taxonomy, then much of the commercial data used in the social sciences does not meet this standard.

The challenge for data stewards, librarians, and others managing data assets with complex restrictions is how to provide the broadest possible access to resources while navigating the terms of data licensing agreements. There is a wide range of possible terms governing the use of the data, combined with a wide variety of ways an applicant can be affiliated with the University. There is no ‘one size fits all’ solution that can cover every possible permutation and variation of commercial licenses. In response to this challenge, the Library at Stanford University’s Graduate School of Business has developed the process described below to evaluate requests for data access. Each application is evaluated across several dimensions, with a staff member assigned to each step of the process. No one individual can effectively review every possible concern associated with licensing compliance and data privacy, so the GSB Library breaks the review down into its component parts and assigns each part to a staff expert. A Coordinating Data Steward takes ownership of the application and moves it from person to person and team to team, documenting the decision at each stage. Every application passes through the following stages:

1. Received

There are multiple entry points to the data access pipeline. Applicants can complete a web form on the Library’s website, complete a web form on the Library’s data hosting platform, or send a direct email to a member of the Library staff. Requests from all three channels are received by the Coordinating Data Steward, who then creates a ‘ticket’ for the request in the Library’s Customer Relationship Management software.

2. Initial Response Sent

Once the ticket is created, the Coordinating Data Steward reaches out to the applicant to gather additional information. This typically includes asking the applicant for a short description of the research project for which the data will be used. In the case of student applicants, the Data Steward will ask whether the project is being led by a faculty member or by the student themselves. It may also be important at this stage to ask the applicant where they intend to store the data and where they expect to perform their analysis.

3. Research Data Steward Review

Next, the Coordinating Data Steward summarizes the information gathered and presents it to the Research Data Steward. This role is usually filled by a research librarian who has a good understanding of the dataset and its suitability to meet the needs of the applicant’s research question. At this point, it may be necessary to ask further questions of the applicant or make arrangements for a meeting to clarify the scope and the design of the project. The goal is to make sure the dataset requested is the best available resource for the applicant. This discussion can also serve as an opportunity to explain to applicants any restrictions or limits on the use of the data derived from the vendor license.

4. Legal Review

After the Research Data Steward confirms the dataset the applicant requires, the request is passed on to the Library’s legal team. This group negotiates contracts with data vendors and understands the terms of each license agreement and how it pertains to each data asset. The goal at this stage is to review the applicant’s research plan and their affiliation with the University and confirm that the intended use complies with the dataset’s governing contract. For example, a PhD student may ask for access to a dataset with the intention of downloading a portion to a personal device for analysis. The legal team would review this request to determine whether students are permitted to access the data and whether subsets can be stored outside of university-controlled systems.

5. Discussion by the Research Data Committee (Optional)

In particularly complex cases, the Coordinating Data Steward may decide to present the application to the Research Data Committee. This group includes the Research Data Steward, the legal team, the Research Computing group, and the Library’s senior leadership. The goal is for the committee to collectively reach a decision about how to proceed with the request. This step is used infrequently, typically for extremely sensitive data or for large groups of researchers with a wide variety of university affiliations.

6. Signing a Data Use Agreement

The applicant is then asked to sign an agreement that describes the restrictions on the use of the data required by the data vendor’s contract. This document serves as a reference for the applicant, outlining exactly what they can and cannot do with the data once they have access. It is also used by Library personnel to record exactly when someone was given access and how the person is affiliated with the University. This information can be vital both for internal metrics and for tracking which users had access to which data on which date in the case of a potential contract violation.

7. Data Engineering

Once an applicant’s request has been approved and they have agreed to the terms of the contract, the application is passed to the Data Engineering group to enable the applicant’s account. The complexity of this step depends on the system in which the data is stored. This stage can also include a brief training session to help applicants learn how to use the storage or analytics platform.
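For readers who track requests in their own tooling, the stages above can be modeled as a simple ordered workflow with a recorded decision at each step. The following Python sketch is illustrative only: it is not the GSB Library’s system (which uses Customer Relationship Management software), and every name in it (`Stage`, `Decision`, `Application`, `needs_committee`, the reviewer labels) is a hypothetical choice made for this example.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Stage(Enum):
    """The seven stages described above, in order (names are illustrative)."""
    RECEIVED = 1
    INITIAL_RESPONSE_SENT = 2
    RESEARCH_DATA_STEWARD_REVIEW = 3
    LEGAL_REVIEW = 4
    COMMITTEE_DISCUSSION = 5  # optional stage
    DATA_USE_AGREEMENT = 6
    DATA_ENGINEERING = 7


@dataclass
class Decision:
    """One documented decision: who decided what, at which stage, and when."""
    stage: Stage
    reviewer: str
    note: str
    decided_on: date


@dataclass
class Application:
    applicant: str
    dataset: str
    needs_committee: bool = False  # assumption: set by the coordinating steward
    stage: Stage = Stage.RECEIVED
    completed: bool = False
    history: list[Decision] = field(default_factory=list)

    def advance(self, reviewer: str, note: str, decided_on: date) -> None:
        """Record the decision for the current stage, then move to the next."""
        if self.completed:
            raise ValueError("application has already completed every stage")
        self.history.append(Decision(self.stage, reviewer, note, decided_on))
        if self.stage is Stage.DATA_ENGINEERING:
            self.completed = True
            return
        next_stage = Stage(self.stage.value + 1)
        # Skip the committee unless the case was flagged as complex.
        if next_stage is Stage.COMMITTEE_DISCUSSION and not self.needs_committee:
            next_stage = Stage(next_stage.value + 1)
        self.stage = next_stage


# Hypothetical usage: a routine request that does not need the committee.
app = Application(applicant="phd_student_01", dataset="example_vendor_dataset")
app.advance("coordinating_steward", "ticket created", date(2024, 6, 17))
app.advance("coordinating_steward", "project description requested", date(2024, 6, 18))
```

Keeping the decision history alongside the current stage mirrors the documentation role of the Coordinating Data Steward: the record shows who reviewed the request at each stage and on which date.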

In an ideal world, all data acquired for research would be open and freely available; once a university acquired a dataset, it would be able to share and publish as much of the data as needed to support the work of the institution’s researchers. Unfortunately, for business research this is rarely the case, and University administrators must navigate a labyrinth of contractual agreements to maintain access to data and relationships with data vendors. This article shares one process developed by a University library to review each proposed use of a research dataset. The author invites the reader to consider their own workflow and compare it to the one presented here. By comparing workflows, organizations can learn from one another and improve the user experience while maintaining security.