Historical Analog Data: Valuable Asset at Risk on Your Campus

March 18, 2021

Lois G. Hendrickson
Assistant Librarian
Curator, Wangensteen Historical Library of Biology and Medicine
University of Minnesota Libraries – Twin Cities

Kristen L. Mastel
Librarian, Outreach & Instruction Librarian
University of Minnesota Libraries – Twin Cities

Shannon L. Farrell
Associate Librarian, Natural Resources Librarian
University of Minnesota Libraries – Twin Cities

Julia A. Kelly
Librarian, Science Librarian
University of Minnesota Libraries – Twin Cities

The legacy of historical analog data is uncertain. While library services around research data, such as data repositories and data management plan consultations, focus on recently produced, machine-readable data, researchers on our campuses generated data for decades before these services existed. Largely in paper formats, this historical data includes field notes, photographs and lab notebooks, and often languishes in filing cabinets or storage closets across campus.

Several critical issues exist in the preservation and reuse of analog data. Stewardship and management issues occur when researchers leave an institution, retire, or pass away, leaving data orphaned. In turn, knowledge is lost about the data's context, uniqueness, quality, importance, and usefulness, including opportunities for potential reuse. Currently, many funding agencies have mandates that dictate the long-term management of digital data, but those were not in place when most analog data was produced. Institutions where that data is housed seldom apply similar management policies to historical data. Separately, several disciplines have made nascent attempts to protect and preserve data, making small sets of analog data available. However, there are few collective solutions regarding analog data that will work across disciplines and institutions (Gross and Pake 1995). Attempting to apply FAIR (findability, accessibility, interoperability, and reusability) principles to historical datasets has been difficult and would require wide-scale collaboration among numerous people with varying expertise (Easterday et al. 2018).

Researchers and disciplines may look to libraries and archives to store and curate analog data, but most do not currently collect historical data. Moreover, this may not be an optimal or practical solution, depending on a researcher’s potential use of the data. Practical issues, such as needing frequent access to the analog data or needing to add data to existing records, mean the archives at your institution may not be an appropriate fit. Mass digitization of entire data sets has been proposed, but it is too time- and resource-intensive to be practical. Depositing datasets in institutional repositories requires that they be digitized, but depending on the type of data, researchers may need to rekey the data into a machine-readable format in order to use it in their own research. Data repositories, whether institution- or discipline-based, only accept machine-readable data, requiring an upfront investment to transform historical data before it can be deposited. Even if these steps are taken, the various types of archives and repositories have varying levels of access and discoverability.

To get a clearer picture of the state of these issues, we looked at nearly 200 research articles where scientists utilized historical analog data. Their use of the data underscores its value and significance, and the implications for future research if steps are not taken to find solutions to preserve it and make it available. We found examples of researchers using older data in their work in climate science, geology, mammalogy, forestry, fisheries, astronomy, ecology, and other fields (Sharma et al. 2016, Vearncombe et al. 2017, Myers et al. 2009, Thompson et al. 2013, Novaglio et al. 2020, Pagnotta et al. 2009, Clavero and Delibes 2013). Formats of the data included field notebooks, tally sheets, reports, photographs, lab books, ship logs, and the tags associated with herbarium specimens. This investigation revealed a gap in how researchers report on this historical data: they do not discuss how they knew the data existed, how they located or discovered it, how they gained access to it, or how another researcher could find and use it. Some researchers did mention that they worked with a museum or center that specializes in the area that they were investigating. To facilitate access and use of historical datasets, researchers need to provide proper attribution.

Our experience in helping researchers with their analog data at the University of Minnesota dates back to 2015, when librarians who had fostered a relationship with the Horticulture department were invited to assist in finding an appropriate way to preserve 100 years of potato breeding data (97 volumes). Subsequently, we have worked with various researchers on campus to organize, describe, and make legacy analog data more accessible. We helped a retired scientist (Ph.D., Ecology) digitize all of the data, in numerous media formats, that he collected during his 1970s dissertation research on the threatened Mountain Plover, and uploaded it into our institutional repository (Graul 1973). The process involved physically organizing the data and providing it with appropriate description and metadata prior to scanning it. On a larger scale, we undertook a project to inventory, describe, and digitize over 100 years of fruit breeding data from our Horticultural Research Center, comprising over 75 linear feet. As with the prior project, we provided metadata about each set of data, created readme files, and deposited the data sets into our institutional repository (Farrell et al. 2019a). Recently, we have been working with historical data collected at one of the biological field stations, including both student papers and bird censuses. We received funding to create and compile metadata into spreadsheets to facilitate reuse of the original data.

These projects led us to formally investigate the existence, use, and status of historical analog data on a broader scale on our campus. We conducted a survey and interviews with researchers in the life sciences, and discovered that the researchers value this data, continue to use it, and sometimes still add to it (Farrell et al. 2019b, Farrell et al. 2020). Most would share it with others if asked, and nearly all worry about its fate once they retire. Some are holding data that was passed to them when another researcher retired.

Our work to organize, describe, and make legacy analog data more accessible, coupled with the results from our survey of the literature, has underscored the importance of historical analog data, as well as its current peril (Farrell et al. 2020). Our survey of analog data on our campus demonstrated an ongoing need for these kinds of projects and collaboration between librarians and researchers. We have undertaken one-time initiatives for individual projects, but this process is not easily scalable. Multiple participants need to collaborate to provide context and explanations about variables, and to participate in the time-consuming process of individually reviewing each page of data to provide detailed metadata. Additional resources are frequently needed; in our projects, we hired additional staff to complete some of this work using grant funding. With each new project, we learn more about the time required to undertake the next one, and what may be required to help faculty aid us in the process of organization and description.

In addition to assisting individual researchers with their data, it is essential to look at the issues and solutions concerning historical data from an institutional, state, and national perspective. Earlier efforts by individual scientific communities to obtain grant funding to address preservation of historical data have had very limited success. Small-scale projects (involving an individual study, or individual researchers, or labs, or centers) are often not scalable, mass digitization is not financially feasible, and depositing records in archives is often not practical in terms of ease of reuse. If the ultimate goal is to preserve these valuable assets and increase discoverability, accessibility and reuse, librarians and scientific researchers should collaborate to find solutions. Together, we can all leverage our existing relationships to have conversations with individual researchers, campus leaders, and professional associations that could reveal the range of needs that exist. Although librarians can now build on their experience with electronic data management and leverage their existing relationships with researchers to start preserving analog data in their institution, we need broader solutions. 

Historical data is used and valued by current researchers. As a profession that focuses on discovery, access, and preservation, we ask librarians to consider what we might do to address historical data in relation to these values and what will be lost if we do not act.

References 

Clavero, Miguel, and Miguel Delibes. “Using historical accounts to set conservation baselines: the case of Lynx species in Spain.” Biodiversity and Conservation 22, no. 8 (2013): 1691-1702.

Easterday, Kelly, Tim Paulson, Proxima DasMohapatra, Peter Alagona, Shane Feirer, and Maggi Kelly. “From the Field to the Cloud: A Review of Three Approaches to Sharing Historical Data From Field Stations Using Principles From Data Science.” Frontiers in Environmental Science 6 (2018): 88. https://doi.org/10.3389/fenvs.2018.00088

Farrell, Shannon L., Lois G. Hendrickson, Kristen L. Mastel, Katherine Adina Allen, and Julia A. Kelly. “Resurfacing Historical Scientific Data: A Case Study Involving Fruit Breeding Data.” Journal of eScience Librarianship 8, no. 2 (2019a): 1-13.

Farrell, Shannon L., Julia A. Kelly, and Kristen L. Mastel. “Field Notebooks and Tally Sheets.” The Proceedings of the ACRL 2019 Conference (2019b): 283-288.

Farrell, Shannon L., Lois G. Hendrickson, Kristen L. Mastel, and Julia A. Kelly. “Historical Scientific Analog Data: Life Sciences Faculty’s Perspectives on Management, Reuse and Preservation.” Data Science Journal 19, no. 51 (2020): 1-10. https://doi.org/10.5334/dsj-2020-051

Graul, Walter Dale. Breeding adaptations of the mountain plover: (Charadrius montanus). (1973). Retrieved from the University of Minnesota Digital Conservancy, http://hdl.handle.net/11299/169740

Gross, Katherine L., and Catherine E. Pake. Final Report of the Ecological Society of America Committee on the Future of Long-term Ecological Data (FLED): Text of the report. Vol. 1. Ecological Society of America (1995).

Myers, Philip, Barbara L. Lundrigan, Susan M.G. Hoffman, Allison Poor Haraminac, and Stephanie H. Seto. “Climate‐induced changes in the small mammal communities of the Northern Great Lakes Region.” Global Change Biology 15, no. 6 (2009): 1434-1454.

Novaglio, Camilla, Anthony D.M. Smith, Stewart Frusher, and Francesco Ferretti. “Identifying historical baseline at the onset of exploitation to improve understanding of fishing impacts.” Aquatic Conservation: Marine and Freshwater Ecosystems 30, no. 3 (2020): 475–485. https://doi.org/10.1002/aqc.3264

Pagnotta, Ashley, Bradley E. Schaefer, Limin Xiao, Andrew C. Collazzi, and Peter Kroll. “Discovery of a second nova eruption of V2487 Ophiuchi.” The Astronomical Journal 138, no. 5 (2009): 1230–1234. https://doi.org/10.1088/0004-6256/138/5/1230

Sharma, Sapna, John J. Magnuson, Ryan D. Batt, Luke A. Winslow, Johanna Korhonen, and Yasuyuki Aono. “Direct observations of ice seasonality reveal changes in climate over the past 320–570 years.” Scientific Reports 6 (2016): 25061. https://doi.org/10.1038/srep25061

Thompson, Jonathan R., Dunbar N. Carpenter, Charles V. Cogbill, and David R. Foster. “Four Centuries of Change in Northeastern United States Forests.” PLoS ONE 8, no. 9 (2013): e72540. https://doi.org/10.1371/journal.pone.0072540

Vearncombe, Julian, Angela Riganti, David Isles, and Sian Bright. “Data upcycling.” Ore Geology Reviews 89 (2017): 887–893.