DataQ
A collaborative platform for answering research data questions in academic libraries

What intermediate data products do I need to preserve?

Steve.VanTuyl
Sharing intermediate data products is important in a number of cases. The most important consideration is whether access to the intermediate products is necessary for others to understand and verify your research process or methodology. A related consideration is how difficult it is to create the intermediate products needed to replicate or verify your work: if an intermediate product is critical and difficult to create, sharing it should be prioritized. It may also be worth considering whether any of your intermediate products would be useful to others, either in your research domain or in another. Examples of generally useful intermediate products include satellite imagery that has been processed to remove common errors, or simulated future weather data under climate change scenarios.

There are also technical considerations for sharing intermediate products. First, if the datasets you are sharing are very large, sharing the intermediate data products may prove technically difficult. Second, if creating the intermediate product requires special equipment (e.g. expensive or uncommon equipment) or complex, hard-to-document processes, sharing the data itself may be preferable to sharing the methods for creating the data. That said, not sharing data because the production methods are "too complicated to document" is generally not considered best practice for data sharing.

If you are not planning to share important intermediate products, it is critical that you provide sufficient documentation to allow others to replicate them. This documentation may include textual descriptions of methods, computer code, input datasets, and other elements of the methodology that allow for replication of the products.
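As one illustration of the documentation described above, here is a minimal Python sketch that writes a small manifest for a single intermediate product, recording the input datasets, the script that produced it, and a short description of the method. The function names (`sha256_of`, `build_manifest`, `write_manifest`) and the manifest fields are hypothetical choices made for this example, not an established standard, and your repository or discipline may expect a different format.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(product, inputs, script, description):
    """Describe one intermediate product and what is needed to recreate it.

    All field names here are illustrative assumptions, not a standard.
    """
    return {
        "product": Path(product).name,
        "product_sha256": sha256_of(product),
        "description": description,
        "script": Path(script).name,
        "inputs": [
            {"file": Path(p).name, "sha256": sha256_of(p)} for p in inputs
        ],
    }


def write_manifest(manifest, out_path):
    """Save the manifest as human-readable JSON next to the data."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2)
```

Calling `build_manifest` with the path to the intermediate file, a list of input file paths, the path to the processing script, and a one-line description returns a dictionary that `write_manifest` can save alongside the data. The checksums let others confirm that a product they regenerate matches yours, or that they are starting from the same inputs.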