Are there best practices for which data need to be preserved for the long term, and which can be discarded?

Data should be saved according to two major criteria: reusability and reproducibility. Reusability is the ability of a data set to be repurposed in easily intelligible formats and contexts. Reproducibility requires saving the data and information needed to reproduce major research findings. Curating for a community of interest alone will lessen the benefits of data sharing, but sharing all data is unlikely to be a wise use of time. You don't have to advise researchers to save everything; advise them instead to focus on what enables reusability and reproducibility. Also check with the specific funder or your organization. Some funders have definitions of what they mean by "data" and what they expect to be shared. Organizations (institutions) may have policies on how long "research data" from their researchers must be retained.