Data Management

NYU Health Sciences Library (2012).  Data sharing and management snafu in 3 short acts [Video].  https://youtu.be/66oNv_DJuPc?si=GTMhs1TKJCQCHUm0

As the video shows, good Data Management practices and a solid Data Management Plan (DMP) are vital to reproducible research and to supporting new research. Data management planning takes place before the project begins: the researcher creates a workflow for how data will be gathered, stored, archived, and eventually weeded. The DMP lays out that plan clearly, describing the data that will be collected, how it will be gathered, the software or technology used for its analysis, and who is responsible for its stewardship.

The Library has an excellent collection of resources and answers to questions on planning your research project.

Templates and Examples of DMPs

Templates


Examples of Data Management Plans

Creating a Readme.txt file

The Readme file documents the contents and structure of your datasets. It describes how the data was collected, processed, and analyzed, where it is stored, whether any software is required to access the data within the datasets, and any other relevant information, so that all of your data and processes can be understood and recreated by someone who has no prior knowledge of what you did.

The best practice is to create a Readme file to accompany each dataset and to describe both the Readme files and the datasets in your Data Management Plan. If you have more than one dataset, you may also want to create a Master Readme file that you can append to your Data Management Plan.

Components of a Readme File

  • Title of the dataset
  • Name, institution, and contact information for the Principal Investigator, the Data Manager, and any other team members who are responsible for the dataset
  • File Name Structure
  • File Formats
  • Column headings for data tables
  • Definitions of acronyms used, jargon used, and any other unclear terms used with the dataset
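
If you create Readme files for several datasets, a small script can help keep them consistent. The following is a minimal sketch in Python that writes a Readme.txt skeleton containing the components listed above. The function names, the heading wording, and the "TODO" placeholder are assumptions made for illustration, not a required format.

```python
from pathlib import Path

# Headings mirror the components listed above; the exact wording and
# layout are assumptions chosen for this sketch.
README_FIELDS = [
    "Title of the dataset",
    "Principal Investigator (name, institution, contact)",
    "Data Manager (name, institution, contact)",
    "Other responsible team members",
    "File name structure",
    "File formats",
    "Column headings for data tables",
    "Definitions of acronyms, jargon, and other unclear terms",
]


def build_readme(values: dict[str, str]) -> str:
    """Return Readme text, marking any component not yet filled in as TODO."""
    lines = []
    for field in README_FIELDS:
        lines.append(f"{field}:")
        lines.append(f"  {values.get(field, 'TODO')}")
        lines.append("")  # blank line between components
    return "\n".join(lines)


def write_readme(dataset_dir: Path, values: dict[str, str]) -> Path:
    """Write Readme.txt into the dataset folder and return its path."""
    target = dataset_dir / "Readme.txt"
    target.write_text(build_readme(values), encoding="utf-8")
    return target
```

Calling write_readme(Path("my_dataset"), {"Title of the dataset": "Example survey data"}) would create my_dataset/Readme.txt with the title filled in and every other component marked TODO, provided the folder already exists.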

Data Management Checklist

The following questions can help you think about how you will manage your data. The answers that you note down will be useful for developing the content of a quality data management plan.

Data Production
  • What type(s) of data and datasets will be produced? Will it be video data, traditional numerical data, electronic lab notebooks, software, other kinds of datasets?
  • Will the data include human data? Will deidentification and/or anonymization be required?
  • What file format(s) will the data be saved as? Are those file formats proprietary? Will they degrade?
  • Will the data be reproducible?
  • Do you need tools or software to create/process/visualize the data?
Data Size
  • How much data will be gathered, and at what growth rate?
  • How often will the data change?
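
A rough projection can help you answer the data size questions above. This is a minimal sketch that assumes the data grows by a fixed amount each month. The function name, the linear growth model, and the `copies` parameter (the total number of copies you keep, including backups) are assumptions for illustration only.

```python
def projected_storage_gb(
    initial_gb: float,
    monthly_growth_gb: float,
    months: int,
    copies: int = 1,
) -> float:
    """Estimate total storage needed after `months`, assuming linear growth."""
    if months < 0:
        raise ValueError("months must not be negative")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    size_of_one_copy = initial_gb + monthly_growth_gb * months
    return size_of_one_copy * copies
```

For example, a hypothetical project that starts with 50 GB, adds 20 GB per month for 36 months, and keeps 3 copies would need about (50 + 20 × 36) × 3 = 2,310 GB. If your data grows in bursts or changes often, treat a figure like this as a starting point rather than a firm estimate.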
Data Transfer
  • How will the datasets be moved from local storage to long-term storage or from lab servers to other types of storage?
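
Whatever transfer method you choose, it is worth confirming that files arrive unchanged. One common safeguard, which this checklist does not prescribe, is to compare checksums before and after the move. The sketch below uses Python's standard hashlib module; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source: Path, destination: Path) -> bool:
    """Return True when both copies of the file have the same checksum."""
    return sha256_of(source) == sha256_of(destination)
```

Recording the checksum of each file alongside your documentation also lets a future researcher confirm that the copy they receive matches the one you archived.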
Data Usage
  • Who will potentially be using your data, both now and later?
Data Retention
  • How long should the data be retained? (e.g., 3-5 years, 10-20 years, permanently).
  • Does your institution have a data retention policy?
  • What is your long-term plan for your data, especially once the research is concluded?
Privacy and Security
  • Does your data have any special privacy or security requirements? (e.g., human data, personal data, and high-security data are all restricted types of data).
Data Sharing
  • Are there any sharing requirements? (e.g., funder data sharing policy, federal requirements such as the NIH guidelines).
  • Have you chosen a repository in which to archive your data?
  • If your data is sensitive (e.g., human data, personal data), can the repository properly handle that data?
Data Management Plan
  • Does your funding agency require a data management plan in the grant proposal?
Costs
  • Do you need to include the following costs in your Data Management Plan?
    • Library Data Management assistance, up to and including an embedded Data Management Librarian in your research team
    • Repository fees (e.g., uploading your data, long-term curation)
    • Anonymization and deidentification fees
Data Documentation
  • How will you be documenting your data and project?
  • What directory and file naming convention will be used?
  • What project and data identifiers will be assigned?
  • Is there a schema, ontological, or other metadata standard in your field for sharing data with others?
  • Do you have a proper README file to explain all of your datasets, codes, codebooks, and other files?
  • Are all abbreviations, terms and labels defined so that future researchers can identify all parts of your data?
  • Do you have a file that documents all of the repositories and other places where your datasets and associated files are stored, including any needed software to access the datasets and files?
  • Is everything documented clearly enough that a future researcher, with no knowledge of your work, would be able to duplicate your work, with all of the same processes, variables, and constraints, and get the same results (within an acceptable margin of error) - AND be able to easily explain any differences in results?
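
A naming convention is easier to follow when it can be checked automatically. The sketch below assumes a hypothetical convention of project, collection date, short description, and two-digit version (for example, soil_20250404_ph-readings_v01.csv). Neither the convention nor the function names come from this guide, so adapt the pattern to whatever your team agrees on.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<YYYYMMDD>_<description>_v<NN>.<extension>
NAME_PATTERN = re.compile(r"[a-z0-9]+_\d{8}_[a-z0-9-]+_v\d{2}\.[a-z0-9]+")


def make_file_name(
    project: str,
    collected: date,
    description: str,
    version: int,
    extension: str,
) -> str:
    """Build a file name that follows the hypothetical convention above."""
    return f"{project}_{collected:%Y%m%d}_{description}_v{version:02d}.{extension}"


def follows_convention(file_name: str) -> bool:
    """Return True when the whole file name matches the convention."""
    return NAME_PATTERN.fullmatch(file_name) is not None
```

Running follows_convention over the file names in a data folder gives a quick list of the files that need renaming before the dataset is shared or archived.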
Storage and Backup
  • What are the strategies for storage and backup of the data?
  • Are you aware of the backup support that is available to you?
  • Which repositories will you use for your data? Can they handle the type of datasets that you need stored?
  • Are you using one repository or several (e.g., Dryad, GitHub, Vivli)?
Training
  • Will the team need training in data management best practices, working with metadata, making the datasets sharable and reproducible, or other data management topics?
Publication
  • When and where will the work be published?
Responsibility
  • Who in the research group will be responsible for data management?
  • Who controls the data (PI, student, lab, institution, funder)?

 

Source:  Florida Institute of Technology, Evans Library; Used with permission.

Last updated on Apr 4, 2025 11:19 AM