LibGuides: Data Management: C: Processing Data

Once the data is collected or the datasets are chosen, the data must be processed before it can be analyzed. Processing involves changing the data into usable information. This step includes:

checking the data for duplication, errors and incorrect or missing information, then correcting the issues.
translating the data. This means putting the data into a form that the chosen analysis program can understand or making the data understandable by a human analyst. The data may also be input into a spreadsheet or another data organization and analysis tool.
converting the data. If the data was collected as raw information, such as interviews or survey results, that information must be converted into raw data that can then be processed further.
documenting the data. A ReadMe file needs to be created to document all of the information about the data (e.g., general information, data and file overviews, sharing information, etc.), and metadata needs to be assigned so that the data is findable and connections can be discovered within the data and with other datasets. An optional file may also be created to document where to find the datasets, especially if they are stored in several repositories. For example, within one project, software may be stored on GitHub, non-human datasets in general and/or subject-specific repositories, and human data in repositories that can handle human data requirements.
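
The checking step above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the column names, the sample rows, the `clean_rows` helper, and the set of codes treated as missing (empty string, "NA", "999") are all assumptions made for the example and should be replaced with the conventions of your own project.

```python
import csv
import io

# Assumed codes that mean "missing" in this hypothetical dataset.
MISSING_CODES = {"", "NA", "999"}


def clean_rows(rows, key_fields):
    """Drop rows that repeat an earlier key and turn missing codes into None."""
    seen = set()
    cleaned = []
    for row in rows:
        # Strip whitespace and replace any missing-value code with None.
        normalized = {
            name: (None if value is None or value.strip() in MISSING_CODES
                   else value.strip())
            for name, value in row.items()
        }
        key = tuple(normalized[field] for field in key_fields)
        if key in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


# Hypothetical survey export with one duplicate row and two missing ages.
raw_csv = """respondent_id,age,answer
1,34,yes
2,999,no
2,999,no
3,,yes
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))
cleaned = clean_rows(rows, key_fields=["respondent_id"])
for row in cleaned:
    print(row)
```

Whatever rules you apply at this stage (which codes count as missing, which fields identify a duplicate) belong in the ReadMe file so that others can reproduce the processing.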

Metadata Standards and Technical Aspects

DCC Disciplinary metadata standards
FAIR Principles for Research Software (FAIR4RS)
FAIR4RS recognizes the importance of research software and the need for it to be findable, accessible, interoperable, and reusable. See how the FAIR principles for data also apply to ever-changing research software, source code files, algorithms, and more.
FAIR Principles (OnFAIR)
The FAIR Guiding Principles for data management provide guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets and data. Find out how they improve data findability and sharing.

Documentation and Metadata

Data must be documented to be used properly by you, your colleagues, and other researchers in the future. Data documentation (also known as metadata) enables you to understand your data in detail and enables other researchers to find, use, and properly cite it.

It is critical to begin to document your data at the very beginning of your research project, even before data collection begins; doing so will make data documentation easier and reduce the likelihood that you will forget aspects of your data later in the research project.

Researchers can choose among various metadata standards, often tailored to a particular file format or discipline. One such standard is DDI (the Data Documentation Initiative), designed to document numeric data files.

The following are general guidelines for aspects of your project and data that you should document, regardless of your discipline. At minimum, store this documentation in a readme.txt file or the equivalent, together with the data. You can also reference a published article that contains some of this information.

Title	Name of the dataset or research project that produced it
Creator	Names and addresses of the organization or people who created the data
Identifier	Number used to identify the data, even if it is just an internal project reference number
Subject	Keywords or phrases describing the subject or content of the data
Funders	Organizations or agencies who funded the research
Rights	Any known intellectual property rights held for the data
Access information	Where and how your data can be accessed by other researchers
Language	Language(s) of the intellectual content of the resource, when applicable
Dates	Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., maintenance cycle, update schedule
Location	Where the data relates to a physical location, record information about its spatial coverage
Methodology	How the data was generated, including equipment or software used, experimental protocol, and anything else one might record in a lab notebook
Data processing	Along the way, record any information on how the data has been altered or processed
Sources	Citations to material for data derived from other sources, including details of where the source data is held and how it was accessed
List of file names	List of all data files associated with the project, with their names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')
File Formats	Format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software required to read the data
File structure	Organization of the data file(s) and the layout of the variables, when applicable
Variable list	List of variables in the data files, when applicable
Code lists	Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data')
Versions	Date/time stamp for each file, with a separate ID for each version
Checksums	Values used to test whether a file has changed over time

Source: Florida Institute of Technology, Evans Library.
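
The file list, version date/time stamp, and checksum entries can be generated rather than typed by hand. The sketch below is one possible approach using only the Python standard library; the helper names `sha256_of` and `file_manifest`, the tab-separated output layout, the choice of SHA-256, and the sample file are assumptions made for illustration, not something the guidelines above prescribe.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_manifest(data_dir):
    """List every file under data_dir as: name, last-modified time (UTC), checksum."""
    lines = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            name = path.relative_to(data_dir).as_posix()
            lines.append(f"{name}\t{modified:%Y-%m-%dT%H:%M:%SZ}\t{sha256_of(path)}")
    return lines


# Demonstration on a throwaway directory containing one hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "survey.csv").write_text("id,answer\n1,yes\n", encoding="utf-8")
    for line in file_manifest(tmp):
        print(line)
```

The resulting lines can be pasted into the readme.txt file. Recomputing the checksum later and comparing it with the recorded value shows whether a file has changed.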

Creating a ReadMe.txt file

ReadMe File Best Practices

Last updated on Jan 7, 2025 5:00 PM