Love Your Data
Hopefully you will find, use, and produce lots of research data during the years that you spend as a graduate student at Cornell.
To make the most of your projects, treat your research data with love and respect:
- Organize it.
- Back it up.
- Document it.
Organize it: directory and file naming
File organization and naming conventions are often specific to a lab and can be highly personalized. What matters most is to be consistent and to write the conventions down. Spending a little time on file management strategies early in project planning can save lots of time (and headaches) later. Once you have settled on conventions for file naming and organization, document them and share them with collaborators, faculty advisors, and anyone else who may need access to the data.
The File Management best practices web page maintained by Cornell Data Services (CDS) has great examples of:
- Directory structure naming conventions
- File naming conventions
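Once a convention is written down, it can also be checked automatically. The Python sketch below builds and validates file names under a hypothetical convention, `YYYYMMDD_project_description_vNN.ext` (date, project, short description, two-digit version, extension). The convention, the function names `make_filename` and `follows_convention`, and the example values are assumptions made for illustration, not a scheme taken from the CDS page; substitute whatever convention your lab documents.

```python
import re
from datetime import date

# Hypothetical convention (an illustration, not a CDS standard):
#   YYYYMMDD_project_description_vNN.ext
# e.g. 20240315_soil-cores_ph-readings_v02.csv
NAME_PATTERN = re.compile(
    r"^(\d{8})_([a-z0-9-]+)_([a-z0-9-]+)_v(\d{2})\.([a-z0-9]+)$"
)


def _slug(text: str) -> str:
    """Lowercase, turn spaces/underscores into hyphens, drop other punctuation.

    Underscores are reserved as the separator between fields.
    """
    text = text.strip().lower().replace(" ", "-").replace("_", "-")
    return re.sub(r"[^a-z0-9-]", "", text)


def make_filename(day: date, project: str, description: str,
                  version: int, ext: str) -> str:
    """Build a file name that follows the hypothetical convention above."""
    if not 0 < version < 100:
        raise ValueError("version must be between 1 and 99")
    return (f"{day:%Y%m%d}_{_slug(project)}_{_slug(description)}"
            f"_v{version:02d}.{ext.lower().lstrip('.')}")


def follows_convention(name: str) -> bool:
    """Return True if an existing file name matches the convention."""
    return NAME_PATTERN.match(name) is not None
```

For example, `make_filename(date(2024, 3, 15), "Soil Cores", "pH readings", 2, "CSV")` returns `20240315_soil-cores_ph-readings_v02.csv`, while `follows_convention("Final data (2).xlsx")` returns `False`. Putting the date first, in year-month-day order, means that sorting files by name also sorts them chronologically.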
Back it up
A recommended practice is to keep at least three copies of your research data:
1) “here” – a local copy on your laptop or desktop, where the files were created or collected;
2) “near” – an external copy on a different media type than the original; and
3) “far” – an external copy in a geographically different location, such as a cloud storage service.
This is also called the Rule of Three: THREE copies, on at least TWO different media types, with ONE copy in an entirely different location (i.e., not in the same building or, depending on your situation and needs, not in the same part of the country). This third copy is invaluable if an environmental event, such as a fire or flood, damages the other two.
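The Rule of Three can also be expressed as a simple checklist. The Python sketch below records where each copy lives and reports which part of the rule is not met: fewer than three copies, fewer than two media types, or no copy away from your main location. The names `Copy` and `rule_of_three_problems` and the example labels are hypothetical. The check only looks at the labels you supply; it does not confirm that the copies actually exist or are up to date.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset. Field values are free-form labels you choose."""
    label: str  # e.g. "here", "near", "far"
    media: str  # e.g. "laptop SSD", "external HDD", "cloud storage"
    site: str   # e.g. "lab building", "cloud provider"


def rule_of_three_problems(copies: list[Copy], home_site: str) -> list[str]:
    """Return the ways `copies` fall short of the Rule of Three.

    An empty list means: at least THREE copies, on at least TWO media
    types, with at least ONE copy at a site other than `home_site`.
    """
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies (need 3)")
    media_types = {c.media for c in copies}
    if len(media_types) < 2:
        problems.append(f"only {len(media_types)} media type(s) (need 2)")
    if not any(c.site != home_site for c in copies):
        problems.append("no copy outside " + home_site)
    return problems
```

For instance, a laptop copy and an external-drive copy in the lab plus a cloud copy yields an empty list, whereas two copies on the same laptop yields all three problems. The `list[Copy]` annotation assumes Python 3.9 or later.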
Document it
It is important to describe your data with sufficient detail so that users can:
- reconstruct the context in which the data were collected
- evaluate whether the data are fit for their purpose
- analyze and reuse the data appropriately
You may need to describe several facets of your data, including:
- overall bibliographic information about the dataset (e.g., title, authors, related publications)
- types of files used (e.g., fits, xlsx, csv, txt, xdr, png)
- key descriptive information about the experiment (e.g., sampling or measurement methods, software used for analysis, and any processing or transformations performed)
Your field may have commonly used data formats that help capture and structure the relevant metadata. When possible, structure your metadata according to an appropriate, agreed-upon metadata standard. When no such standard exists, consider writing a “readme”-style metadata document.
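As a starting point for a “readme” file, the Python sketch below assembles a plain-text readme covering the facets listed above: bibliographic information, file types, and experimental details. The labels in `FIELDS`, the `TODO` placeholder, and the helper names `file_types_in` and `render_readme` are hypothetical; if your field has a metadata standard, its element names should take precedence.

```python
from pathlib import Path

# Hypothetical field keys and labels; adapt them to your field's conventions.
FIELDS = [
    ("title", "Title"),
    ("authors", "Author(s)"),
    ("related_publications", "Related publications"),
    ("methods", "Sampling/measurement methods"),
    ("software", "Analysis software"),
    ("processing", "Processing/transformations"),
]


def file_types_in(folder: Path) -> list[str]:
    """Sorted, de-duplicated file extensions found anywhere under `folder`.

    The readme itself is skipped so re-running does not add "txt".
    """
    return sorted({p.suffix.lower().lstrip(".")
                   for p in folder.rglob("*")
                   if p.is_file() and p.suffix and p.name != "README.txt"})


def render_readme(meta: dict[str, str], file_types: list[str]) -> str:
    """Return readme text; fields missing from `meta` are marked TODO."""
    lines = [f"{label}: {meta.get(key, 'TODO')}" for key, label in FIELDS]
    # Place the file-type line after the bibliographic fields.
    lines.insert(3, "File types: " + (", ".join(file_types) or "none"))
    return "\n".join(lines) + "\n"
```

Typical use would be `text = render_readme(meta, file_types_in(Path("my_dataset")))` followed by `(Path("my_dataset") / "README.txt").write_text(text, encoding="utf-8")`, where `my_dataset` is a placeholder folder name. Missing fields are written as `TODO` rather than omitted, so gaps in the documentation stay visible.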
For more information, visit the metadata best practices page maintained by Cornell Data Services (CDS).