Submission checklist for datasets in eCommons
Confirm eCommons is a good home for your data
- Review the eCommons submission policies.
- Contact us to discuss options if any single file is larger than 5 GB or your entire dataset is larger than 50 GB.
- Data in eCommons is open and public. We can provide a DOI before your dataset is published; once published, content may be versioned, but not withdrawn or modified.
- Data must be non-restricted and must not contain any private, confidential, or other legally protected information (e.g., personally identifiable information).
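As a quick pre-check of the size thresholds above, the following minimal sketch totals the files in a local folder. The function names are hypothetical, and treating 1 GB as 10**9 bytes is an assumption; this is not an eCommons tool.

```python
from pathlib import Path

# Thresholds from the checklist; treating 1 GB as 10**9 bytes is an assumption.
MAX_FILE_BYTES = 5 * 10**9
MAX_DATASET_BYTES = 50 * 10**9


def size_report(dataset_dir):
    """Return (total_bytes, oversized_files) for every file under dataset_dir."""
    files = [p for p in Path(dataset_dir).rglob("*") if p.is_file()]
    sizes = {p: p.stat().st_size for p in files}
    oversized = sorted(str(p) for p, n in sizes.items() if n > MAX_FILE_BYTES)
    return sum(sizes.values()), oversized


def needs_contact(total_bytes, oversized):
    """True when either threshold in the checklist is exceeded."""
    return bool(oversized) or total_bytes > MAX_DATASET_BYTES
```

If `needs_contact` returns `True` for your folder, that is a signal to get in touch before submitting.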
Initiate Data Curation
Data curation ensures that datasets are complete, well-described, and in a format and structure that best facilitates long-term access, discovery, and reuse. Curation services are available to all Cornell researchers regardless of the repository chosen. The process depends heavily on the complexity of the dataset, the extent of curation needs, and the researcher's timeline.
- Contact us and a curator will respond within 1-5 business days.
- We can generate a DOI within 1-5 business days.
- The full process from curatorial review through data publication in eCommons generally takes 1-2 weeks.
- Options for review include:
  - Review and then publish (recommended): Curators will review your submission before it is published. A draft DOI can be requested before the data is reviewed. The dataset is not published until curation is complete and the submitter is ready for the files to be public.
  - Publish then review: Data/code is self-submitted to the repository and is immediately public. A working DOI can be requested after you complete your submission. Note that because curatorial review occurs after publication, any changes or updates may result in a versioned DOI.
- Learn more about curation services.
Prepare your dataset for submission
- Organize your data into logical structures. Group your files into meaningful datasets. This may be related to your processing or analysis methods, based on figures in a related publication, by experiment or treatment, or using some temporal or spatial key. If you bundle files in a tarred or zipped folder, keep as few levels of hierarchy as possible.
- Use descriptive filenames, with no spaces or special characters. Remember that once downloaded, files won’t necessarily reside in a folder structure that is the same as yours and may be joined by files that you did not create. Example: DoeEtal_GRL2021_precipdata.csv is better than precipitation.csv.
- Ensure current and future usability. Make sure your files meet web accessibility requirements. Use file formats that support long-term preservation and are either 1) open, non-proprietary, and standard or 2) commonly used in your field. If your data is dependent on proprietary software formats, the eCommons curators will work with you to create a preservation copy of your data.
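The filename advice above can be checked mechanically. This is a minimal sketch, assuming "no spaces or special characters" means letters, digits, underscores, hyphens, and periods only; the helper names are hypothetical and the rule may need adjusting to your field's conventions.

```python
import re

# Assumption: only letters, digits, underscores, hyphens, and periods are allowed.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_filename(name):
    """True if the whole filename uses only the allowed characters."""
    return SAFE_NAME.fullmatch(name) is not None


def suggest_filename(name):
    """Replace each run of disallowed characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
```

For example, `is_safe_filename("DoeEtal_GRL2021_precipdata.csv")` passes, while a name containing spaces or parentheses does not. The suggestion helper only cleans characters; making the name descriptive is still up to you.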
For spreadsheets:
- If using column headers with tabular data, don’t include special characters (e.g., µ, σ, °) or spaces, and don’t start a header with a number. Include units in the column headers if possible.
- Don’t include special formatting (colors, merged cells, etc.), and put just one table or set of data per tab. Don’t include images or charts in your spreadsheets; save these as separate files.
- Save each tab as a separate CSV file. If your spreadsheet contains calculations or other features that need to be preserved, include the spreadsheet as an XML-formatted Excel workbook (.xlsx) in addition to the CSV files.
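Once a tab has been exported to CSV, the header rules above can be checked with a short script. This is a minimal sketch, assuming a clean header starts with a letter and then uses only letters, digits, and underscores (with units appended after an underscore, e.g. `precip_mm`); the function name is hypothetical.

```python
import csv
import io
import re

# Assumption: a clean header starts with a letter, then letters, digits, or underscores.
CLEAN_HEADER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def problem_headers(csv_text):
    """Return the headers in the first row that break the rule above."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [h for h in header if not CLEAN_HEADER.fullmatch(h)]
```

To check a file on disk, read it as text first (for example with `open(path, encoding="utf-8").read()`) and pass the result to `problem_headers`.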
For code:
- Include a README-style document with author contact information, directory structure and file contents, and information about the software environment, including the versions and any dependencies or packages needed for your code to run.
- Comment your code in a way that makes it easy for others to understand what you did.
- Include in your comments (and in your documentation) a clear statement about licensing and re-use. Refer to the Open Source Code website for options.
- Make links/paths to other files or directories relative, not absolute, so that the code still runs when the files are downloaded to a different location.
- If data files are needed for your code to run, provide them separately, and provide a link to them in your documentation.
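For Python code, the software environment details requested for the README can be gathered with the standard library. This is a minimal sketch with a hypothetical function name; the package list you pass in is your own, and other languages will need their own equivalent.

```python
import platform
from importlib import metadata


def environment_summary(packages):
    """Return README-ready lines with the Python version, platform, and package versions."""
    lines = [
        f"Python: {platform.python_version()}",
        f"Platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)
```

Calling `print(environment_summary(["numpy", "pandas"]))` in the environment where the code was run produces text that can be pasted into the README; the two package names here are only examples.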
Prepare your documentation
Describe your data.
Create documentation about what data is included in your dataset, how it is structured, and any special instructions for understanding or using it. This plain text documentation (e.g., README files, data dictionaries, codebooks, protocols, data guides) should give context to your data and ensure that future users will be able to interpret the files. Include:
- Descriptions of any acronyms or abbreviations used (e.g., column headings, variable names, units)
- Methodology used to collect and analyze the data, including hardware and software versions, instrument settings, protocols
- Relevant contextual details and descriptions of what is found in each file
- File-naming conventions, the purpose of the files and folders, how they relate to one another, etc.
- Citations to journal articles based on the data
- Names and contact information of contributors
- Use our guide to writing “readme” style metadata and README template.
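One way to confirm that the documentation describes every file is to compare the README against the folder contents. This is a minimal sketch, assuming the README is a plain text file named `README.txt` at the top of the dataset folder and that it mentions each file by name; both the function name and that filename are assumptions.

```python
from pathlib import Path


def undocumented_files(dataset_dir, readme_name="README.txt"):
    """Return names of files under dataset_dir that the README never mentions."""
    root = Path(dataset_dir)
    readme_text = (root / readme_name).read_text(encoding="utf-8")
    names = sorted(
        p.name for p in root.rglob("*") if p.is_file() and p.name != readme_name
    )
    return [name for name in names if name not in readme_text]
```

An empty list means every filename appears somewhere in the README. It does not guarantee that the descriptions are adequate, so a read-through is still needed.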
Provide complete metadata for your submission, including the following information:
- Title: For data underlying a publication, use the form "Data from: Title of Publication (Article, Monograph, Report, etc.)". Example: "Data and scripts from: Clustering and assembly dynamics of a one-dimensional microphase former."
- Authors: Full names, middle initials optional; in the order you want them displayed. Authors may be the same as article authors or different. Include ORCIDs whenever possible.
- Abstract: Clearly and simply describe the dataset. Example: “These files contain data along with associated output from instrumentation supporting all results reported in [lead author et al. article title]. In [lead author et al.] we found: [article abstract].”
- Links to any related publication(s), datasets, or code. Provide DOIs or other permanent identifiers whenever possible.
- Subject keywords: Include relevant subject keywords that don’t already appear in title or abstract.
- Sponsorship information. Provide names and numbers of supporting grants, as applicable.
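Before filling in the submission form, the metadata can be drafted and checked locally. This is a minimal sketch: the dictionary keys are hypothetical labels chosen for illustration, not eCommons field names, and the keyword check is a simple case-insensitive substring match.

```python
def missing_fields(record, required=("title", "authors", "abstract", "keywords")):
    """Return the required keys that are absent or empty in the record."""
    return [field for field in required if not record.get(field)]


def redundant_keywords(record):
    """Return keywords that already appear in the title or abstract."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    return [k for k in record.get("keywords", []) if k.lower() in text]


# Hypothetical draft record, for illustration only.
draft = {
    "title": "Data from: Rainfall trends in a hypothetical watershed",
    "authors": ["Doe, Jane"],
    "abstract": "These files contain precipitation measurements.",
    "keywords": ["rainfall", "hydrology"],
}
```

Here `redundant_keywords(draft)` flags "rainfall" because it already appears in the title, which suggests replacing it with a keyword that adds new discovery terms.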
Review data permissions and rights
- Ensure that you have the right to share the data: Make sure that you have all necessary rights to deposit the data into eCommons. If other individuals maintain rights to the data, you must obtain permission from them to deposit your dataset. Review the eCommons deposit license.
- Address any privacy or confidentiality issues that may exist: Ensure that you have removed any data that could be used to identify subjects of your research. Before sharing human subjects data, de-identify or remove both direct and indirect identifiers that may pose a disclosure risk.
- Consider using an open license to share your data: Placing a license on your data makes clear your expectations around reuse. We highly recommend using a CC0 waiver to encourage data reuse and expect downstream users of data to follow academic norms of proper attribution and data citation. eCommons offers Creative Commons (CC) licenses, but can accommodate others as desired. Learn more about IP and data.
Submit your data
- Learn more about Data Curation Services from Cornell Data Services.
- Contact eCommons to get the process started! A curator will respond within 1-5 business days.