Recommended File Formats
eCommons can accept many electronic file formats. As stated in the eCommons Preservation Support Policy, the University Library is committed to preserving the binary form of each digital object deposited in eCommons. As resources permit, the Library will also take further measures to preserve as much of the original content's functionality ("look and feel") as possible.
Preserving the complete, original functionality of certain file formats over the long term, however, may not be practical or possible. Research and experience have shown that content is much more likely to be preserved successfully over the long term when its file format has the following characteristics:
- complete and open documentation
- platform-independence
- non-proprietary (vendor-independent)
- no "lossy" or proprietary compression
- no embedded files, programs or scripts
- no full or partial encryption
- no password protection
The table below lists recommended file formats by content type. Formats in the High column exhibit the characteristics above and thus have a high probability of full preservation. Formats in the Low column have a low probability of being fully preserved over time. Formats in the Medium column are preferred over those in the Low column, but their long-term preservation is less assured than that of formats in the High column.
We recommend that those depositing content in eCommons use formats from the High column whenever possible, and consider converting files in low-probability formats to formats with a higher probability of preservation.
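As a first pass over a folder of files before deposit, the table below can be approximated by a lookup on file extension. The Python sketch that follows is a hypothetical helper, not an eCommons tool: `TIER_BY_EXTENSION`, `NEEDS_INSPECTION`, and `triage` are illustrative names, and the mapping covers only part of the table. Extensions whose tier depends on the file's contents (for example, TIFF compression, PDF/A conformance, or the codec inside an AVI or QuickTime file) are reported as needing inspection rather than guessed.

```python
from pathlib import Path

# Hypothetical, partial lookup transcribed from the eCommons table.
# Only extensions whose tier does not depend on the file's contents are listed.
TIER_BY_EXTENSION = {
    # High
    ".png": "High", ".mj2": "High", ".csv": "High", ".x3d": "High",
    ".c": "High", ".java": "High", ".py": "High",
    # Medium
    ".css": "Medium", ".dtd": "Medium", ".rtf": "Medium", ".sgml": "Medium",
    ".odt": "Medium", ".docx": "Medium", ".bmp": "Medium", ".jpg": "Medium",
    ".gif": "Medium", ".dng": "Medium", ".cgm": "Medium", ".mid": "Medium",
    ".ogg": "Medium", ".flac": "Medium", ".m4a": "Medium", ".mp3": "Medium",
    ".mp4": "Medium", ".mpg": "Medium", ".xlsx": "Medium", ".ods": "Medium",
    ".dbf": "Medium", ".wrl": "Medium", ".class": "Medium", ".exe": "Medium",
    # Low
    ".doc": "Low", ".wpd": "Low", ".dvi": "Low", ".sid": "Low", ".fpx": "Low",
    ".psd": "Low", ".jpx": "Low", ".swf": "Low", ".aifc": "Low", ".snd": "Low",
    ".ra": "Low", ".wma": "Low", ".m4p": "Low", ".rv": "Low", ".wmv": "Low",
    ".xls": "Low", ".ppt": "Low",
}

# Extensions whose tier depends on contents: encoding, schema, PDF/A vs.
# encryption, DOCTYPE, compression, scripting, or codec.
NEEDS_INSPECTION = {
    ".txt", ".xml", ".pdf", ".html", ".htm", ".tif", ".tiff", ".jp2",
    ".svg", ".aif", ".aiff", ".wav", ".avi", ".mov",
}


def triage(filename: str) -> str:
    """Return 'High', 'Medium', 'Low', 'check contents', or 'not in lookup'."""
    ext = Path(filename).suffix.lower()
    if ext in NEEDS_INSPECTION:
        return "check contents"
    # The table places unlisted formats under "All other ... not listed" (Low),
    # but this lookup is deliberately incomplete, so it does not assume Low.
    return TIER_BY_EXTENSION.get(ext, "not in lookup")
```

For example, `triage("thesis.docx")` returns `"Medium"` and `triage("scan.TIF")` returns `"check contents"`. An extension absent from the lookup returns `"not in lookup"` rather than `"Low"`, because the mapping is only a partial transcription of the table.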
For help in assessing where your digital content falls within this table, or for consultation on strategies for converting files from one format to another, please contact the Library's Digital Consulting and Production Services or Cornell Data Services.
An additional note about PDF (Portable Document Format)
PDF is a good file format choice for preservation, and PDF/A is the best option. Do not embed media files in a PDF, as this can significantly increase the file's size and make it difficult to download, access, and preserve. Do not encrypt or lock a PDF file, as this will make it impossible to perform optical character recognition (OCR) to create fully searchable text.
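The advice against encrypting PDFs can be checked before deposit. An encrypted PDF references its encryption settings through an `/Encrypt` key in the trailer (or cross-reference stream) dictionary, so a simple byte search gives a quick signal. The Python sketch below is a heuristic with hypothetical function names: it can report false positives (the byte string could occur elsewhere in the file), could miss unusually encoded key names, and does not verify PDF/A conformance, which calls for a dedicated validator.

```python
from pathlib import Path


def pdf_bytes_look_encrypted(data: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes contain an /Encrypt key.

    Treat True as "inspect this file further", not as proof of encryption,
    since the same bytes could appear elsewhere in the document.
    """
    return b"/Encrypt" in data


def pdf_file_looks_encrypted(path: str) -> bool:
    """Read a PDF from disk and apply the byte-level heuristic."""
    return pdf_bytes_look_encrypted(Path(path).read_bytes())


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py file1.pdf file2.pdf ...
    for name in sys.argv[1:]:
        if pdf_file_looks_encrypted(name):
            print(f"{name}: possibly encrypted")
        else:
            print(f"{name}: no /Encrypt key found")
```

Reading the whole file into memory keeps the sketch short; for very large PDFs a chunked scan would be more economical.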
File formats for digital content: Probability for full long-term preservation
Content type | High | Medium | Low |
---|---|---|---|
Text | • Plain text (encoding: US-ASCII, UTF-8, UTF-16 with BOM) • XML (includes XSD/XSL/XHTML, etc.; with included or accessible schema) • PDF/A-1 (ISO 19005-1) (*.pdf) | • Cascading Style Sheets (*.css) • DTD (*.dtd) • Plain text (ISO 8859-1 encoding) • PDF (*.pdf) (embedded fonts) • Rich Text Format 1.x (*.rtf) • HTML (include a DOCTYPE declaration) • SGML (*.sgml) • OpenOffice (*.sxw/*.odt) • OOXML (ISO/IEC DIS 29500) (*.docx) • Microsoft Word 2007 or newer (*.docx) | • PDF (*.pdf) (encrypted) • Microsoft Word 2003 or older (*.doc) • WordPerfect (*.wpd) • DVI (*.dvi) • All other text formats not listed |
Raster image | • TIFF (uncompressed) • JPEG 2000 (lossless) (*.jp2) • PNG (*.png) | • BMP (*.bmp) • JPEG/JFIF (*.jpg) • JPEG 2000 (lossy) (*.jp2) • TIFF (compressed) • GIF (*.gif) • Digital Negative DNG (*.dng) | • MrSID (*.sid) • TIFF (in Planar format) • FlashPix (*.fpx) • Photoshop (*.psd) • RAW • JPEG 2000 Part 2 (*.jpf, *.jpx) • All other raster image formats not listed |
Vector graphics | • SVG (no JavaScript binding) (*.svg) | • Computer Graphics Metafile (CGM, WebCGM) (*.cgm) | • Encapsulated PostScript (EPS) • Macromedia Flash (*.swf) • All other vector image formats not listed |
Audio | • AIFF (96 kHz, 16-bit PCM) (*.aif, *.aiff) • WAV (96 kHz, 24-bit PCM) (*.wav) | • Sun Audio (uncompressed) (*.au) • Standard MIDI (*.mid, *.midi) • Ogg Vorbis (*.ogg) • Free Lossless Audio Codec (*.flac) • Advanced Audio Coding (*.mp4, *.m4a, *.aac) • MP3 (MPEG-1/2, Layer 3) (*.mp3) | • AIFC (compressed) (*.aifc) • NeXT SND (*.snd) • RealNetworks 'Real Audio' (*.ra, *.rm, *.ram) • Windows Media Audio (*.wma) • Protected AAC (*.m4p) • WAV (compressed) (*.wav) • All other audio formats not listed |
Video | • Motion JPEG 2000 (ISO/IEC 15444-4) (*.mj2) • AVI (uncompressed/native, Motion JPEG) (*.avi) • QuickTime Movie (uncompressed/native, Motion JPEG) (*.mov) | • Ogg Theora (*.ogg) • MPEG-1, MPEG-2 (*.mpg, *.mpeg, wrapped in AVI, MOV) • MPEG-4 (H.263, H.264) (*.mp4, wrapped in AVI, MOV) | • AVI (others) (*.avi) • QuickTime Movie (others) (*.mov) • RealNetworks 'Real Video' (*.rv) • Windows Media Video (*.wmv) • All other video formats not listed |
Spreadsheet / database | • Character-delimited text (ASCII or Unicode preferred): Comma-Separated Values (*.csv), Delimited Text (*.txt) • SQL Data Definition Language | • OOXML (ISO/IEC DIS 29500) (*.xlsx) • Excel 2007 or newer (*.xlsx) • OpenOffice (*.sxc/*.ods) • DBF (*.dbf) | • Excel 2003 or older (*.xls) • All other spreadsheet/database formats not listed |
Virtual reality | • X3D (*.x3d) | • VRML (*.wrl, *.vrml) • U3D (Universal 3D file format) | • All other virtual reality formats not listed |
Computer programs | • Uncompiled computer program source code (*.c, *.cpp, *.java, *.js, *.jsp, *.php, *.pl, *.py, etc.) | • Compiled / executable files (EXE, *.class, COM, DLL, BIN, DRV, OVL, SYS, PIF) | |
Presentation | | • OpenOffice (*.sxi/*.odp) • OOXML (ISO/IEC DIS 29500) (*.pptx) • PowerPoint 2007 or newer (*.pptx) | • PowerPoint 2003 or older (*.ppt) • All other presentation formats not listed |
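For plain-text files, the Text row depends on character encoding: US-ASCII, UTF-8, and UTF-16 with a byte-order mark (BOM) are in the High column, while ISO 8859-1 is in the Medium column. The Python sketch below, with hypothetical function names, roughly classifies raw bytes against that row and shows one possible conversion from ISO 8859-1 to UTF-8. It assumes the caller already knows that a file really is ISO 8859-1: every byte sequence decodes as ISO 8859-1, so that encoding cannot be confirmed from the bytes alone.

```python
def classify_text_encoding(data: bytes) -> str:
    """Roughly place plain-text bytes against the Text row of the table."""
    # The UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16
    # little-endian BOM (FF FE), so test for UTF-32 first.
    if data.startswith((b"\xff\xfe\x00\x00", b"\x00\x00\xfe\xff")):
        return "UTF-32 (not listed)"
    # UTF-16 BOM: FF FE (little-endian) or FE FF (big-endian).
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "UTF-16 with BOM"
    # US-ASCII is a subset of UTF-8, so try the stricter codec first.
    for codec, label in (("ascii", "US-ASCII"), ("utf-8", "UTF-8")):
        try:
            data.decode(codec)
            return label
        except UnicodeDecodeError:
            continue
    return "unknown (possibly ISO 8859-1)"


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode text as UTF-8, ASSUMING the caller verified it is ISO 8859-1."""
    return data.decode("iso-8859-1").encode("utf-8")
```

A file classified as US-ASCII or UTF-8 needs no conversion. Re-encoding a verified ISO 8859-1 file with `latin1_to_utf8` produces UTF-8, one of the encodings listed in the High column. The check is deliberately rough: UTF-16 text without a BOM, for instance, may still decode as ASCII.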