Purpose of this guide
This guide is intended to get researchers started with using collections (of images, texts, audio, etc.) as data for computational analysis, visualising data to enhance research processes and products, and using computation to present and analyse collections data and metadata. It is not a comprehensive manual on how to do everything; rather, it introduces researchers to digital collections-based methods. If you're looking for a guide on data visualisation, you're in the right place! This guide serves as a natural companion to the Text as Data: Finding and Mining guide.
Since no guide exists without context, please read the Digital CoLab's principles to get a sense of the framework and value system under which this LibGuide and its recommendations were produced. Please contact the Digital CoLab at Olin Library if you have questions regarding collections as data, data visualisation, or digital collections and presentations.
Introduction to collections as data
Collections as data is the idea and practice of treating collections (groups of objects, items, texts, etc., typically digital or digitised) as data that people can analyse and represent using computers. The term came into common use in the digital humanities field with the collaborative project Always Already Computational: Collections as Data, directed by Thomas Padilla, that “documented, iterated on, and shared current and potential approaches to developing cultural heritage collections that support computationally-driven research and teaching.” This project was followed by a report on the responsible implementation of collections as data practices, Collections as Data: Part to Whole.
Collections are groups of objects, items, texts, and other things. The “as” in this term really means “is”: the collections (and their metadata) are serving as data, becoming data, being considered as data. Data are sets of ordered information, stored digitally, that a computer can process. In sum, collections as data means formatting groups of objects as sets of ordered information that a computer can then analyse.
Collections as data work thus explores the potential of computational methods for analysing digitised and born-digital collections, their objects, and their metadata as datasets.
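As a rough illustration of the idea above, the following minimal Python sketch treats a handful of metadata records as a dataset and counts items per decade and subject-term frequencies. The records, field names (title, year, subjects), and function names are all hypothetical, invented for this example; a real collection would supply its own metadata schema and export format.

```python
from collections import Counter

# Hypothetical metadata records for a small digitised collection.
# Titles, years, and subject terms are invented for illustration only.
records = [
    {"title": "Harbour at dawn", "year": 1891, "subjects": ["harbours", "ships"]},
    {"title": "Market street", "year": 1894, "subjects": ["markets", "streets"]},
    {"title": "Fishing fleet", "year": 1902, "subjects": ["ships", "fishing"]},
    {"title": "Street parade", "year": 1907, "subjects": ["streets", "parades"]},
]


def items_per_decade(records):
    """Count how many items fall in each decade (e.g. 1891 -> 1890)."""
    return Counter((r["year"] // 10) * 10 for r in records)


def subject_frequencies(records):
    """Count how often each subject term appears across all records."""
    return Counter(term for r in records for term in r["subjects"])


if __name__ == "__main__":
    # Print a simple summary of the collection's metadata.
    for decade, count in sorted(items_per_decade(records).items()):
        print(f"{decade}s: {count} item(s)")
    for term, count in subject_frequencies(records).most_common():
        print(f"{term}: {count}")
```

Even simple counts like these can point to patterns, such as which subjects dominate a collection or which periods are thinly represented, that are then worth examining more closely by hand.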
Collections as data work can include:
- Digital collections and exhibits
- Interactive maps and visualisations
- Digital databases
- Scholarly websites and web archives
- Processing, presenting, and interpreting metadata from collections
- Much, much more!
Resources on collections as data
On a Collections as Data Imperative (Thomas Padilla), 2017.
"The" source on collections as data. Padilla is a pioneer of the field/movement that would become collections as data, and this article outlines his philosophies around what collections as data can be, when given the space and resources to thrive and be accessible at institutions.
Always Already Computational: Collections as Data: Final Report (Thomas Padilla et al.), 2019.
"From 2016‑2018 Always Already Computational: Collections as Data documented, iterated on, and shared current and potential approaches to developing cultural heritage collections that support computationally‑driven research and teaching. With funding from the Institute of Museum and Library Services, Always Already Computational held two national forums, organized multiple workshops, shared project outcomes in disciplinary and professional conferences, and generated nearly a dozen deliverables meant to guide institutions as they consider development of collections as data."
“What Do We Mean by ‘Collections As Data’ (CAD)?” (Cory Lampert & Emily Lapworth), 2020.
"'Collections as data' is the idea that collections (such as UNLV’s Digital Collections) can be used as data for researchers to analyze using computers. For example, a historian can use a computer program to quickly read thousands of pages of text and identify patterns such as topics or people that are named. Digital or computational research methods such as text mining, data visualization, mapping, image analysis, audio analysis, and network analysis automate steps in the research process that would take humans many hours to do, or are even impossible for humans to do manually. If you’ve heard of Digital Humanities, Collections as Data falls under that umbrella of scholarly activity."