ARKEO 4755 Un-essay Guide: Data & Metadata

This is a guide for students in Kurt Jordan's Indigenous Erasure and Resurfacing course doing a digital humanities project for their final.

What is data?

A digital object is basically a "thing" that exists in digital space, like on the Internet or as a file. Data makes up that thing and determines how it is presented, served, viewed, consumed, downloaded, manipulated, edited, and so on. Some common digital objects are:

  • Images
  • Videos (like films)
  • Audio files (like oral histories)
  • Texts (like poems)
  • Games
  • Combinations of the above (multimedia)

When we talk about a dataset, we're referring to a group or collection of data that can be read by a computer or machine at one time. Datasets often must be formatted in a particular way so that the computer can read each piece of data the same way and tell the pieces apart.
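As a rough illustration of what "formatted in a particular way" can mean, the sketch below stores a tiny dataset as comma-separated values (CSV) and reads it with Python's standard `csv` module. The column names and the three rows are invented for this example; they are not taken from any real collection.

```python
import csv
import io

# Hypothetical dataset: every row has the same three columns, in the same order,
# so the computer can read each row the same way.
RAW = """title,creator,format
Oral history interview,Unknown,audio
Landscape photograph,Unknown,image
Poem manuscript,Unknown,text
"""


def load_dataset(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


records = load_dataset(RAW)
for record in records:
    print(record["title"], "->", record["format"])
```

Because every row follows the same layout, the same line of code can pull the title or format out of any item in the dataset.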

What is metadata?

Metadata is literally “data about data”. μετα is a Greek preposition and prefix meaning “with”, “next to”, “after”, or “beside”, so meta + data is data with data. You see metadata all the time and probably don’t realise it. One example of metadata is the label that you see next to an artwork in a museum.

These labels describe a piece of art briefly by listing the artist, the title of a work, the place of origin, the date, and other descriptive information that gives you context about the work. This information is often called tombstone information. Museum labels sometimes have longer descriptions of the work too. You can think about creating metadata for your collection items like creating tombstone information and descriptions for them.

The difference between print metadata and metadata for a digital collection, however, is that a computer doesn’t innately understand that “Irving R. Wiles”, for example, is the name of the artist who created the work or that “oil on canvas” is the medium. This is where metadata schemas come in. Each metadata schema consists of a list of fields, which are the categories that pieces of metadata about a work fall into. For example, “artist” would be a field and “Irving R. Wiles” would be the value that is filled in for that field.
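One way to picture fields and values is as a small lookup table. The sketch below is only an illustration: the field list and the `make_record` helper are assumptions made up for this guide, not part of any published schema, and fields whose values are not given here are simply left empty.

```python
# Hypothetical schema: a list of field names (an assumption for illustration).
FIELDS = ["artist", "title", "medium", "date"]


def make_record(**values):
    """Build a record with every schema field; unknown values stay as None."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        # A value that does not belong to any field in the schema is rejected.
        raise ValueError(f"fields not in schema: {sorted(unknown)}")
    return {field: values.get(field) for field in FIELDS}


label = make_record(artist="Irving R. Wiles", medium="oil on canvas")
print(label)
```

With the field name attached, the computer no longer has to guess that “Irving R. Wiles” is an artist rather than a title or a place.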

Metadata schemas also usually contain rules for how the values in each field are structured and arranged, like whether they’re uppercase or lowercase, can include punctuation, should be abbreviated, and more. Metadata fields also have to be specific enough to at least distinguish all of the items in a collection from one another. For example, only having the fields “colour” and “medium” for a collection of red ceramic vases is pointless, since the metadata would look exactly the same for every single item.
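Both ideas can be checked mechanically. The sketch below is a minimal example under stated assumptions: the value rule (lowercase, no trailing full stop) is a made-up house rule, and the vase records and their identifiers are hypothetical.

```python
def follows_value_rules(value):
    """Check a hypothetical house rule: lowercase text with no trailing full stop."""
    return value == value.lower() and not value.endswith(".")


def distinguishes_all(items, fields):
    """Return True if no two items share the same values across the given fields."""
    seen = set()
    for item in items:
        key = tuple(item.get(field) for field in fields)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical collection of red ceramic vases.
vases = [
    {"identifier": "vase-001", "colour": "red", "medium": "ceramic"},
    {"identifier": "vase-002", "colour": "red", "medium": "ceramic"},
    {"identifier": "vase-003", "colour": "red", "medium": "ceramic"},
]

print(follows_value_rules("ceramic"))   # True
print(follows_value_rules("Ceramic."))  # False
print(distinguishes_all(vases, ["colour", "medium"]))                # False
print(distinguishes_all(vases, ["identifier", "colour", "medium"]))  # True
```

Adding even one more specific field, such as an identifier, is enough here to tell the vases apart; in a real project that field might instead be a maker, a date, or a dimension.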

How is metadata created?

It may seem sometimes like metadata just appears or that the metadata that is chosen to describe a piece of data is just common sense. In the museum label above, for example, you might think that writing the artist's name, "Irving R. Wiles", is a fairly straightforward decision. However, metadata is never created in a vacuum. Cultural biases, societal power structures and hierarchies, financial influence, politics, and many other factors play into how things are described using metadata and why they are described using specific metadata.

A prominent example of metadata issues and bias can be found in the Library of Congress Subject Headings, which is the predominant, authoritative metadata scheme used by cataloging librarians to assign metadata to books. One widely publicised issue arose when the LoC announced in 2016 that it would revise the subject heading "Illegal aliens" after activists, librarians (including the ALA), and lobbyists decried the term as dehumanizing. In response, Republican lawmakers in support of stringent immigration law attempted to introduce legislation that would prevent the change or require the LoC to retain the "Illegal aliens" heading. In 2021, the headings "Aliens" and "Illegal aliens" were finally replaced with "Noncitizens" and "Illegal immigration", a middle ground between conservative politicians and the wishes of the general public that satisfied no one. You can read more about the issue on Wikipedia.

Another example of potential bias in metadata creation is the use of artificial intelligence to assign metadata to images, texts, files, and other works. With training, AI can recognise consistent patterns across images, which can make metadata creation faster, but it's easy to forget that AI models are created and trained by human beings, and thus suffer from all the same biases we do. AI, for example, may not be able to accurately identify race, gender, or religious characteristics in historical photographs unless given a very specific set of training materials and query parameters to do so. Even then, AI is notoriously poor at counting overlapping or irregular shapes, such as hands or people huddled close together. AI can be a really helpful and powerful tool, but it's important to remember what's behind the curtain.