What is metadata?

Metadata is literally “data about data”. μετα is a Greek preposition and prefix meaning “with”, “next to”, “after”, or “beside”, so meta + data is data with data. You see metadata all the time and probably don’t realise it. Museum labels, the ones you see next to artworks, are one example of metadata.

These labels describe a piece of art briefly by listing the artist, the title of a work, the place of origin, the date, and other descriptive information that gives you context about the work. This information is often called tombstone information. Museum labels sometimes have longer descriptions of the work too. You can think about creating metadata for your collection items like creating tombstone information and descriptions for them.

The difference between print metadata and metadata for a digital collection, however, is that a computer doesn’t innately understand that “Irving R. Wiles”, for example, is the name of the artist who created the work or that “oil on canvas” is the medium. This is where metadata schemes come in. Each metadata scheme consists of a list of fields, which are the categories that pieces of metadata about a work fall into. For example, “artist” would be a field and “Irving R. Wiles” would be the value that is filled in for that field.

Metadata schemes also usually contain rules for how the values in each field are structured and arranged, like whether they’re uppercase or lowercase, can include punctuation, should be abbreviated, and more. Metadata fields also have to be specific enough to at least distinguish all of the items in a collection from one another. For example, only having the fields “colour” and “medium” for a collection of red ceramic vases is pointless, since the metadata would look exactly the same for every single item.
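
If it helps to see fields and values written down, here is a minimal Python sketch of the red ceramic vase problem. The vase records, the extra fields "maker" and "height_cm", and the helper name distinguishes_all are all made up for this illustration; they don't come from any metadata standard.

```python
# Hypothetical records: each item is a mapping of field -> value.
vases = [
    {"colour": "red", "medium": "ceramic", "maker": "Studio A", "height_cm": 20},
    {"colour": "red", "medium": "ceramic", "maker": "Studio B", "height_cm": 20},
    {"colour": "red", "medium": "ceramic", "maker": "Studio A", "height_cm": 35},
]


def distinguishes_all(records, fields):
    """Return True if no two records share the same values for `fields`."""
    seen = set()
    for record in records:
        key = tuple(record.get(field) for field in fields)
        if key in seen:
            return False
        seen.add(key)
    return True


# "colour" and "medium" alone describe every vase identically.
too_few = distinguishes_all(vases, ["colour", "medium"])
# Adding more fields lets the metadata tell the vases apart.
enough = distinguishes_all(vases, ["colour", "medium", "maker", "height_cm"])
print(too_few, enough)
```

A check like this only tells you whether the values differ; it can't tell you whether the fields you chose are meaningful to the people who will read them.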

Wait, what's the difference between metadata and data?

On a basic level, if your data can be used to describe a specific collection or piece of data, then it is metadata. If your data is the top-level thing you're describing, then it's data.

But metadata is also data in its own right. And you can have metadata about metadata. Data can also become metadata in another context, and vice versa. So the real distinction is in what terminology you're using to describe what you have in front of you. For example, take the data and metadata below:

 

We have the map, which is the data (the image) that the General Info section describes, and the information section, which holds metadata about the file. But suppose I build a collection of all the types of image file formats, with definitions and examples for each. In that collection, the metadata value "PNG image" becomes data in its own right. And if I use this map as one of my examples, the map becomes metadata.

identifier | title | full_title | description | date | creator | example
obj1 | PNG | Portable Network Graphics | a type of lossless raster image file | 1994 | Thomas Boutell |
obj2 | TIFF | Tagged Image File Format | a computer file used to store raster graphics and image information | ca. 1980 | The Aldus Corporation |
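
One rough way to picture this swap is the Python sketch below. The file name "map.png" and the field name "kind" are assumptions made for the example; only the PNG titles come from the table above.

```python
# Context 1: the map image is the data, and "PNG image" is metadata about it.
map_file = {
    "data": "map.png",  # hypothetical file name
    "metadata": {"kind": "PNG image"},  # "kind" is an assumed field name
}

# Context 2: image formats are the things being described, so PNG is now data.
image_formats = {
    "obj1": {
        "title": "PNG",
        "full_title": "Portable Network Graphics",
        # The map is reused as an example, so here it is metadata about PNG.
        "example": map_file["data"],
    },
}

print(image_formats["obj1"]["example"])
```

Nothing about the map or the format changed between the two contexts. Only the question of which thing is being described changed.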

How is metadata created?

It may seem sometimes like metadata just appears or that the metadata that is chosen to describe a piece of data is just common sense. In the museum label above, for example, you might think that writing the artist's name, "Irving R. Wiles", is a fairly straightforward decision. However, metadata is never created in a vacuum. Cultural biases, societal power structures and hierarchies, financial influence, politics, and many other factors play into how things are described using metadata and why they are described using specific metadata.

A prominent example of metadata issues and bias can be found in the Library of Congress Subject Headings, which is the predominant, authoritative metadata scheme for cataloging librarians who assign metadata to books. One such publicised issue arose when the LoC announced in 2016 that it was going to revise the subject heading "Illegal aliens", given that activists, librarians (including the ALA), and lobbyists decried the term as dehumanizing. In response, lawmakers in support of stringent immigration law attempted to introduce policy that would prevent the change or require the LoC to retain the "Illegal aliens" heading. In 2021, the heading was finally changed to "Noncitizens" and "Unauthorized immigration", a middle ground between some politicians and the wishes of the general public that satisfied no one. You can read more about the issue on Wikipedia.

Another example of potential bias in metadata creation is the use of artificial intelligence to assign metadata to images, texts, files, and other works. While AI is able to recognise, with training, consistent patterns across images that can make metadata creation faster, it's easy to forget that AI models are created and trained by human beings, and thus suffer from all the same biases we do. AI, for example, may not be able to accurately identify race, gender, or religious characteristics in historical photographs unless given a very specific set of training materials and query parameters to do so. Even then, AI is notoriously terrible at counting amorphous shapes in large groups, such as people huddled close together or hands. AI can be a really helpful and powerful tool, but it's important to remember what's behind the curtain.

Metadata standards

A metadata standard or metadata scheme is a set of rules for how a defined set of metadata fields should be applied and used to describe data. These schemes include how the metadata should be formatted, the definition of each field and what kinds of values go in it, and sometimes which disciplines or types of data the metadata scheme best fits.

A great example of a metadata scheme that you may have seen before is in the library catalogue. The Cornell University Library catalogue uses the MARC 21 Format for Bibliographic Data, a metadata scheme for describing data about library items like books, CDs, e-resources, and more. Go to https://catalog.library.cornell.edu and search for a book in the search bar. Open the record for the book and you'll see a list of fields and values that describe the book, like author, format, language, edition, and more. These are the public-facing metadata fields, and the same fields are used for every item in the catalogue. If you scroll all the way to the bottom of the record, there will be a Librarian View button. If you click on the button, you'll pull up a page that shows all of the metadata fields and values for that particular book in the MARC 21 Format for Bibliographic Data. These are the fields that the computer understands and that are added by librarians on the back end of the catalogue. See the public-facing and Librarian View metadata for The cat: its behavior, nutrition, & health below:

Screenshot of the public-facing metadata from the library catalogue for "The cat: its behavior, nutrition, & health"

Screenshot of the Librarian View metadata from the library catalogue for "The cat: its behavior, nutrition, & health"
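
To give a rough sense of how the two views relate, the Python sketch below pairs a few public-facing labels with the three-digit MARC 21 tags that usually carry the same information. The tag meanings are from the MARC 21 bibliographic format as I understand it; how Cornell's catalogue actually maps its display labels to tags is an assumption, and the record here is illustrative rather than copied from the Librarian View.

```python
# Public-facing label -> MARC 21 tag (a small, partial selection).
PUBLIC_LABEL_TO_MARC_TAG = {
    "Author": "100",    # Main Entry - Personal Name
    "Title": "245",     # Title Statement
    "Edition": "250",   # Edition Statement
    "Language": "041",  # Language Code
}

# Illustrative only: just the title, which is the one value given in this guide.
public_record = {"Title": "The cat: its behavior, nutrition, & health"}


def to_tagged(record):
    """Re-key a public-facing record by its assumed MARC 21 tags."""
    return {PUBLIC_LABEL_TO_MARC_TAG[label]: value for label, value in record.items()}


print(to_tagged(public_record))
```

Real MARC records also break each field into subfields and indicators, which this sketch leaves out.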

How do I make my own metadata?

In order for metadata to be useful for describing a thing or collection of things, it should have the following characteristics:

  • Purpose-driven: Metadata has to be created with a specific purpose or collection in mind, so that it can fit directly with the data it describes.
  • Specific: Metadata must be as specific as needed to adequately distinguish items, which may be similar, from one another by metadata alone. This means that in most cases, you shouldn't necessarily need a photo to distinguish between two different objects. For most collections, the metadata for two different "things" or pieces of data should not be exactly the same.
  • Contextual: Metadata has to be created within the context of the "things" or data it is describing and the audience that will be reading the metadata. For example, metadata for a museum collection should use terms that museum professionals, art historians, and the public can all understand. Museum labels should also specify how the item was acquired, where it is, and provide other contextual information that allows you to truly understand the object on the whole (note that museums don't always do this in practice!).
  • Respectful and responsible: Metadata should be created with the rights, preferences, opinions, and lived experiences of the people and cultures it may describe in mind. For example, museum labels that describe a piece of art created by a Winnebago artist as "Indian", without consulting that artist, may be stripping the artist of their identity (along with problematically lumping all tribes of North American Indigenous Peoples together).
  • Standard: Metadata should be created with standardised fields that are applied to all pieces of data within a group, collection, or discipline uniformly.
  • Defined: Metadata fields can (and probably should) have some sort of accessible key to identify what each field means and how it is used. For example, the fields "topic" and "subject" are similar, so it would be important to describe what makes them different to someone accessing your metadata. As you're making decisions about what fields to use and how to use them, write these thought processes down for later.
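
The "Defined" characteristic in the list above can be sketched in Python as a metadata key plus a check that every field you use appears in it. The field definitions, the sample record, and the helper name undefined_fields are all invented for this example.

```python
# Hypothetical metadata key: field name -> plain-language definition.
metadata_key = {
    "title": "The name given to the item by its creator.",
    "subject": "A term from a controlled list that says what the item is about.",
    "topic": "A free-text keyword chosen by the person describing the item.",
}

# A made-up record that uses one field the key does not explain.
record = {"title": "Red vase", "subject": "Ceramics", "period": "unknown"}


def undefined_fields(record, key):
    """List the fields used in a record that the key does not define."""
    return sorted(field for field in record if field not in key)


missing_definitions = undefined_fields(record, metadata_key)
print(missing_definitions)
```

Here "period" is flagged because nobody reading the key could tell what it means or how it differs from a date.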

For an example of why these attributes are important, try the first metadata exercise below:

Metadata Exercise by Kiran Mohammadi-Williams

As the examples in the Blue Fish Plate exercises demonstrate, trying to compare two objects with completely different metadata schemes or with incomplete sets of metadata is pretty much impossible. Are "date" and "period" just different terms for the same field? What is a "related term"? When was the period of the Yuan dynasty? How do you depict time? When you create your own metadata scheme, make sure it is purpose-driven, specific, contextual, respectful and responsible, standard, and defined so that you are describing your data in the best possible way.

Making your own metadata

1. Gather all of your data

In the How do I make my own metadata? section, we discussed the important characteristics of metadata, including that it should be specific and contextual. In order to be specific and contextual, you need to know what all the things you're going to be describing are, or at least what type of "thing" they are. Gather all of your things in one place so you can look at all of them and compare/contrast them at once. If your things are images, PDF files, and videos, place them all in the same folder on your computer. If your things are a group of print books, place them all in the same room in front of you.

2. Define your fields

Take a good look at all of the things in your collection. What characteristics do they have? What characteristics would you use to distinguish them from one another? Come up with as many fields (color, style, type of data, title, location, etc.) as you need to make sure no two things can be mistaken for one another. Write all of these fields down.

3. Define your rules and standards

Think about what each of your fields means. If you have a color field and a description field, should you put the color in the description as well? If you have a location, how will the value in this field be formatted (e.g. "New York, U.S.A." or "NY, United States of America" or "-76.486794, 42.843751")? Is there a maximum to how long the title or name of a thing should be? Write all of these rules down too.
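
Once your rules are written down, you could also express a few of them as checks. The Python sketch below assumes two made-up rules: titles are at most 80 characters, and locations are written as a place name and a country name separated by a comma and a space. Both rules, and the helper name check_record, are only examples of what your own rules might look like.

```python
import re

MAX_TITLE_LENGTH = 80  # hypothetical limit on title length

# Hypothetical rule: "Place, Country" written in letters, spaces, and full stops.
LOCATION_PATTERN = re.compile(r"[A-Za-z .]+, [A-Za-z .]+")


def check_record(record):
    """Return a list of rule violations for one record (empty if none)."""
    problems = []
    if len(record.get("title", "")) > MAX_TITLE_LENGTH:
        problems.append("title is too long")
    if not LOCATION_PATTERN.fullmatch(record.get("location", "")):
        problems.append("location is not written as 'Place, Country'")
    return problems


named = check_record({"title": "Vase", "location": "New York, U.S.A."})
coords = check_record({"title": "Vase", "location": "-76.486794, 42.843751"})
print(named, coords)
```

A pattern like this only checks the shape of a value. It would still accept both "New York, U.S.A." and "NY, United States of America", so abbreviation rules need their own check or a list of allowed values.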

4. Check your fields and rules with the appropriate communities 

If your data is about/for a specific individual, culture, or group of people, verify with that community or individual that you are describing them as they would like to be described. You can find best practice manuals for description published by lots of different communities on the Internet, like the Respectful Terminology Platform Project. If you're creating metadata about an individual, you can use resources like ORCID, the Virtual International Authority File, and the Library of Congress Name Authority File, with the caveat that the Library of Congress sometimes gets it wrong. The best thing to do would be to include which name authority or other authoritative source you used in your metadata key so that users are aware of which authority file they should lobby when things aren't right.

5. Fill out your values

Go object by object, field by field and assign values to your things. You can see how to create a metadata sheet for your data in the Creating your own dataset section of this guide.
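
As one possible shape for a metadata sheet, the Python sketch below writes records out as comma-separated values with one row per object and one column per field. It reuses the image format values from earlier in this guide; the choice of CSV and the in-memory buffer standing in for a real file are assumptions, and the Creating your own dataset section may recommend a different approach.

```python
import csv
import io

fields = ["identifier", "title", "creator", "date"]
records = [
    {"identifier": "obj1", "title": "PNG", "creator": "Thomas Boutell", "date": "1994"},
    {"identifier": "obj2", "title": "TIFF", "creator": "The Aldus Corporation", "date": "ca. 1980"},
]

# An in-memory buffer stands in for a real file here; to save to disk you
# could use open("metadata.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()       # first row: the field names
writer.writerows(records)  # one row per object
sheet = buffer.getvalue()
print(sheet)
```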

6. Check it once, check it twice

Check your metadata over to make sure everything looks good and that all of your fields are uniform and have adequate context. You can have a friend try to describe something using your metadata scheme to test whether it's understandable for someone other than the creator (you).
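
Part of this check can be done mechanically. The Python sketch below looks for empty values and for fields that aren't in your agreed list, such as a stray "color" next to "colour". The records and the helper name uniformity_problems are invented for the example, and a check like this doesn't replace having another person read your metadata.

```python
def uniformity_problems(records, expected_fields):
    """Report records with missing, empty, or unexpected fields."""
    problems = []
    for index, record in enumerate(records, start=1):
        missing = [
            field for field in expected_fields
            if record.get(field) is None or not str(record[field]).strip()
        ]
        extra = [field for field in record if field not in expected_fields]
        if missing:
            problems.append(f"record {index}: missing or empty {missing}")
        if extra:
            problems.append(f"record {index}: unexpected fields {extra}")
    return problems


# Made-up records: the second has a blank value, the third a misspelt field.
records = [
    {"title": "Vase 1", "colour": "red"},
    {"title": "Vase 2", "colour": " "},
    {"title": "Vase 3", "color": "red"},
]
problems = uniformity_problems(records, ["title", "colour"])
for problem in problems:
    print(problem)
```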

7. Make changes

It seems contrary to the "standard" characteristic of metadata that it should be changeable, but metadata is constantly changing and evolving as the data changes and as we learn more about people, progress as a society morally, take action to redress past wrongs, and gain more knowledge. Be open to making metadata changes when needed, and be sure to seek help from librarians if you have trouble finding or making a scheme that best fits your data.