What is data?
Data has so many definitions, shaped by who you ask, what data they work with, and what their motives are, that no single definition can cover it all. For example:
“Data (/ˈdeɪtə/ DAY-tə, US also /ˈdætə/ DAT-ə) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally.” –Wikipedia
“Data is a collection of facts, numbers, words, observations or other useful information. Through data processing and data analysis, organizations transform raw data points into valuable insights that improve decision-making and drive better business outcomes.” –IBM
“Data can be pretty much anything, really. Or rather, pretty much anything can be data…. Data isn’t just numbers, though it often is reduced to that. It’s discrete packets of evidence that we can potentially aggregate to find patterns and meaning. It’s testimonies, field boundaries, the human genome, the avocado genome, a bottle of wine, a packet of crisps, the complete works of Shakespeare…” –University of York Skills Guides
We often think of data as figures on a spreadsheet: something technical and quantitative, written in code or numbers, a set of objective facts that can seem almost indecipherable to the average person who stumbles across them. But data can be, as the University of York puts it above, “pretty much anything”. For example, the following picture of my cat is data:

And it’s not just data because it’s a JPEG image file that includes pixels, color data, and other graphical information related to the image. The photo tells you that there is, somewhere, a cat. It tells you that my cat is brown and orange and black. The filename tells you that my cat’s name is Camilla. It tells you that my cat is extremely cute. It even tells you that I am a person who takes pictures of my cat (who doesn’t?). All this to say that data is, essentially, any “thing” that can be used to derive or find meaning.
Data can come in many different media or formats. Some common ones you might have seen before include:
Datasets and digital objects
A digital object is basically a "thing" that exists in digital space, like on the Internet or as a file. Data makes up that thing and determines how it is presented, served, viewed, consumed, downloaded, manipulated, edited, and so on. Some common digital objects in collections-as-data work are:
- Images
- Videos (like films)
- Audio files (like oral histories)
- Texts (like poems)
- Games
- Combinations of the above (multimedia)
When we talk about a dataset, we're referring to a group or collection of data that a computer or machine can read at one time. Datasets often must be formatted in a particular way so that the computer can read each piece of data the same way and distinguish one piece from another.
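As a minimal sketch of what "formatted in a particular way" can mean, the Python below reads a tiny dataset written as CSV (comma-separated values). The column names and rows are hypothetical, invented purely for illustration. The point is only that every row follows the same structure, so the computer can read each value the same way and tell the rows apart.

```python
import csv
import io

# Hypothetical dataset, invented for illustration: each row describes one
# digital object, and every row has the same three columns.
RAW_CSV = """id,title,media_type
1,Oral history interview,audio
2,Collected poems,text
3,Street photograph,image
"""


def load_dataset(raw_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return list(reader)


records = load_dataset(RAW_CSV)

# Because every record has the same fields, one loop can handle them all.
for record in records:
    print(record["id"], record["title"], record["media_type"])
```

If one row were missing a column or used a different separator, the computer could no longer read every piece of data the same way, which is why consistent formatting matters.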
Data ethics
CARE and FAIR Principles
The CARE Principles for Indigenous Data Sovereignty are a set of guiding principles for the inception, creation, use, archiving, and control of data that affirm and ensure the rights, personhood, interests, and respect of Indigenous Peoples. The principles were created in response to the historical and continued use of data about Indigenous Peoples against them by private researchers and government entities, and to the marked exclusion of Indigenous Peoples from access to, use of, and benefit from data about them, even within the open data movement.
The CARE Principles are linked closely with the FAIR Guiding Principles for scientific data management and stewardship, a set of principles for finding, accessing, ensuring the interoperability of, and reusing data. The intention behind both sets of principles is to make access to and use of data less exclusive, to ensure the rights and respect of the people who create and are described by data, and to hold data creators, brokers, users, archivists, and other stakeholders accountable for their creation and use of data. The CARE and FAIR Principles are:
- CARE: Collective Benefit, Authority to Control, Responsibility, Ethics
- FAIR: Findable, Accessible, Interoperable, Reusable
You can read more about the CARE Principles from the Global Indigenous Data Alliance.
Consentful Tech
The consentful technology movement emphasises that digital applications should be built, managed, and used with the following conditions for data collection and use in mind:
- Freely-given: Consent for use of data or a part of someone's "digital body" should be freely-given, without coercion, duress or pressure, by that individual.
- Reversible: Consent for use of data or a part of someone's "digital body" should be capable of being revoked by that person.
- Informed: Both the user and creator/owner of the data or "digital body" should know and be honest about the full conditions of use/reuse for that data.
- Enthusiastic: Consent for use of data or a part of someone's "digital body" should not be begrudging or induced through social guilt.
- Specific: Consent for use of data or a part of someone's "digital body" should be specifically applied only to the pieces of data mutually specified and agreed upon by the parties, not all of someone's data or unspecified additional data.
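To make the five conditions concrete, here is a minimal Python sketch of how an application might record them before using someone's data. The `ConsentRecord` class, its field names, and the `permits` method are all hypothetical, invented for illustration; they do not come from the Consentful Tech Project. Code can only record these conditions. Whether consent really was freely given or enthusiastic remains a human judgment.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of one person's consent (illustration only)."""

    person: str
    freely_given: bool   # no coercion, duress, or pressure
    informed: bool       # full conditions of use were shared honestly
    enthusiastic: bool   # not begrudging or guilt-induced
    allowed_fields: set = field(default_factory=set)  # "specific"
    revoked: bool = False                             # "reversible"

    def revoke(self):
        """The person withdraws consent; later checks must fail."""
        self.revoked = True

    def permits(self, field_name):
        """Allow use only if every condition holds for this exact field."""
        return (
            self.freely_given
            and self.informed
            and self.enthusiastic
            and not self.revoked
            and field_name in self.allowed_fields
        )


consent = ConsentRecord(
    person="example user",
    freely_given=True,
    informed=True,
    enthusiastic=True,
    allowed_fields={"email"},
)

print(consent.permits("email"))     # True: agreed to, and not revoked
print(consent.permits("location"))  # False: never specifically agreed to
consent.revoke()
print(consent.permits("email"))     # False: consent was withdrawn
```

The design choice worth noticing is that the check is per field and defaults to "no": anything the person did not specifically agree to is refused, and revoking consent overrides everything else.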
The intention behind the movement is to combat the rampant theft and unwitting use of personal data by corporations that profit from that data without informing or giving a cut to its owner or creator. That unwitting use often happens through the gigantic, long terms and conditions documents that corporations make you sign before using a service. Read more on the Consentful Tech Project site, which is a collaboration with Data for Black Lives.