What is data visualization, and why use it?
Data visualization is the representation of data and information graphically, through pictures, videos, or some format that can be perceived by the eye. Visualizations can range from charts and graphs to maps to games and more.
Deciding whether your project data should be and could be visualised involves asking yourself a few questions:
- How would visualising this data differ from presenting it as-is?
- How would the visualization transform the meaning or accessibility of the data in some way?
- Is the effort/labour of visualizing justified by the benefits of visualizing?
- What kind of visualization best fits my dataset?
- What are my current technical skills? How much can I or am I willing to expand my technical skills to be able to work with a specific program or tool?
- What's the scope and timeframe of my project?
- Who is my audience?
For example, qualitative data can be more difficult to map using typical charts and graphs like bar charts, histograms, or plots because there aren't any numbers that can be easily and objectively assigned to values. Similarly, it can be difficult to use a network diagram to visualise quantitative data because there may not be direct 1:1 relations between different numbers and statistics, numbers may be repeated, and it can be difficult to understand what the numbers mean in a network context.
Above all else, we at the Digital CoLab believe that data visualization should be intentional. The visualization should contribute to the analysis, interpretation, or sharing of the data in some way that is not already apparent without it being visualised. Visualizations should do something or say something more.
How do I get started?
It can often be daunting to take your data from a spreadsheet or a group of loose Post-Its and turn it into something visual, especially with the gigantic pool of tools that exist to help you do that today.
1. Gathering and cleaning your data
The first step, however, isn't to figure out what you're going to use to visualise your data, but to make sure you have your dataset together. Cleaning and organising your data is necessary before you start visualizing, since many programs and tools require your data to be in a particular format to read it properly. Before you continue below, see the section on cleaning your data.
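As a rough illustration of what "cleaning" can mean in practice, here is a minimal Python sketch. It assumes a small, hypothetical CSV export: the column names (title, museum, year), the sample values, and the helper name clean_rows are all invented for this example, and the rules your own tool requires may differ.

```python
import csv
import io

# Hypothetical raw export; the column names and values are invented for illustration.
RAW_CSV = """title,museum,year
 Red vase ,Museum A,1990
Red vase,Museum A,1990
Red Bowl,museum b,
,,
"""

def clean_rows(text):
    """Trim whitespace, drop empty and duplicate rows, and mark missing years as None."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(text)):
        title = (row["title"] or "").strip()
        museum = (row["museum"] or "").strip().title()
        year_text = (row["year"] or "").strip()
        if not title and not museum and not year_text:
            continue  # skip rows with no content at all
        year = int(year_text) if year_text.isdigit() else None
        key = (title.lower(), museum.lower(), year)
        if key in seen:
            continue  # skip rows that are duplicates once normalized
        seen.add(key)
        cleaned.append({"title": title, "museum": museum, "year": year})
    return cleaned

rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Keeping missing values explicit (None here) rather than guessing them makes it easier to decide later how a given tool should treat incomplete records.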
2. Finding best-fit visuals and functionality
After you've got a relatively clean dataset, the next step is to decide what you want your visualization to look like and what you want it to do. For example, if my dataset is a collection of red ceramic vases from North American museum collections, I could visualise those on a map to show the distribution of the collection across space. Or I could assign them tags and visualise them using a network graph to show which objects share particular features. Or I could make a chart showing the number of red ceramic vases in collections over a period of years. There are many aspects of my dataset that I can visualize, so I have to determine which one (or whether all of them!) would let people understand my data best.
A chart might be redundant, since I could simply say in text "the number of red ceramic vases in museum collections increased drastically over X number of years." A map, however, may be more useful, since it can be difficult to imagine three-dimensional space and geography through text and the map also demonstrates which areas have the most ceramic vases, saying two things in one. These are just some ideas; it's up to you to decide what is most important to you to visualize and how you think visualizing a particular aspect of your data will improve understanding.
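To make the vase example concrete, the sketch below summarises one small dataset in the three ways discussed: by place (for a map), by year (for a chart), and by shared tags (for a network graph). It is only an illustration in Python; the records, field names (id, city, year, tags), and values are all hypothetical, and a real project would feed summaries like these into whichever tool you choose.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records; every field name and value here is invented for illustration.
vases = [
    {"id": "v1", "city": "Toronto", "year": 1990, "tags": {"glazed", "handled"}},
    {"id": "v2", "city": "Toronto", "year": 1995, "tags": {"glazed"}},
    {"id": "v3", "city": "Chicago", "year": 1995, "tags": {"handled", "painted"}},
]

# Map view: how many vases are held in each place.
per_city = Counter(v["city"] for v in vases)

# Chart view: how many vases are recorded for each year.
per_year = Counter(v["year"] for v in vases)

# Network view: link two vases when they share at least one tag.
edges = [
    (a["id"], b["id"], sorted(a["tags"] & b["tags"]))
    for a, b in combinations(vases, 2)
    if a["tags"] & b["tags"]
]

print(per_city)
print(per_year)
print(edges)
```

Looking at the three summaries side by side can help you judge which view says something the raw data does not already make obvious.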
3. Determining your values
When you've decided what you want your visualization to look like and do, you'll have to find a tool that can help you do that. Before you select a tool, however, it's important to determine what your values are for visualizing your data. Software and tools are often owned by corporations, some of which have paywalls or may espouse values that you disagree with. At the CoLab, for example, our values include prioritizing open-access, minimal computing approaches whenever possible, which means the first tools we reach for will fit those values. Someone creating a public-facing collection of Indigenous data may want to use a tool created and managed by Indigenous people to make sure that their research is giving back to the communities it benefits from. Do your research and make sure you're choosing a tool that fits both you and your project.
4. Selecting a tool
Selecting a tool can be difficult, since you need to find something that meets your visual needs, your functional needs, your values, and your resources (budget, technical skills, capacity for maintenance, project team size, etc.). We can't tell you exactly which tool is "right" or "wrong" for your project, but see the Selecting a tool section for some general guidance.
5. Trial and error
Designing a visualization can be difficult, but it's important to remember that trial and error, getting feedback from others, and making changes are all part of the process. Take time to test out and explore different visualization types and methods, and don't be afraid to use a new tool or ask for help when you need it.
6. Publishing your visualization
The last step will be deciding if and how you will make your visualization public. If your data is being put into a collection tool or website, your tool will likely have embedded structures for publishing to the web. If you're including your visualization in a paper, look up options for downloading the visualization as a high-quality image file or, if your visualization is interactive, publishing your paper in a born-digital format.
Because of the specificity of all of the different factors in designing, creating, and publishing a visualization, your best resource will always be your own research and trusted experts, like librarians, who can give you advice tailored to your specific circumstance. Digital Scholarship Services is happy to help direct you on the best way to visualize your humanities or social sciences collections dataset!
Selecting a tool
There are so many visualization tools available that can be used to create your project. You can find the Digital CoLab's curated list of data visualization tools, along with guides for some of them, on the Digital Scholarship Services Guides site.
When selecting a tool, there are a few things you may want to consider:
- Does this tool do what I want it to do? If it's open-source, can I make it do what I want it to do?
- Do I have enough technical knowledge, or can I reasonably gain the technical knowledge within my timeframe, to use this tool? Do I have resources to help me figure out how to use this tool?
- Do the values of the company that created and manages this tool align with my own? For example, is the tool free, open-source, open-access, web-based, etc.? Does the company responsibly use the data it collects from users?
- Does the tool host my project itself or will I have to host the project on my own?
- Does this tool have good documentation?
After you've determined what you're looking for, it can be helpful to just browse tools and shop around. See what's out there, what you like and what you don't. Make note of features you like in one tool that might be able to be integrated into another. Ask a friend. Come to the CoLab and get our opinion.
What do I do if I can't find the right tool?
If you feel like you've searched the entire World Wide Web and you can't find a single tool that works for you, contact the Digital CoLab at digitalcolab@cornell.edu. It's our job to help you find a tool that can work for you or to help you find ways to achieve the data visualization that fits well with your project.
Tools for visualizing and publishing data
- Collections & platforms (Digital Scholarship Guides): Welcome to the collection platforms & tools page! Discover your new favourite way to make an engaging digital collection, database, visualization, map, and other public-facing piece of digital scholarship through the list of tools available below. This list is by no means exhaustive, and prioritizes open-source, free tools when available.