The Process of Creating Data Visualizations

by Jan Willem Tulp

Jan Willem Tulp is a freelance information visualizer based in The Hague, The Netherlands. With his company, TULP interactive, he creates data visualizations for a wide variety of clients, such as Scientific American, Popular Science, Amsterdam Airport and the World Economic Forum.

Have you ever wondered how today’s beautiful visualizations are created? If you Google ‘process of creating a data visualization’, you’d be hard-pressed to find a good resource that describes it. And maybe the reason is that there is no such process — not a formal one, anyway.

Although there is no required set of steps, the process of creating a visualization usually goes through three stages:

1. “Get me some data, now!”

Usually, one of the first steps in creating a visualization is to get a sense of the dataset, and to do so as soon as possible. Some examples of what you want to know first:

  • Is it a large dataset or a small dataset?
  • What is the format? Excel? JSON? Does it need to be scraped first?
  • What is the complexity of the dataset?
  • Does it have many outliers or not?
  • Does it have both very small and very large numbers or is it a flat dataset?
  • Is the structure of the dataset directly usable for the visualization or does it need some restructuring?
  • What is the quality of the data? Does it need cleaning? Do some values need converting (Fahrenheit to Celsius, for instance, or between date formats)?

Answering these questions gives you a sense of the size and complexity of the data, and of its potential for a visualization. Especially in client work, this is an important step to take early on, because it lets you make a more accurate estimate for the project.
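As a rough illustration, several of the questions above (size, range, outliers, missing values, cleaning) can be answered with a few lines of Python. The sketch below is a minimal example assuming a small CSV with a hypothetical `temp_f` column in Fahrenheit and US-style `MM/DD/YYYY` dates; the sample rows, column names and function names are all made up for illustration. It flags outliers with the common 1.5 × IQR rule, which is one heuristic among many.

```python
import csv
import io
import statistics
from datetime import datetime

# Hypothetical sample data: station IDs, US-style dates, temperatures in Fahrenheit.
# One reading is missing and one is suspiciously high.
SAMPLE_CSV = """station,date,temp_f
A,07/01/2012,68
B,07/01/2012,66
C,07/01/2012,
D,07/01/2012,70
E,07/01/2012,64
F,07/01/2012,72
G,07/01/2012,127
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def profile_column(rows, column):
    """Summarize a numeric column: row count, missing values, range and outliers."""
    raw = [row[column] for row in rows]
    values = [float(v) for v in raw if v.strip()]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 1.5 x IQR outlier rule
    smallest, largest = min(values), max(values)
    return {
        "rows": len(raw),
        "missing": len(raw) - len(values),
        "min": smallest,
        "max": largest,
        # A large max/min ratio hints that a log scale may be needed later.
        "max_min_ratio": largest / smallest if smallest > 0 else None,
        "outliers": [v for v in values if v < low or v > high],
    }


def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9


def to_iso_date(us_date):
    """Convert an (assumed) MM/DD/YYYY string to ISO 8601, YYYY-MM-DD."""
    return datetime.strptime(us_date, "%m/%d/%Y").date().isoformat()


def clean_rows(rows):
    """Drop rows without a reading; convert dates to ISO and temperatures to Celsius."""
    cleaned = []
    for row in rows:
        if not row["temp_f"].strip():
            continue
        cleaned.append({
            "station": row["station"],
            "date": to_iso_date(row["date"]),
            "temp_c": round(fahrenheit_to_celsius(float(row["temp_f"])), 1),
        })
    return cleaned


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print(profile_column(rows, "temp_f"))
    print(clean_rows(rows)[0])
```

On a real project you would read the file from disk (or scrape it) instead of using an inline string, and a library such as pandas would make this shorter; the standard library is used here only to keep the sketch self-contained.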

2. Analysis and story finding

For some projects, the story is clear before you start working on the visualization. One example is a visualization that illustrates an article that has already been written. In this case, the visualization amplifies the story in the article, and you need to do little or no analysis of the data to find an interesting story.

In other cases, a project starts with just a dataset, and the initial question is usually: “Can you create something interesting with this dataset?” In that case, it is up to you to analyze the data in search of interesting stories and insights.

This analysis usually happens early in the process, since it gives direction to how the visualization evolves. It may also require some data mining or statistics to surface interesting insights.

(How riot rumors spread on Twitter: a great example of storytelling with visualization.)

Tools to consider include Tableau, R and Google Refine, or you can always write your own code for the analysis. Python has some great modules for data analysis.
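To give a flavor of this kind of exploratory analysis, here is a minimal sketch using only Python's standard `statistics` module. It assumes a hypothetical dataset of yearly values per region, computes each region's overall growth and flags regions whose growth lies far from the average (measured in standard deviations) as candidates worth a closer look. The data, the threshold and the function names are illustrative assumptions, not a recommended method.

```python
import statistics

# Hypothetical yearly values per region (first year to last year).
DATA = {
    "North": [100, 104, 108, 111],
    "South": [90, 93, 95, 99],
    "East": [120, 118, 121, 123],
    "West": [80, 82, 130, 135],
}


def growth(series):
    """Relative change from the first to the last value."""
    return (series[-1] - series[0]) / series[0]


def story_candidates(data, threshold=1.0):
    """Names whose growth is more than `threshold` standard deviations from the mean.

    Results are ordered from the most to the least unusual.
    """
    growths = {name: growth(series) for name, series in data.items()}
    values = list(growths.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # every series changed by the same amount: nothing stands out
    z = {name: (g - mu) / sigma for name, g in growths.items()}
    flagged = [name for name in z if abs(z[name]) > threshold]
    return sorted(flagged, key=lambda name: abs(z[name]), reverse=True)


if __name__ == "__main__":
    for name, series in DATA.items():
        print(f"{name}: {growth(series):+.1%}")
    print("Worth a closer look:", story_candidates(DATA))
```

With this sample data only "West" stands out, which is the kind of anomaly that might turn into a story once you find out why it happened.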

3. Visualize it!

Creating a visualization means evolving a visualization. In other words, you have to discover a visual representation that works both for this dataset and for the message or story you want to communicate. In practice, you create many intermediate visualizations before arriving at the final result.

A data visualization is an almost literal translation of mostly numerical data into a visual representation (this translation is called ‘visual encoding’). Because of this nearly one-to-one relationship between data and visual representation, you can only judge what works and what doesn’t once you see the visualization in front of you. It is hard to sketch a shape or form first and then squeeze the data into it; in data visualization, form follows data.
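Visual encoding can be made concrete with a scale: a function that maps a data domain onto a visual range such as pixel positions or bar lengths. The sketch below is a minimal linear scale in Python, similar in spirit to the scales found in visualization libraries but not taken from any of them; `linear_scale` and `encode_points` are hypothetical names. Note that the y range is flipped because on screen (for example in SVG) the y coordinate grows downward.

```python
def linear_scale(domain, output_range):
    """Return a function that maps values in `domain` linearly onto `output_range`."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain endpoints must differ")

    def scale(value):
        t = (value - d0) / (d1 - d0)  # relative position within the domain, 0..1
        return r0 + t * (r1 - r0)

    return scale


def encode_points(points, width=400, height=300):
    """Encode (x_value, y_value) pairs as pixel positions in a width x height plot.

    The y range runs from `height` to 0 because screen y coordinates grow downward.
    """
    xs = [px for px, _ in points]
    ys = [py for _, py in points]
    x = linear_scale((min(xs), max(xs)), (0, width))
    y = linear_scale((min(ys), max(ys)), (height, 0))
    return [(x(px), y(py)) for px, py in points]


if __name__ == "__main__":
    print(encode_points([(0, 0), (10, 5), (20, 10)]))
    # prints [(0.0, 300.0), (200.0, 150.0), (400.0, 0.0)]
```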

Looking critically at an intermediate visualization immediately produces ideas for improvement. For instance, some circles may be too small to see, so their size needs to be increased. Or a linear scale may not work because many data points end up plotted on top of each other, in which case a logarithmic scale might be a good alternative. Or users might lack information they need to understand the visualization; adding interactivity that shows more details can help. Or a single representation of the data may not tell the full story, so additional representations can provide more context. These decisions rely partly on a sense of design and aesthetics, but also very much on a large body of information visualization theory and on knowledge of how perception works.
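Two of the fixes mentioned above, switching to a logarithmic scale and keeping small circles visible, can be sketched in a few lines. In this hypothetical example, values spanning five orders of magnitude are placed on a 500-pixel axis: with a linear scale the four smallest values all land within 5 pixels of each other, while a log10 scale spaces them evenly. The `circle_radius` helper sizes circles by area (radius proportional to the square root of the value), a common perception-based choice, and applies an assumed minimum radius so tiny values do not disappear. All function names and parameter defaults are illustrative.

```python
import math


def linear_position(value, lo, hi, width):
    """Linear position scale from [lo, hi] onto [0, width]."""
    return (value - lo) / (hi - lo) * width


def log_position(value, lo, hi, width):
    """Log10 position scale; lo, hi and value must all be positive."""
    span = math.log10(hi) - math.log10(lo)
    return (math.log10(value) - math.log10(lo)) / span * width


def circle_radius(value, max_value, max_radius=30.0, min_radius=2.0):
    """Area-proportional radius (r grows with sqrt(value)) with a floor for visibility."""
    radius = math.sqrt(value / max_value) * max_radius
    return max(radius, min_radius)


if __name__ == "__main__":
    # Hypothetical values spanning five orders of magnitude.
    values = [1, 10, 100, 1000, 100_000]
    for v in values:
        lin = linear_position(v, 1, 100_000, 500)
        lg = log_position(v, 1, 100_000, 500)
        r = circle_radius(v, 100_000)
        print(f"{v:>7}: linear {lin:7.2f}px  log {lg:7.2f}px  radius {r:5.2f}")
```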

Great visualizers, like Jer Thorp, Moritz Stefaner, Martin Wattenberg and Fernanda Viégas, and the data journalists at the New York Times, all create these intermediate visualizations to find out what works and what doesn’t.

(Intermediate visualization from NYT for Facebook IPO chart. Source)

Ben Fry describes the process of creating a data visualization in a very accessible way. In his PhD thesis on Computational Information Design, he outlines the following process:

(Computational Design Process. Source)

Any step in the process can be very simple or even unnecessary, or it can be very complex and require a lot of work, or anything in between. Note also that this is a highly exploratory, iterative and non-linear process, in which each step can be revisited many times.
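Fry's stages are commonly summarized as acquire, parse, filter, mine, represent, refine and interact. As a toy illustration of how they chain together, the sketch below implements the first six as plain Python functions over a small, made-up dataset (interaction is omitted, since it needs a user interface). The function names and data are hypothetical, and each step is deliberately trivial; the point is that any stage, such as `refine`, can be changed and the pipeline re-run, which mirrors the iterative nature of the process.

```python
import csv
import io

# Hypothetical raw data standing in for the "acquire" step
# (normally a download, an export or a scrape).
RAW = """country,visitors
Netherlands,120
Belgium,60
Germany,
France,90
"""


def acquire():
    return RAW


def parse(text):
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows):
    """Keep only rows with a usable numeric value."""
    return [(r["country"], int(r["visitors"])) for r in rows if r["visitors"].strip()]


def mine(pairs):
    """A basic statistic the representation needs: the maximum value."""
    return max(v for _, v in pairs)


def represent(pairs, max_value, width=20):
    """Encode each value as a text bar whose length is proportional to the value."""
    return [f"{name:<12}{'#' * round(v / max_value * width)}" for name, v in pairs]


def refine(pairs):
    """One possible refinement after seeing a first draft: sort bars by value."""
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def run(refined=True):
    pairs = filter_rows(parse(acquire()))
    if refined:
        pairs = refine(pairs)
    return represent(pairs, mine(pairs))


if __name__ == "__main__":
    print("\n".join(run(refined=False)))  # first draft, in file order
    print()
    print("\n".join(run()))  # refined draft, sorted by value
```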

From the field of visualization research, Tamara Munzner has defined another workflow for creating visualizations: A Nested Model for Visualization Design and Validation.

(A Nested Model for Visualization Design and Validation by Tamara Munzner. Source)

The fact that several different models of the process have been proposed shows that there is no single way to create visualizations. There are many paths through the process, and most of them can produce good results. The most remarkable characteristic of the process is that it is always exploratory and iterative. This is also what makes it enjoyable: you see improvements as the project progresses.

Have fun creating visualizations!