Why do we visualize data? The most basic answer is that it’s very hard to read data encoded in a database or dataset. Visualization makes use of the basics of human perception to intuitively present the data.
Even if you have never made a graph, you’ve probably already created a data visualization. Even a simple table is a data visualization: it places data visually in rows and columns, helping the viewer read it.
We also use data visualizations to find different points of view that help us interpret the data. A table is one such view, although the way it shows the data often makes it difficult to identify trends and make comparisons. You need to look at the values in each cell, store them in your short-term memory, carry out an analysis, and correctly form a conclusion. Many people would need to turn to pen and paper to complete these tasks. Data visualization can offer many different points of view. Some of these views could be tables and others could be charts and plots. Each has its own upsides and downsides, and each could help users with their needs.
Helping people read and understand data is one of the most important purposes of data visualization, but there are others. For instance, a visualization may also need to be memorable, convincing, entertaining, sleek, or whatever else is important to you and your objectives.
DATA VISUALIZATION ORIGINS
There are various theories about the origin of the first data visualization. But because these origins predate the point at which humanity began to document its activities, it is impossible to pinpoint an exact year. The Turin Papyrus Map is an ancient Egyptian map dating back to around 1150 BC. It is often cited as the oldest surviving topographic map and as the first documented data visualization. But the tally chart would probably win the title of “oldest data visualization,” as it dates back to between 33000 BC and 23000 BC. This just goes to show how naturally this craft came to humanity.
Data visualization has experienced a few big growth spurts throughout history. One of these was between 1860 and 1890, when statisticians, governments, and municipal authorities were eager to discover the possibilities and problems of graphic representation. Visualization was widely adopted, and graphics were officially recognized by government agencies, becoming a feature of official publications.
One famous example is the Minard diagram, created by Charles Joseph Minard and published in 1869. It depicts the course of Napoleon’s Russian campaign of 1812 and is a precursor of the Sankey diagram. It brings several aspects of the campaign together in one diagram, combining the size and direction of the army’s movements with other information, including the geography.
Visual variables are the building blocks of a data visualization. They are like the flavors of a dish, the words of a sentence, or the musical components of a song (tempo, notes, rhythm, …). The variables were first systematized in 1967 by the French cartographer Jacques Bertin. In his book Sémiologie Graphique he identifies seven components: position, size, shape, value, hue, orientation, and texture.
Let’s say you have a set of these numbers: [3, 10, 14, 25, 30, 50, 87, 95, 100]. You could create visualizations with each visual variable.
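As a rough illustration of that idea, here is a minimal Python sketch that encodes the numbers above with three of Bertin’s variables: position, size, and value (lightness). The function names, the 400-unit axis length, and the 20-unit maximum radius are assumptions made up for this example; they do not come from Bertin or from any particular charting library.

```python
import math

values = [3, 10, 14, 25, 30, 50, 87, 95, 100]


def normalize(vs):
    """Rescale numbers linearly so the smallest is 0.0 and the largest is 1.0."""
    lo, hi = min(vs), max(vs)
    return [(v - lo) / (hi - lo) for v in vs]


def encode_position(vs, axis_length=400):
    """Position: offset of each mark along one common axis (hypothetical units)."""
    return [round(t * axis_length) for t in normalize(vs)]


def encode_size(vs, max_radius=20.0):
    """Size: circle radius chosen so that the circle's AREA is proportional to the number."""
    hi = max(vs)
    return [round(max_radius * math.sqrt(v / hi), 1) for v in vs]


def encode_value(vs):
    """Value (lightness): a gray hex color, where larger numbers are darker."""
    colors = []
    for t in normalize(vs):
        level = round(255 * (1 - t))
        colors.append(f"#{level:02x}{level:02x}{level:02x}")
    return colors


# Print one row per number with its three encodings side by side.
rows = zip(values, encode_position(values), encode_size(values), encode_value(values))
for number, position, radius, gray in rows:
    print(f"{number:>3}  position={position:>3}  radius={radius:>4}  gray={gray}")
```

The same numbers end up as three different visual properties. Which one serves best depends on what the viewer needs to read from the chart.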
Since 1967, researchers have added variables, like angle, area, slope, volume, connection, movement (when data is animated), transparency, and interactivity. To be fair to Bertin, some of these, like movement and interactivity, were not even possible when he first thought of the system.
RANKING VISUAL VARIABLES
Some researchers rank the visual variables by effectiveness, as in the ranking below. In this plot, you will notice that position along a common axis, as used in a bar or line chart, is the most effective at letting the user make accurate estimates.
But don’t start using bar charts for every dataset. Remember, data visualizations can have different purposes. Maybe the visualization needs to turn heads and be remembered, something a bar chart isn’t great at.
The picture above also states that color is the least accurate. This is not always true. Try to find the blue X below. Easy, right? Did you use the position along a common axis? Not in the slightest; you used color. Color may not be the best at representing exact values, which is what the earlier ranking was concerned with, but it’s very good at steering the eye to important details.
You need to know what your audience needs. This cannot be overlooked as there is no one-size-fits-all solution in data visualization.
PRINCIPLES FOR DESIGN
MISTAKES TO AVOID
Creating well-crafted and effective data visualizations is a real art. As data science has developed and grown in popularity, some people now focus solely on this one aspect of it. Even if you are not a visualization expert, you can easily improve the quality of your visualizations by avoiding the above nine data visualization mistakes.
Each of these mistakes goes against one or more of the three principles of good visualization design. Many of them are made unconsciously due to a lack of knowledge, but sometimes these “mistakes” are made on purpose to mislead the audience.
One example is cherry-picking, where the full picture is not revealed and only the facts matching the creator’s opinion are shown. Another is presenting the data so that it creates the impression of a change, even where there is relatively little change. Both examples violate the principle of trustworthiness.
Not following common design conventions is another type of violation, this time of the accessibility principle. Too much clutter in your data visualization violates the elegance principle. Some mistakes have a negative impact on several principles at once. Improper use of color, for example, can lead to a chart that is neither accessible nor elegant.