To Make Better Visualizations

I am going to share some lessons I have learned about making visualizations better, or at least less confusing.

Visualizations are powerful tools that let users look into your data. They help to find trends, spot anomalies, set goals, assess performance and more. Visualizations possess these virtues only when they are designed well.

Bar Charts Vs Pie Charts

Human eyes and brains have evolved to judge linear distances better than angles. Hence, we can understand a bar chart better than a pie chart plotted from the same data. Look at the two visualizations below, which convey the same information:

It is difficult to compare two sectors of a pie chart, even when they are adjacent. In the pie chart above, all sectors look very similar even though they are of different sizes. In the bar chart on the right, we can compare two bars even when the difference is very small: compare Product 4 (second bar) and Product 2 (third bar). The difference is only 3 points, and we can still tell which bar is bigger. Hence, the bar chart is my go-to visualization for most use cases.
I recommend a pie chart only when the sector values add up to a meaningful business entity and there are at most 3 categories. As the number of categories increases, the sectors become thin and difficult to interpret.
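
One way to reason about why small differences read more easily on bars: a pie spreads the sum of all values over 360 degrees, while a bar chart spreads only the largest value over the axis length. The sketch below works through that arithmetic. The sales figures are made up for illustration and are not the data behind the charts above; only the 3 point gap between Product 4 and Product 2 follows the text.

```python
# Hypothetical sales figures, invented for this illustration only.
sales = {"Product 1": 25, "Product 4": 23, "Product 2": 20,
         "Product 3": 17, "Product 5": 15}


def pie_angles(values):
    """Return the sector angle in degrees for each category."""
    total = sum(values.values())
    return {name: 360 * v / total for name, v in values.items()}


def bar_heights(values):
    """Return each bar's height as a fraction of the tallest bar."""
    tallest = max(values.values())
    return {name: v / tallest for name, v in values.items()}


angles = pie_angles(sales)
heights = bar_heights(sales)

# Gap between Product 4 and Product 2 as a share of the full circle
pie_gap = (angles["Product 4"] - angles["Product 2"]) / 360
# The same gap as a share of the tallest bar
bar_gap = heights["Product 4"] - heights["Product 2"]

print(f"pie gap: {pie_gap:.1%} of the circle")
print(f"bar gap: {bar_gap:.1%} of the tallest bar")
```

With these made-up numbers the same 3 point gap takes up a larger share of the bar axis than of the circle, and it shrinks further on the pie as more categories are added.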

Line Charts Vs Bar Charts

Line charts are powerful tools for displaying the trend of a measure with respect to a dimension (mostly time). However, some scenarios do not suit a line chart. See the example below:

Here, sales amount is plotted against products. It immediately shows an increasing trend, which might look pleasant, but it is deceptive. Here is why:
  • A line chart implies that the items on the x axis lie on a continuous scale and are related to each other in order. In this case "Products" are on the x axis. They are categorical values of the dataset and there is no such relationship between two products. Each product is a different entity.
  • An increasing trend can be turned into a jagged or declining one by shuffling the x axis ticks. This does not alter the data, but it alters the inference drawn at first glance.
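
The shuffling point can be checked with a few lines of Python. This is a minimal sketch with invented product names and sales values; `trend_shape` and `line_shape` are hypothetical helpers written for this example, not part of any charting library.

```python
# Hypothetical sales per product, invented for this illustration only.
sales = {"Product A": 10, "Product B": 20, "Product C": 30, "Product D": 40}


def trend_shape(values):
    """Describe how a line drawn through the values in this order would look."""
    steps = [b - a for a, b in zip(values, values[1:])]
    if all(s > 0 for s in steps):
        return "increasing"
    if all(s < 0 for s in steps):
        return "declining"
    return "jagged"


def line_shape(data, order):
    """Shape of the line when categories sit on the x axis in `order`."""
    return trend_shape([data[name] for name in order])


original = ["Product A", "Product B", "Product C", "Product D"]
reversed_order = list(reversed(original))
mixed = ["Product B", "Product D", "Product A", "Product C"]

# Same data every time; only the x axis order changes.
for order in (original, reversed_order, mixed):
    print(order, "->", line_shape(sales, order))
```

The `sales` dictionary is never modified, yet the three orderings give three different apparent trends.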
Bar charts are a better choice for this use case. A line chart is meaningful when the x axis is a time scale. See the visualization below:
 
Other categories can be added to a line chart, each as a line with a different color / style. This looks good only when there are few categories. More lines make the chart cluttered. Refer to the two visualizations below:

1. Here the sales trend of 9 products is plotted against time
2. Here the revenue of 4 regions is plotted against time
The second visualization looks better as it has relatively fewer lines.

Enhancing Bar Charts

Bar charts are my favourite as they fit most use cases, are popular and are easy to understand. Here are some tweaks that can make bar charts better.

1. Sort the x axis categories in ascending or descending order of the measure plotted. This makes it easy to compare which category is performing better and which is not. However, there are cases where the x axis categories need to keep their own order (for example, months of the year) and should not be sorted by the measure.

2. Add an extra bar or a horizontal line to indicate the average value of the measure being plotted. This makes it easy to see which categories are performing above average, at average and below average. Refer to the visualizations below:
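
Both tweaks can be sketched in Python. The sales numbers are invented, and the helper names (`sort_by_measure`, `split_by_average`, `plot_sorted_bars`) are hypothetical. The plotting part assumes matplotlib is available and is only one possible way to draw the average line.

```python
from statistics import mean

# Hypothetical sales per product, invented for this illustration only.
sales = {"Product 1": 40, "Product 2": 25, "Product 3": 10, "Product 4": 45}


def sort_by_measure(data, descending=True):
    """Return (category, value) pairs ordered by value."""
    return sorted(data.items(), key=lambda item: item[1], reverse=descending)


def split_by_average(data):
    """Return the average and the categories above and below it."""
    avg = mean(data.values())
    above = [name for name, v in data.items() if v > avg]
    below = [name for name, v in data.items() if v < avg]
    return avg, above, below


def plot_sorted_bars(data):
    """Draw bars sorted by value with a dashed line at the average."""
    # Imported here so the helpers above work without matplotlib installed.
    import matplotlib.pyplot as plt

    ordered = sort_by_measure(data)
    names = [name for name, _ in ordered]
    values = [v for _, v in ordered]
    avg, _, _ = split_by_average(data)

    fig, ax = plt.subplots()
    ax.bar(names, values)
    ax.axhline(avg, linestyle="--", color="gray", label=f"Average ({avg:g})")
    ax.legend()
    return fig
```

Calling `plot_sorted_bars(sales)` returns a figure with the tallest bar on the left and the average marked, so the categories above and below the line can be read off directly.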




Log Vs Linear Scale

There are situations where the measure / axis values span a very large range. When such measures are plotted, the largest values dwarf the smaller ones. In a bar chart, the smaller bars appear hairline thin, and in a line chart, the line appears to crawl along the x axis. With such a large range on one axis, the chart often needs to be zoomed in on a particular section for better visibility.

Let's look at the number of confirmed Covid 19 patients over time. The data is taken from https://www.kaggle.com/datasets/sudalairajkumar/covid19-in-india

The data has a minimum value of 1 and a maximum of 32,036,511. Let's plot it by date with a linear scale on the y axis (the default): Confirmed Covid 19 Cases in India by Date.


According to this chart, there were 100.325K confirmed cases as of May 19, 2020. However, the line on that day appears to crawl along the x axis, as the largest value plotted is more than 32M. This makes it difficult to spot ups and downs in the early part of the timeline.

There is an alternative way to plot the same data: use a log scale on the y axis. Let's see how it looks:

On a log scale the same value, 100.325K, is no longer dwarfed, and minor trend changes in smaller values are visible. For example, according to this chart, there was no change in confirmed cases until March 2020 and the numbers spiked after that. This inference could not be drawn from the linear scale chart.
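
The effect can be put into numbers using only the three values quoted in this post. The sketch below assumes, for simplicity, that the y axis runs exactly from the minimum to the maximum value. Real charting tools usually add some padding, so the exact fractions will differ a little.

```python
import math

# Values quoted in the text above.
MIN_CASES = 1
MAX_CASES = 32_036_511
MAY_19_2020 = 100_325


def linear_position(value, lo, hi):
    """Fraction of the axis height at which `value` sits on a linear scale."""
    return (value - lo) / (hi - lo)


def log_position(value, lo, hi):
    """Fraction of the axis height at which `value` sits on a log scale."""
    span = math.log10(hi) - math.log10(lo)
    return (math.log10(value) - math.log10(lo)) / span


print(f"linear: {linear_position(MAY_19_2020, MIN_CASES, MAX_CASES):.1%}")
print(f"log:    {log_position(MAY_19_2020, MIN_CASES, MAX_CASES):.1%}")
```

Under that assumption the May 19, 2020 value sits at well under 1% of the axis height on the linear scale and at roughly two thirds of the height on the log scale.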

Note: The Covid 19 dataset is used here only to demonstrate log vs linear scale in visualizations. I have not verified this information to be correct. These visualizations should not be used as a source of truth. Please verify the correctness of the data / visualizations before using them for any other purpose.

A log scale has to be used with caution. Users should be explicitly informed that the scale is logarithmic. We are used to seeing graphs on a linear scale, and confusion about the scale can make a huge difference in how users perceive the data.
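
One possible way to make the scale explicit is to say so in the axis label and to print the tick values as plain numbers (1, 10, 100, ...) so the jumps between ticks are visible. The sketch below assumes matplotlib for the plotting part. `decade_ticks`, `format_tick` and `plot_log_line` are hypothetical helpers written for this example, not functions from matplotlib.

```python
import math


def decade_ticks(lo, hi):
    """Powers of ten that cover the range [lo, hi]; lo must be positive."""
    first = math.floor(math.log10(lo))
    last = math.ceil(math.log10(hi))
    return [10 ** exp for exp in range(first, last + 1)]


def format_tick(value):
    """Render a tick as a plain number with a K or M suffix."""
    if value >= 1_000_000:
        return f"{value / 1_000_000:g}M"
    if value >= 1_000:
        return f"{value / 1_000:g}K"
    return f"{value:g}"


def plot_log_line(dates, values, measure_name):
    """Line chart with a log y axis that is labelled as such."""
    # Imported here so the helpers above work without matplotlib installed.
    import matplotlib.pyplot as plt

    ticks = decade_ticks(min(values), max(values))
    fig, ax = plt.subplots()
    ax.plot(dates, values)
    ax.set_yscale("log")
    ax.set_yticks(ticks)
    ax.set_yticklabels([format_tick(t) for t in ticks])
    # State the scale in the label so readers do not assume a linear axis.
    ax.set_ylabel(f"{measure_name} (log scale)")
    return fig
```

For the range in this post, `decade_ticks(1, 32_036_511)` gives the powers of ten from 1 up to 100,000,000, which `format_tick` renders as 1, 10, 100, 1K and so on up to 100M.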

I will add more points to this in my upcoming posts. I hope you liked this post. Do let me know in the comments.
