What is a Pareto chart?
A Pareto chart is a type of data visualization that combines a bar chart and line chart together. In a Pareto chart the bars, as usual, represent a value that is associated with a given item. The line, on the other hand, shows how much that item contributes to the total value as a cumulative sum. In the past, I've been tasked with creating these charts for various data sets, including equipment breakdown durations, distances traveled, and sales numbers across different product categories. Let's explore a simple example to better understand how Pareto charts work.
Assume we have the simple dataset in the table below. In our example, there are 55 total sales. To calculate the contribution of each product category, we divide its sales by the total sales. For instance, backpacks contributed 18.18% (10/55) to the overall sales, while gloves accounted for 16.36%. The Pareto chart's line displays these cumulative contributions. Starting with backpacks at 18.18%, we add gloves' 16.36% to get 34.55%, indicating that backpacks and gloves together made up 34.55% of total sales (note the 0.01% discrepancy due to rounding). This process continues for each product category. By adding socks' 14.55% contribution to the previous total, we see that backpacks, gloves, and socks collectively accounted for 49.09% of total sales. This process continues for each product listed.
If we plot the product, number of sales, and cumulative percentage of total sales on a chart, we get the visualiztion below.
So, why would you want to use a Pareto chart? The answer often lies in the "Pareto Principle," also known as the "80/20 rule." This principle suggests that, in many cases, a significant portion of an outcome can be attributed to a relatively small number of causes. When leveraging Pareto charts, one of the primary goals is to identify the categories or items that contribute the most to a particular metric or value. Even in our simple, fictional data set above, we can observe that half of the product categories account for nearly 75% of the total sales. By visually highlighting these critical contributors, Pareto charts empower businesses and organizations to focus their efforts on the areas that yield the greatest impact.
Pareto Charts in Power BI
Given the popularity of Pareto charts, its surpisring that Power BI has no built-in visual create them. To implement a Pareto chart requires caluculating a cumulative measure which, in my opinion, isn't very intuitive using DAX. If you Google "Power BI Pareto" or something similar, you will likely come across a version of the DAX measure below. The idea behind this measure is to find the total sales for each product, then filter out any products with fewer sales than the product currently being evaluated. Once we have this filtered table, we can sum all the sales values to get the total sales made up by products with at least as many sales as the one being checked. This value is then divided by the total sales of all products to get the cumulative line value displayed in the chart.
The "total sales" measure referenced is simply the sum of the of the sales.
Sum Simple Products = SUM('Simple Table'[Total Sales])
The measure works well in many cases, but things can get messier when working with more realistic data scenarios. One limitation of this approach is that since it filters products based on their total sales, it runs into issues when multiple products have the same sales value. When multiple products share the same sales figure, the measure will treat them as a single entity, potentially distorting the cumulative line in the Pareto chart.
To illustrate this issue, see the example below. More data has been added and now we have several products which have the same number of total sales, with 266 sales across all products. When a Pareto line is calculated using the measure above, the line is flat across the products which have the same value. Since all products with 50 sales are kept when filtering products based on total sales, the sum of their sales (150) is used for each of them. This leads to a value of 56.39% (150/266) for each product, rather than the cumulative 18.80% (50/266) at each step that is expected. We see the same issue again for the two products with 30 sales each.
A solution I've come up with this is to use another column in the dataset as a "tiebreaker". This value will allow us to rewrite the DAX expresion so that each product is treated individually, even if it has the same number of sales as another item. In this case, we only have one other column to use (product name), so the choice is easy.
To start, I add a calculated column to my table called "Product Rank". This column simply ranks each product name from 1 to 10 in descending alphabetical order.
Then the DAX expression for calucating the Pareto line values can be modified to use this new value as part of the filtering. I this example I divide the ranking by 1000, then add this value to the total sales. Since each product has a different rank, they will all have a different value in this tie breaker column. The rank is divided by 1000 so that the number added to the total sales is small enough that it does not have an impact on the "coarse" rankings, only the rankings of products with the same number of total sales.
Implementing this new measure yields the visual below. All products now give their expected values. Items with the same number of total sales correctly show different values on the cumulative line.