AdventureWorks - Global Sales Report

Summary

AdventureWorks is a fictional company created by Microsoft. They provide a sample database allowing users to test and demonstrate the functionality of many Microsoft products like SQL server and Power BI. For this project, I'm going to connect to the AdventureWorks database and create a Power BI report that gives an overview of the company's global sales. A view of the final report is available below. I will go through the process of creating the report in some detail further down the page.

Data Connection and Exploration

Microsoft provides the sample data as a .bak (backup) file. After using SQL Server Management Studio (SSMS) to restore the database, we can start to check out what the data looks like.

 

As expected from a sample database, the design looks great at a glace. Tables have clear names and are grouped into intuitive categories like "human resources", "production" and "sales". There are 72 base tables and 20 views in total.

 

There is a lot of information here and many different reports could be made with this data. I'm choosing to focus on sales in this project, so most of the data comes from the Sales.SalesOrderHeader and Sales.SalesOrderDetail tables. It looks like SalesOrderHeader aggregates the SalesOrderDetail table on the SaleOrderID, as well as adding some additional information.

Now that we have an idea of what we will need from the database, we can connect Power BI to the data source. Here, we connect using a SQL server source.

 

One tricky part about this data is that sale prices are given in the local currency. If we want to compare values across countries, we will need to convert the prices to a common currency. Luckily, there is a table in the database called "CurrencyRate". This table gives the daily conversion rates between USD and all of the other currencies used in this dataset.

 

To convert the prices, I'll join the CurrencyRate table to the SalesOrderHeader table so that we can directly access the values we need in the next step. The join is done on two columns: CurrencyRateID and CurrencyRateDate. CurrencyRateID indicated which currency is being converted into USD. The exchange rate varies each day, so I also join the CurrencyRateDate on the OrderDate from the sales table. Note that there are many ways to do this. The tables could be joined via SQL commands, or the same conversion could be done without joining the tables at all.

We now have additional columns in our sales table. These columns indicate which conversion is being done, and the average exchange rate during the day of the sale. Nulls indicate that no conversion is needed (the sale is already in USD).

 

To convert the sale amount to USD, we simply divide the subtotal by the exchange rate. If the exchange rate is null, we keep the subtotal as is. This is done via the formula below.

Data Modelling

At this point we will bring the data from power query into the Power BI report view. An addtional table that would be useful is a date table. The date table holds information about each date, such as which month it belongs to, which week of the year it falls in, and anything else we wish to add. See https://learn.microsoft.com/en-us/power-bi/guidance/model-date-tables for more information on date tables. For this report we will create a date table using DAX code. Since we are focusing on the "order date" to determine when sales took place, the date table will be generated based on the values in this column.

With the date table added we now have all the tables we need. However, before we start creating measures and visuals, we need to think about how the tables relate to one another. We have been setting up our tables for use in a star schema. A star schema is a type of data model where each table is considered to be either a "fact" table or a "dimension" table. Fact tables store facts about specific events, such as when a sale happened and how much was was purchased. Dimension tables contain information that describes things, such as product information or locations. Fact tables contain dimension table keys so that the descriptive information in the dimension tables can be related to each event in the fact table. Star schemas are typically best practice for Power BI models, read more at https://learn.microsoft.com/en-us/power-bi/guidance/star-schema.

 

In our case, the sales header table is the fact table, giving us information about when sales took place. The product, date, and territory tables are dimensions, describing what, where, and when a sale took place. For clarity, I give dimension tables the prefix "D" and the fact table the prefix "F". See the model view below. Note that in this report I consider the "SalesOrderDetail" to be a dimension table, as it describes the sale that occurs in the sale header table. You could model this in different ways, or argue that the SalesOrderDetail table is actually the fact table. In a real business scenario, if the data was large enough to be a concern, you could test several different modeling approaches for impact on performance.

Measures

We are now ready to start creating visuals. Many visuals can be created by simply dragging and dropping the required fields onto a chart, I won't spend time describing these. However, there are two visuals in the report which require some DAX measures. These are the daily popular products table and the pareto chart in the bottom right corner.

 

The daily popular products table shows the most popular products in each country on the date selected by the user. This requires aggregating the SalesOrderDetail table to get the total number of each product purchased on the selected date. See below the the DAX measure used to create this visual.

Pareto charts are something I have been asked for frequently as an analyst. Surprisingly, Power BI does not have a way to create these charts directly. They must be created using a line and column chart, with a measure to calculate the cumulative percentage line. See the DAX measure below.

Conclusion

With these measures made, we are pretty well done! All that's left is to add these measures into visuals and start working on color schemes and layout. Note that KPI targets were selected somewhat arbitrarily in order to show the color change.

 

So, what did we learn about the sales data of AdventureWorks? This project was meant to display some features of Power BI at a high level, but there are a few clear takeways that we can see right away.

  1. In my opinion, the most valuable piece of information is the pareto chart. AdventureWorks carries 504 unique products. The pareto chart shows us that the top 20 products generate well over 40% of the company's total sales revenue. This means that 4% of their products generate nearly half of their total revenue! Particularly, the "Mountain - 200" series of bikes is a huge revenue driver.

  2. Although total sales have been trending upward over the last 3 years, it looks like there is trouble brewing. The last 5 months of sale history show huge variability. March 2014 was the best month on record with nearly $7M in sales. However, the months surrounding it are very weak. June 2014 had just $49,000 in sales. This level of variability would certainly warrant a closer look, and would be cause for concern if it can not be explained.
  3. Over the life of the company the USA made up the majority of their business, about 60% of total sales. However, by changing the selected date range, it seems likely that AdventureWorks began in America and expanded their sales to other countries over the years, as earlier in their history the US made up an even larger portion of their sales. For example, below shows only dates in 2011 selected.