October 24, 2006
Contributed by Chris Webb, Crossjoin Consulting
This technical article was generously contributed by Chris Webb, an OLAP guru and independent consultant (check him out at Crossjoin Consulting, or on his blog). Chris also has written a book on MDX called 'MDX Solutions', that you can get here once you are looking to go deep with Mondrian and MDX!
In this series I'll be showing you how to implement some common MDX calculations using the SteelWheelsSales sample cube. In many cases you will be able to take these calculations and, after changing some of the measure and dimension names, be able to re-use them in your own cubes; hopefully you'll also be able to pick up a bit of the theory on the way and be able to tackle more specific problems yourselves. Nicholas Goodman has already given excellent explanations on how to edit the MDX query used by JPivot to display data (http://www.nicholasgoodman.com/bt/blog/2006/06/08/mondrian-power-of-olapmdx-at-your-fingertips/) and how to add calculated members to a Mondrian cube (http://www.nicholasgoodman.com/bt/blog/2006/06/09/mondrian-olap-power-at-your-users-fingertips/) so I won't repeat what he has to say.
Resources Before You Get Started
To run the queries that Chris presents in this technical article, you will need JPivot, hooked up to the SteelWheelsSales sample cube. You can get both of these pre-configured and ready to use out of the box in the Pentaho Pre-configured Installation (PCI). Just navigate to the Steel Wheels Analysis Samples, and select the MDX Query button from the JPivot toolbar.
- Pentaho Pre-configured Installation (PCI) running locally, and the hypersonic database running locally with the SampleData database, version 1.2 or later.
Let's imagine that you need to see one of the commonest types of Business Intelligence calculation in your reports: previous period growth. Let's say you've got sales for several months in your cube and you want to be able to calculate, for each month, the percentage or absolute growths from the previous period. To be more specific, what you want to be able to do is to take a measure value and create a new calculated member in the cube which will display either the absolute or percentage difference between the value displayed for that measure for the current time period and the previous time period. The previous time period could be a Year if we were looking at the Year level in our Time dimension, it could be a Quarter if we were at the Quarter level or it could be a Month if we were at the Month level. What's important is that we somehow need to get a reference to a member relative to the member we're currently looking at on the Time dimension.
Exploring MDX Queries
In order to be able to do this we first need to understand some basic MDX concepts. First of all, in order to be able to uniquely identify a member on a dimension we need to know its 'unique name': for example [Time].[All Years]. or [Product].[All Products].[Classic Cars]. We can then use one or more unique names as co-ordinates to identify individual cells within the cube, and one of these cell references is called a 'tuple'. A tuple in MDX is written as a comma-delimited list of member unique names enclosed in parentheses, for example ([Measures].[Sales], [Time].[All Years]., [Product].[All Products].[Classic Cars]). Tuples must contain at least one member unique name and can contain up to however many unique names as there are dimensions in the cube; in the latter case you could be sure to have a static reference to just one cell in the cube but most of the power of tuples in MDX comes from being able to only refer to members on a few dimensions. When you do this, the members on the dimensions you don't explicitly refer to are either supplied by the context of the query or, if the dimensions aren't referred to in the query either, by the 'default member' settings on the dimension.
Don't feel guilty if you've been struggling to follow all the jargon in the last paragraph though. Let's write a few queries to illustrate these ideas and once we've done this you'll get a better idea of what all these terms mean. Let's start with the following query:
It shows some Markets on columns, all the members on the Time dimension on rows and slices the data displayed by the Measure Sales. Here's what the results should look like:
We could get the value of just one of the cells in this query using a tuple, the one highlighted in red, and display it using a calculated measure:
What we're seeing here is a tuple which returns the value of Sales for the EMEA market in the year 2003, and it returns the same value for every cell in the query. If we remove the reference to the year 2003 from the tuple though, see what happens:
We're still seeing data for Sales and EMEA, but it's now varying by the Time dimension. By not referring to a member on the Time dimension in our tuple we're in effect asking for the 'current member' on Time, which will vary each time the tuple is evaluated within the query. Similarly, if we add the reference to 2003 again but remove the reference to the Market EMEA, we can see something similar happen:
The values are now varying by Market but not by Time. Finally, if we just refer to the Measures dimension in our tuple then we have a calculated member which returns exactly the same thing as the Measure Sales:
Calculating Previous Period Growth
Going back to our original problem of how to calculate a previous period growth it's clear we've just inadvertently come up with half the expression we're looking for: we can use the tuple ([Measures].[Sales]) to return the current value for Sales within our calculated member, but we still don't know how to get that relative reference to the previous member on the Time dimension in order to create a tuple containing the value we want to subtract. In order to do this we need to use the CurrentMember function which returns – as the name suggests – the current member on any given dimension. We can see exactly what this function does by using it in conjunction with the Name function, which returns a string containing the name of a member; take a look at the following query to see an example of these two functions working:
The calculated measure here returns the name of the current member on the Time dimension; naturally enough it returns the same value for each row. Changing the dimension to the Markets dimension so:
...shows the name of the current member on Market. The next step is to use the PrevMember function with CurrentMember and Name; PrevMember returns the previous member on the same level as a member. So, for example, the following query displays the name of the previous member to the current member on the Time dimension:
Rather than use the Name function to return a string, we can now instead construct a tuple using CurrentMember and PrevMember which returns a measure value for the previous time period:
We can now calculate our absolute previous period growth very easily, as the difference between the tuples we've found which return the current value for Sales and the previous period value for Sales:
If we wanted the percentage growth instead we could alter the formula used as follows:
Note the use of the Format_String property on the calculated measure here, which is used to format the results returned. If you didn't specify it you'd find that the default format applied doesn't show any useful data!
Thanks to Chris Webb for submitting this article!