June 13, 2011
Contributed by Will Gorman
This article describes how to prime specific Mondrian queries in the Pentaho BI Server, as well as demonstrating how to use Mondrian's Cache Control API within the Pentaho BI Server.
Priming Mondrian Cache
Let's say you update your data warehouse on a nightly basis, and know some of the MDX queries that folks use during the day for dashboards, reporting and analysis. You can pre-populate Mondrian's cache before standard usage of the system to get a leg up on performance. Here's how you do it.
Step 1: Create an Action Sequence
Start off by creating a basic action sequence. You could do this in Design Studio or via an XML editor. Here an empty xaction we can build on. Place this in the solution repository:
Step 2: Define a Mondrian Connection
This is a very important step. There are many ways to define a Mondrian Connection, and there are many properties you can set. It's very important that the Reporting, Analysis, and Dashboards you use along with this action sequence share the identical Mondrian Connection. In this example, we'll use the already defined connection information in pentaho-solutions/system/olap/datasources.xml for SteelWheelsSales.
First, you need to add a resource to reference the Mondrian catalog:
This defines the location in the solution repository where your Mondrian schema lives. Note that the location url "solution:..." must match exactly what is defined in the datasources.xml file.
Second, you need to define a connection MDXLookupRule in your <actions>:
Just like the definition in the datasources.xml file, this connection defines a JNDI reference to the star schema. This action defines an output, which is the connection.
Step 3: Define an MDX Query (or two)
Now that we have a connection defined, we can start specifying MDX Queries that will prime the cache. Where do you get these MDX statements? There are a few approaches you can take to this. I recommend enabling the MDX log in tomcat/webapps/pentaho/WEB-INF/classes/log4j.xml and monitoring the output for often used MDX statements. You can also see the MDX that Analyzer generates if you are logged in as an Admin.
Define the following action below the connection, which executes a basic MDX statement:
You can add as many of these query statements in the action sequence as necessary. If you'd like to see the output of the Mondrian result set, add <query_result type="result-set"/> to the outputs section of the xaction.
Step 4: Schedule or execute your cache priming as part of your daily process
Now that you've created your cache priming action sequence, you can either schedule that on a daily basis by using a private schedule in the user console, or you could make a web service call to the server from your ETL process. Here is an example webservices call:
You could also add this action sequence to the startup of the server or as part of your user's session startup by adding it to the pentaho-solutions/system/sessionStartupActions.xml file.
Step 5: Validating and Debugging
Of course, you'll want to make sure that your Cache Priming is effective. How do you check that everything is working? The best resource are the Mondrian MDX and SQL log files. Enable the SQL log file by editing the log4j.xml configuration. Execute your cache priming action sequence. You should notice SQL statements that Mondrian executes as part of an MDX statement execution. Now execute your report or dashboard. If the MDX was exact, you should see no additional SQL in the log. Depending on how similar the MDX statements are, and which member and cells they are querying will determine how useful your MDX Priming was.
So what if it didn't work? Here are a few items that might cause caching to not function as expected:
- Connection information is not identical - As mentioned earlier, you'll want to make sure Mondrian is sharing connections between the cache priming action sequence and your reports and dashboards. The two key properties defined in datasources.xml and in your action sequence are the path to the Mondrian schema and the JNDI connection. If these aren't identical, then you'll have a problem. There are also a number of Mondrian system and connection properties that can influence shared connections.
- Caching is not enabled - It's possible to disable Mondrian's caching, so double check your pentaho-solutions/system/mondrian/mondrian.properties file to make sure you or a system admin hasn't disabled caching. Sometimes folks will disable caching if many changes are happening to your data warehouse. We'll get to that topic next.
- MDX Statements don't share Member or Cell cache - You may think two MDX statements that are similar might share a lot in cache, but sometimes that isn't always true. Mondrian highly tunes the SQL it generates for the specific MDX being executed. Depending on the datasource, it may be faster to go to the database than doing a lot of calculations in memory. You can manage Native SQL Evaluation with various Mondrian properties as well.
- Mondrian Catalog Cache Lazy Initialization - The Mondrian Catalog Cache, which is part of the BI Server, is lazily initialized when Analyzer first starts up or when it is re-initialized by going to Tools -> Refresh -> Mondrian Schema Cache. This can cause a false alarm that caching isn't working, because in the process of initializing this cache, it also wipes out the mondrian schema cache. If you run into this problem, consider adding admin/clear_mondrian_schema_cache.xaction as a startup action when the server first boots.
Mondrian Cache Control
Let's say you update your star schema every night with the latest information coming in from the business. You'll want to make sure that Mondrian knows to go back to the database to get this new information. Otherwise, your business users might not get immediate access to the latest information.
Option 1: The Hammer
The easiest way to manage Mondrian's cache is to wipe it out completely. Pentaho ships with an admin action sequence that does just that. The action sequence pentaho-solutions/admin/clear_mondrian_schema_cache.xaction makes a single call into the BI Server, telling it to wipe out all the cache that exists in Mondrian.
Option 2: Fine grained cache control
With Mondrian's CacheControl API, you have much more control over what portions of the cache you decide to wipe out. Taking our cache priming action sequence example above, we'll make some calls into the cache control API to demonstrate how to clear the cache. The same rules apply for getting a connection, also the same reverse rules apply for determining if the cache control was successful.
This step allows you to access the mondrian connection already defined earlier in the action sequence, as well as being able to print any message you'd like when the rule is finished executing.
Step 3: Get access to the necessary CacheControl API objects
This is the wiring necessary to get to the Mondrian objects that we need. Put this at the beginning of the cacheControlImpl() method:
The full CacheControl API Javadoc and developer documentation is available at mondrian.pentaho.com.
Step 4: Get access to the Members in Cache that you'd like to remove
Using Mondrian's API, you can easily resolve Member objects:
Step 5: Deleting a MemberSet from the Member Cache
Now that we have access to everything we need, we can start clearing portions of the cache. If it is necessary to update Member Cache, at the moment you need to set mondrian.rolap.EnableRolapCubeMemberCache to false in your pentaho-solutions/system/mondrian/mondrian.properties file. At this time, this feature is considered beta (See MONDRIAN-428).
Here is the example code to delete a MemberSet:
Step 6: Flushing portions of the Cell cache
The cell cache contains the calculated sums or aggregations of a crossjoin of data. Here is an example of flushing a region of the cache:
The attached samples include both cache priming and cache control action sequences, along with sample inserts and deletes to the steelwheels data warehouse so you can try it yourself!