Thursday, 27 September 2012

OLAP Cache Optimisation

I would like to expand on one aspect of improving reporting performance in SAP BW – exploiting the OLAP cache. The OLAP cache is standard functionality that is setup during a technical install of SAP BW. However, to realise the full benefits of speedy queries based on cached data, some additional steps need to be taken post-install. I’ve noticed these steps, and in particular the scheduled population of the OLAP cache, has been overlooked on quite a few implementations. 

In this post i’ll focus on the concept of ‘pre-filling’ the cache on a regular basis through the Business Explorer (BEx) broadcaster. I’ll also cover some points to consider in maintaining the OLAP cache and performance tuning. Let’s use the functionality available in a standard install to give your users the benefits of cached data!

Cache: A safe place for hiding or storing things.

A cache can be described as a buffer that’s employed to re-use commonly occurring items. A cache has many applications in a computing context: the memory cache, internet browsing history cache, disk caching, etc. In all of these cases, caching provides an answer or data retrieval, faster than another alternative. Faster data access can be achieved through caching data in registers (accessed in 1-5 clock cycles) vs. storing it in slower RAM (10-100 cycles).
SAP BW OLAP cache
In a SAP BW context, the OLAP cache buffers query results sets retrieved from the database (disk) by storing them in resident memory as highly compressed cluster data. Why bother storing query results in memory? The answer is simple – this is a speed contest between disk access vs. memory access, with memory being the much faster winner. On both sides of the equation, physical and electronic, there’s compelling logic that application performance will be improved through maintaining data in memory, rather than retrieving it from disk. Disk access is one of the few mechanical (as opposed to electronic) functions integral to processing and suffers from the slowness of moving parts. On the software side, disk access also involves a ‘system call’ that is relatively expensive in terms of performance. The desire to improve performance by avoiding disk access is the fundamental rationale for database management system (DBMS) caching and other file system caching methods. SAP BW is no different – and contains caching at the application layer to improve OLAP reporting performance.

Users progressively fill the OLAP cache as they access queries. When a user accesses a query result set for the first time, a database read from disk takes place and the result set is then written to cache memory. A cache read action takes place when the same result set (or subset) is subsequently accessed. Hence, the result set is retrieved from memory, instead of being retrieved from disk. An important feature of the OLAP cache is that read/write access to the cache is available to all users on the application server. All users get the benefits of faster data retrieval from the cache if another process or user has previously written the data to it.

The OLAP cache has a physical memory size and hence a storage limit. The cache persistence mode can be setup to either swap to disk or be overwritten when it’s full. OLAP cache data is invalidated each time a request is loaded into a Basic InfoCube; meaning that any Basic InfoCube that is updated daily, will have its underlying cache data wiped. This makes sense as the cached query result set data for that InfoCube, whilst accurate as a historical snapshot is now no longer representative of the current data in the InfoCube. Hence the old query result set needs to be erased. This begs the question, when should the cache be filled with the required query result sets?
Default OLAP Cache

In the default approach post BW install, only users progressively fill the OLAP cache as they access queries. Users that access query data for the first time will retrieve data from disk and not the OLAP cache – meaning they will experience a longer delay than users subsequently accessing the same data set.

The delay in retrieving query data from the database for the first user access can be significant for certain queries – a key problem that needs to be solved. This first access delay is compounded for web templates or dashboards that execute several queries for the first time in parallel. The delay is also influenced by the complexity of the query and your success with other performance optimisation techniques. Furthermore the delay will occur during user runtime – a particularly inconvenient time – as we want to shift any delays in accessing data away from the user’s waking hours. In any case, if this query result delay is significant – then you’ve lost the user already.
Accessing the OLAP Processor
OLAP Processor - Access hierarchy
When searching for requested data, a BI query accesses performance optimised sources in the order shown to the left. The important take-away is that the OLAP cache is read first, in preference to all other performance optimisation techniques, even the BI Accelerator. However, this isn’t to say that these techniques don’t have their place.

Performance optimisation techniques, such as aggregating data, compressing data, limiting table size, etc. are still vital to any successful implementation. They are still valid and necessary even with an OLAP cache. Not only will they help to soften the blow of the first disk read as previously discussed, but it is impossible and not efficient to cache all the permutations of query result sets. OLAP analysis by its very nature is predicated on not being able to predict every possible query request.

Even if you did try to make such a prediction, you’d most likely end up caching a whole bunch of result sets that:
  • User’s will never read; and
  • Would be invalidated on a regular basis as the Cubes are updated.
Pre-filling the cache
The idea of pre-filling or ‘warming up’ the OLAP cache is to defer the database operations away from user run-time to another more convenient point in time, system time. Such a time could be after the daily load in the early hours of the morning when few (if any) query requests are being processed.

The database is going to have to do the work at some point in time to fill the cache – this much is clear. It could also be argued that this work, the ‘pre-filling’ the OLAP cache, represents an additional operation and load on the system in addition to the regular data flow processes. In my opinion this is probably the wrong way of looking at the problem. If we can accurately predict what the most common query requests are going to be, then there will be no additional load on the system. These query requests would happen in the normal course of business. Additional load would only be generated if we go overboard caching every possible query request permutation possible and then that cached data is never accessed. 

To prevent this we should endeavor to restrict data being loaded to the cache to only likely requests and this will vary from Data Mart to Data Mart. Tools are available in SAP BW through the Transaction RSRCACHE to determine which query entries have been read from the cache. This should be monitored on an ongoing basis to optimise your caching strategy and remove any poorly performing cache entries from being continually generated. See Monitoring the OLAP Cache in SAP Documentation for further details. 

The most common and convenient way of pre-filling the OLAP cache is through the BEx broadcaster, although there are several other approaches, such as:
  • Using the Reporting Agent (3.x release) to fill the cache (Transaction Code REPORTING_AGENT).  Note that in 7.x a warning message appears stating that this can only be run for 3.x queries and templates, but it works fine with 7.x queries.  However, it doesn’t work with 7.x web templates and will not be developed further by SAP; or
  • Create a program/function module/web service that executes the relevant queries (See FM RRW3_GET_QUERY_VIEW_DATA );
  • Create an Analysis Process to run the relevant queries and output the data to a dummy target such as a DSO.
Each of these methods are able to be inserted into a process chain and hence, can be regularly scheduled.
BEx Broadcasting Approaches:
Some recommended broadcasting approaches that i’ve seen work well on implementations are listed below. Let me know what approaches have worked well for you – I’d be interested to hear your feedback.

1. Schedule broadcast on data change

Firstly in the BEx Broadcaster schedule the broadcaster to run on a data change event.



Add a data change event to the end of each datamart load process chain.

Process Chain - Event data change

This event triggers the broadcasts which pre-fill the OLAP cache. In the data change event you specify which InfoProvider has had it’s data changed (and hence the OLAP cache has been erased for it) and broadcasts scheduled on the event for the InfoProvider are triggered.

2. Web Templates

Web templates can contain multiple data providers and hence multiple queries.  Instead of creating individual broadcast settings for each query in the template, a single global setting can be created for the template.  When scheduled and broadcasted, this web template setting will run, and hence cache, all of the underlying queries in the template as they would appear in the dashboard.

Unfortunately for web templates, there isn’t a broadcast distribution type to fill the OLAP cache.  Instead if you’re objective is to only fill the OLAP cache, another setting such as Broadcast to the Portal will need to be used.  I’d recommend the following settings:

Distribution Type:   Broadcast to the Portal
Output Format:   Online Link to Current Data
Authorization User:   ADMIN
Export Document to My Portfolio:   Check
Export User:   ADMIN

Using these settings will result in generated online links to the web template (and not larger files) being posted for the Admin User in the Business Intelligence Role in My Portfolio/BEx Portfolio/Personal BEx Documents.

Broadcast - KM Admin Folder

The online links posted in this folder will overwrite each other and hence preventing a large amount of documents being stored in this directory over time.

3. Queries

Queries can be individually broadcasted.  Unlike with broadcasting a web template, more settings are available, such as broadcasting by multiple selection screen variants and by filter navigation on the query result set, i.e. by a characteristic.
Cache maintenance
Now that you’ve broadcast queries/web templates to the OLAP cache, you’re going to need to maintain it. To do so, a little more information is needed about the technical architecture of the SAP BW OLAP cache. Technically, the OLAP cache consists of two caches, the local cache and the global cache. These two caches can be setup with different parameters, such as size. The local cache is accessed by a single user within a session on an application server. Local cache data is retained in the roll area as long as it is required by the OLAP processor for that user. Global cache data is shared by all users across all application servers. Global cache data is retained for as long as it is required and will either be deleted when it is no longer needed, e.g. the underlying data has changed and the cache is invalidated, or depending on the persistence mode will be swapped to disk when the cache size is exceeded.

The cache size parameters indicate the maximum size that the local and global caches are permitted to grow to.  The global cache size should be larger than the local cache size, as the global cache is accessed across multiple users.  The local and global cache size values should be generally extended from their default settings when you install SAP BW. This will take advantage of memory available and ensure that the stored cache entries do not exceed the cache capacity. The size parameters should be reviewed periodically depending on cache usage, hit ratio and overflow. The cache size must be appropriate to manage the frequency of query calls and the number of users.  Some indications that your cache size should be extended are:
  • The number of cache entries has filled the capacity of the cache at least once.
  • Average number of cache entries corresponds to at least 90% of the capacity or has reached this capacity around 30% of the time.
  • Ratio of hits to gets, is lower than 90%.
You can configure the cache parameters using Transaction RSCUSTV14.  Note that the size of the global cache is determined by the minimum value of the Global Size MB parameter and the actual memory available in the shared memory buffer (profile parameter rdsb/esm/buffersize_kb). You should therefore use Transaction ST02 to check whether the size of the export/import buffer is appropriate – as the default setting of 4,096 KB is often too small.  SAP recommends the following settings:
  • rsdb/esm/buffersize_kb=200000
  • rsdb/esm/max_objects=10000
The permitted cache size needs to be realistic though. If you’re talking in the multiples of gigabytes, you may wish to review why you need so much data in the cache in the first place. A significant OLAP processing load would need to take place to generate that much cache data.
Wrap up
We’ve covered a fair bit of ground on the SAP BW OLAP cache in this post. Significant performance benefits can be achieved through the system (and not the user) pre-filling the cache on a regular basis as the underlying data changes. The benefits are available to all users as the OLAP analysis results are stored in a central repository, the OLAP Cache. Furthermore the initial hit on the database (and subsequent delay) when the query/web template is run the first time after the data has changed, is taken away from the user and performed by the system at a more convenient time. A benefit that they’ll sure be appreciative of first thing in the morning when they arrive at their desk. 

Hope you like this post and useful.

3 comments:

  1. This is really interesting, You are a very skilled blogger. I have joined your feed and look forward to seeking more of your wonderful post. Also, I have shared your site in my social networks!

    SCCM 2012 Online Training By Realtime Trainer In India

    ReplyDelete
  2. My olap cache is not updated some times when the underlying data is updated which results in incorrect data to users occasionally. This gets corrected overnight but till then for a certain query the data is incorrect. what can i do to fix it

    ReplyDelete
  3. My olap cache is not updated some times when the underlying data is updated which results in incorrect data to users occasionally. This gets corrected overnight but till then for a certain query the data is incorrect. what can i do to fix it

    ReplyDelete