With Data Warehouses around the world growing rapidly every day, the ability
of a Data Warehousing solution to handle mass data within the ever-shrinking
time windows for data loads is fundamental to most systems.
BW 7.3 recognizes this “need of the hour” with several
performance-related features, and in this blog I will discuss the
performance features related to data loads in SAP BW 7.3, focusing
mainly on Master Data loads and DTP processing.
Here is the list of features addressed in this blog -
Master Data
- Mass Lookups during Master Data Loads
- The “Insert-Only” flag for Master Data Loads
- The new Master Data Deletion
- SID Handling
- Use of Navigational Attributes as source fields in Transformations.
DTP Processing
- Repackaging small packages into optimal sizes.
Master Data
1. Mass Lookups during Master Data Load
Data
loads into a Master Data bearing Characteristic require database
look-ups to determine whether records already exist on the database with
the same keys as the ones being loaded. In releases prior to SAP BW 7.3,
this operation was performed record-wise, i.e., for every record in the
data package, a SELECT was executed on the database table(s). This
resulted in a lot of communication overhead between the SAP
Application Server and the Database Server, slowing the Master
Data loads down. The effect is especially pronounced on data loads
involving large data volumes.
This overhead between the SAP Application
Server and the Database Server has now been addressed by performing a
mass lookup on the database, so that all records in the data package are
looked up in a single access. Depending on the DB platform, this can bring
up to a 50% gain in load runtimes.
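To picture the difference, here is a minimal Python sketch (illustrative only, not SAP code; the dictionary-based “database” and all names are hypothetical) contrasting record-wise lookups with a single mass lookup per data package:

```python
# Illustrative sketch: a dict stands in for the master data table.

def lookup_record_wise(db, package):
    """Pre-7.3 behavior: one SELECT (one round trip) per record."""
    return {rec["key"]: db.get(rec["key"]) for rec in package}

def lookup_mass(db, package):
    """BW 7.3 behavior: one SELECT for the whole data package."""
    keys = [rec["key"] for rec in package]
    # Roughly corresponds to a single SELECT ... FOR ALL ENTRIES on the keys.
    return {k: db.get(k) for k in keys}

db = {"MAT01": {"group": "G1"}, "MAT02": {"group": "G2"}}
package = [{"key": "MAT01"}, {"key": "MAT03"}]
# Both modes return the same answer; only the number of round trips differs.
result = lookup_mass(db, package)  # MAT01 is found, MAT03 comes back as None
```

Keys that come back empty are the ones the load will INSERT; the rest are candidates for UPDATEs.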
2. The ‘Insert-Only Flag’ for Master Data Loads
- Starting with NW 7.30 SP03, this flag is renamed to “New Records Only”. The renaming aligns it with a similar feature supported by the activation of DSO data (see the blog Performance Improvements for DataStore Objects).
As
mentioned above, the Master Data Load performs a look-up on the
database for every data-package to ascertain which key values
already exist on the database. Based on this information, the Master
Data load executes UPDATEs (for records with the same key already
existing in the table) or INSERTs (for records that don’t exist) on the
database.
With the ‘Insert-Only’ feature for Master Data loads
using DTPs, users have the opportunity to completely skip the look-up
step, if it is already known that the data is being loaded for the first
time. Obviously, this feature is most relevant when performing initial
Master Data loads. Nevertheless, this flag can also be useful for some
delta loads where it is known that the data being loaded is completely
new.
Lab tests for initial Master Data loads indicate around 20% reduction in runtime with this feature.
The
‘Insert-Only’ setting for DTPs loading Master Data can be found in the
DTP Maintenance screen under the ‘UPDATE’ tab as shown below.
Note:
- If the ‘Insert-Only’ flag is set and data is found to already exist on the database, the DTP request will abort. To recover from this error, the user simply needs to uncheck the flag and re-execute the DTP.
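The write logic can be sketched in Python as follows (illustrative only, not the actual ABAP implementation; all names are hypothetical). With the flag set, the lookup is skipped entirely and any existing key aborts the request:

```python
def load_package(db, package, insert_only=False):
    """Sketch of the master data write step; db is a dict standing in
    for the attribute table, keyed by the characteristic key."""
    if insert_only:
        # 'Insert-Only' / 'New Records Only': no lookup; INSERT everything.
        for rec in package:
            if rec["key"] in db:
                # Mirrors the DTP request abort on pre-existing data.
                raise RuntimeError("key already exists: request aborts")
            db[rec["key"]] = rec["attrs"]
    else:
        # Default path: one mass lookup, then UPDATE existing / INSERT new.
        existing = {r["key"] for r in package if r["key"] in db}
        for rec in package:
            db[rec["key"]] = rec["attrs"]  # UPDATE if in existing, else INSERT
```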
3. The New Master Data Deletion
Deleting
Master Data in BW has always been a performance-intensive operation,
because before any Master Data can be physically deleted, the
entire system (Transaction Data, Master Data, Hierarchies, etc.) is
scanned for usages. Therefore, if a lot of Master Data is to be deleted,
it takes some time to establish which data is deletable (i.e., has
no usages) and which is not. In addition, with the
classical Master Data Deletion involving large data volumes, users
sometimes ran into memory overflow dumps.
To address these issues,
the Master Data Deletion was completely re-engineered. The result is
the New Master Data Deletion. In addition to being much faster than the
classical version, it offers interesting new features such as search
modes for the usage check and a simulation mode. The screenshot below
shows the user interface for the new Master Data Deletion when accessed
via the context menu of InfoObjects in the Data Warehousing Workbench.
Although
the new Master Data Deletion has been available for some time now (since
BW 7.00 SP 23), it was never the default version in the system. This
meant that BW System Administrators needed to switch it ON
explicitly. With BW 7.30, however, the New Master Data Deletion is the
default version, and no further customizing is necessary to use it.
All further information about this functionality is documented in SAP Note 1370848 under https://websmp130.sap-ag.de/sap(bD1lbiZjPTAwMQ==)/bc/bsp/spn/sapnotes/index2.htm?numm=1370848
It can also be found in the standard SAP BW documentation under http://help.sap.com/saphelp_nw73/helpdata/en/4a/373cc45e291c67e10000000a42189c/frameset.htm
4. SID Handling
This
feature relates to the handling of SIDs in the SAP BW system and while
it is certainly relevant for Master Data loads, it is not restricted to
it. The performance improvements in SID handling are relevant for all
areas of SAP BW where SIDs are determined, for example – Activation of
DSO Requests, InfoCube Loads, Hierarchy Loads and in some cases, even
Query processing.
In BW 7.30, SIDs are determined en masse,
meaning that database SELECTs and INSERTs that were previously executed
record-wise have been changed to mass SELECTs (using the ABAP SELECT
FOR ALL ENTRIES construct) and mass INSERTs. The system switches to this
mass-data processing mode automatically when the number of SIDs to be
determined exceeds a threshold value. The default value of this
threshold is 500.
The threshold value can of course be customized
in the SAP IMG (transaction SPRO) by following the path: SAP NetWeaver -> Business
Warehouse -> Performance Settings -> Optimize SID-Determination
for MPP-Databases.
Note: As the threshold value corresponds to the minimum number of SIDs to be determined in one step, setting the threshold to a very high value (for example, 100000) causes the system to switch back to the classical behavior.
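The switch can be pictured with a small Python sketch (illustrative only; the real implementation is ABAP, and all names here are hypothetical). Both modes yield the same SIDs; only the number of database accesses differs:

```python
MASS_THRESHOLD = 500  # default threshold described above; adjustable in SPRO

def determine_sids(sid_table, values, threshold=MASS_THRESHOLD):
    """Assign SIDs for `values`; sid_table is a dict (value -> SID)
    standing in for the InfoObject's SID table."""
    # De-duplicated list of values that have no SID yet.
    missing = list(dict.fromkeys(v for v in values if v not in sid_table))
    if len(values) > threshold:
        # Mass mode: one SELECT FOR ALL ENTRIES plus one mass INSERT.
        next_sid = max(sid_table.values(), default=0) + 1
        sid_table.update({v: next_sid + i for i, v in enumerate(missing)})
    else:
        # Classical mode: record-wise SELECT and INSERT per value.
        for v in missing:
            sid_table[v] = max(sid_table.values(), default=0) + 1
    return [sid_table[v] for v in values]
```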
5. Use of Navigational Attributes as source fields in Transformations
Quite
often there are scenarios in SAP BW where data being loaded from a
source to a target needs to be augmented with information looked
up from the Master Data of InfoObjects. For instance, loading sales data
from a source that contains data on Material level to a DataTarget where
queries require the sales data to be aggregated by Material Group. In
such cases, the Master Data Lookup rule-type in Transformations is used
to determine the Material Group for any given Material (given that
Material Group is an attribute of Material).
Although the
performance of the Master Data Lookup rule-type has been optimized in
earlier versions of BW (starting with BW 7.0), BW 7.30 offers an
alternative to this rule-type: navigational attributes of InfoObjects
are now available as source fields in Transformations. The benefits of
this feature are twofold.
- First, the fact that the data from the navigational attributes is available as part of the source structure allows the data to be used in custom logic in Transformations (for example, in Start Routines).
- Second, the data from the navigational attributes is read by performing database joins with the corresponding Master Data tables during extraction. This improves the performance of scenarios where many look-ups are needed and/or a lot of data has to be looked up.
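Conceptually, the extraction-time join looks like this (Python sketch, not SAP code; the field name MATERIAL__MATL_GROUP imitates BW's naming convention for navigational attributes, and all data is made up):

```python
# Master data of Material, with Material Group as a navigational attribute.
material_md = {"M1": {"MATL_GROUP": "G1"}, "M2": {"MATL_GROUP": "G2"}}

# Source data on Material level.
sales = [{"MATERIAL": "M1", "AMOUNT": 10}, {"MATERIAL": "M2", "AMOUNT": 5}]

# The database performs this as a single join during extraction, so the
# transformation receives the attribute as an ordinary source field.
enriched = [
    {**row, "MATERIAL__MATL_GROUP": material_md[row["MATERIAL"]]["MATL_GROUP"]}
    for row in sales
]
```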
To use this
feature in Transformations, the navigational attributes need to be
switched ON in the source InfoProvider in the InfoProvider maintenance
screen as below -
Once
this is done, the selected navigational attributes are available
as part of the source structure of Transformations as shown below –
Data Transfer Process (DTP)
1. Repackaging small packages into optimal sizes
This
feature of the DTP is used to combine several data packages in a source
object into one data package for the DataTarget. This feature helps
speed up request processing when the source object contains a large
number of very small data packages.
This is usually the case when
memory limitations in the source system (for example, an SAP ERP
system) result in very small data packages in the PSA tables in BW.
This DTP setting can be used to propagate the data to subsequent layers
in BW in larger chunks.
Also, InfoProviders in BW used for
operational reporting using Real-time Data Acquisition contain very
small data packages. Typically, this data is propagated within the
DataWarehouse into other InfoProviders for strategic reporting. Such
scenarios are also a use-case for this feature where data can be
propagated in larger packets.
As a prerequisite, the processing
mode for the DTP needs to be set to ‘Parallel Extraction and Parallel
Processing’. Also note that only source packages belonging to the same
request are grouped into one target package.
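The grouping behavior can be sketched in Python (illustrative only; the actual DTP logic is internal to BW, and all names here are hypothetical). Small packages are merged up to a target size, but never across request boundaries:

```python
from itertools import groupby

def repackage(packages, target_size):
    """Merge small source packages into packages of about target_size rows.
    `packages` is a list of {"request": id, "rows": [...]} dicts, assumed
    to arrive grouped by request, as in a PSA."""
    result = []
    for _, group in groupby(packages, key=lambda p: p["request"]):
        buffer = []
        for pkg in group:
            buffer.extend(pkg["rows"])
            while len(buffer) >= target_size:
                result.append(buffer[:target_size])
                buffer = buffer[target_size:]
        if buffer:
            # Leftover rows of a request form their own (smaller) package.
            result.append(buffer)
    return result
```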
Below is a screenshot of the feature in the DTP Maintenance.
DTP - Optimal Package Size