Before going for a deep dive in the MOLAP vs HOLAP vs ROLAP discussion, you should be aware that not all BI platforms have been created equal. And, when talking about BI platform i mean both the Backend and the Frontend applications. There are very few BI platforms which allow you to choose the storage mode implemented, since most of them are designed around a single storage mode.
In fact, if we think of the major OLAP vendors, we have the following situation:
1.Microsoft Analysis Services - supports all major storage modes: MOLAP;HOLAP;ROLAP
2.SAP Business Objects - by default supports only ROLAP mode, implemented into the BO Universe designer. Using Voyager tool, you could connect, at least in theory, to OLAP cubes, both from Microsoft and from Oracle Hyperion. In practice, I’ve tried a few times and miserably failed to connect to a Microsoft Analysis Services 2005 64 bit version, from BO XI 3.1. And also in practice, even if you connect to an existing cube, you are able to access it into read-only mode, which means that no modifications performed in the universe can be reflected back on the cube.
3.IBM Cognos TM1 / Powerplay - MOLAP
4.Microstrategy - ROLAP
5.Oracle:
a.Oracle BI Enterprise Edition - ROLAP
b.Oracle Hyperion Essbase - MOLAP
The reality is that, when deciding to implement a new BI tool within the company, or to replace an existing one, the focus in rarely on the Backend Server, and much more often on the Frontend application, because, at the end, this is what users see. Recent studies in the BI domain showed consistently that users want tools which are intuitive and very easy to use. Also, consistently, the majority of the BI users are using parametrizable Standard Reports most of the time, and rarely use Ad-hoc reporting. This demonstrates that usability is often, together with the price, one of the strongest arguments to choose a specific BI solution.
In fact, if I were to make a short list of user request for a BI platform, it would look like this:
·Usability - the application must come with a simple, intuitive and powerful graphical interface that should encourage the user to discover the data from the BI system. In fact, the system should be really transparent to the user, implement a very rapid learning curve, and allow him to be focused on discovering new ways to solve everyday problems, rather than constantly fighting with the tool
·Rapidly consolidate data from different data sources - the reality of an information system within a company is that is very heterogeneous. Depending on your needs, you may have different BI platforms implemented within your company: Microsoft Analysis Services; Hyperion Essbase; Business Objects. The users want to have a single tool, which is able to connect to all these sources, plus some of their own (ex. Excel files), and allow them to build a report were they can put together data from all these systems, in order to get the best added value.
·Speed - we live in a dynamic world, were things are constantly on the move. Your BI system users have all sort of work to do, and your system must allow them to get the information they need, when they need it. Which means that no one wants to spend minutes or more in order to wait for a report to refresh. And if that report is really part of their daily work, then you'll risk to have some very upset users.
·Data Quality - users will always expect from the BI system the most correct data. I'll say it's a bit like buying a flight ticket, you're looking at the price and at what kind of services it provides. You don't even think at safety, because you assume that every company puts safety at the highest priority. Likewise, most of the time the users don't even say it, because it's so normal for them to have reliable data.
As a BI team, you must make sure that you implement the right methodology in order to make sure that the data from you Data Marts matches the one from the Frontend application. Because user trust is very easy to lose, and very hard to earn back.
The IT job will be to ensure that clear performance indicators have been defined, and test scenarios are implemented in order to evaluate that, beside usability, the candidates BI platforms can really deliver the necessary performance. Since, at the end, a BI platform is not just a work of art, to be admired in a museum, and not to be touched, but, more likely, a very important tool which will be stressed to its limits in order to improve the company performance.
Choosing the right OLAP storage model is a critical design choice for the Backend component of your Presentation layer. It will have a deep impact over:
1.Query Speed - how fast the system is able to respond to user requests
2.System scalability - how easy is the system to scale up in order to handle increased data volumes
3.Data Latency - how much time does it take for a modification which is performed into a source system to be reflected into the BI Presentation Layer. The data latency is influences mainly by the following factors:
a.Data Warehouse ETL processes:
i.How the data refresh is triggered at Data Warehouse level:
1.Is it a scheduled job which runs on a designated schedule
2.The refresh process is launched automatically as soon as new data reaches the source application CDC tables
ii.How much time does it takes to import and process the data until the Data Mart layer
In here, it doesn't really matter which OLAP Storage mode you choose for the Presentation layer, since it has no influence. What does matter is the fact that the users perceive only the overall latency between their modification and when it's accessible in the presentation layer.
b.The amount of processing necessary to load the Data Mart data into the Presentation Layer:
i.ROLAP (Relational OLAP)-both fact data and aggregations are hold into a typical relational database, which is in fact the Data Mart database. Minimal processing time is necessary to process the system aggregate tables (when available), but even during this processing step the users are able to use the Presentation Layer in order to access the latest fact table data.
ii.HOLAP (Hybrid OLAP)- the fact data is hold into a relational database, while aggregations are created into a multi-dimensional database. Some amount of processing is necessary to calculate the aggregations using a multi-dimensional storage, used to speed up the user query.
iii.MOLAP (Multi-dimensional OLAP) - both data and aggregations are stored into a multi-dimensional database. High amount of processing is necessary in order to transfer all the Data Mart data from the relational to the multi-dimensional database, including the aggregation calculation.
Before going deeper into the differences between the 3 main OLAP storage models, we should make a step back, and start with the end, respectively with your BI System users.
First things first, you should be able to determine which are your main category of users and how much do they count. The theory says that BI systems are build for decision support, so the main users of your systems are the people which take decisions, like managers, right? Well, not so right, because in your company it's likely that you'll have more diverse category of users:
1.Managers - they are the ones which requested the whole BI system, and provided the IT department with the money to build it. Sounds pretty important, isn't it?
a.Definitely the one category you'll have to keep satisfied.
b.They'll need highly aggregated information, typically on a short list of very important KPI (ex. Sales, Cost, Profitability / Country), and prefer Graphical Visualization. In here, Scorecards and Dashboards are the name of the game.
c.Sometimes they build their own reports but, most of the times, they are either usingStandard Reports or request a specific type of report and that report is typically delivered for them by the Analysts
2.Analysts - are the ones which are analyzing the existing data within the BI system in order to satisfy management information request and discover new trends within your company business. Think Data Mining here and you've really found your team. They are the people who really have a good overall image about your company business and each department activity. They are also the ones building Standard Reports, which contain the official vision of the company over a particular or several business domains.
They often have to perform complex analysis by combining data from different data sources:
a.Official BI system - The systems build and maintained by the BI department, which offers a single version of the truth over the company's business. Everything in here is the result of a BI project, has clear specifications defined and has been heavily tested by both the IT and the business users before reaching the final system.
b.The Shadow BI system:
i.All sorts of Excel files containing complex calculations, generated within the company by either Analysts or Power Users, but not officially controlled or validated by the IT department.
ii.External Data Sources - basically you'll have here all sorts of documents you'll need to do your work, from various external sources (ex. Search Engines)
There is a lot of power and a lot of potential danger coming from this Shadow BI system:
·As much as IT would like, it will never have everything that you need to do your job inside the Official BI system, because it involves the IT department and a budget to add anything into the system. And this means changes are implemented SLOW. While your business needs are evolving VERY FAST.
·People will always find ways to move around the Official BI system, in order to do their job. It doesn't matter how much you try to regulate things, you'll always have a Shadow system, and this is critical for your business to continue to grow. These un-official projects are the typical nursery for future official projects to come. So don't try to kill them.
·The dangers come from the fact that the data from these files is not validated by the IT system, and rarely validated by someone else other than their creator. This is ok, as long you don’t start to take any critical decissions based on them. As a BI department you should implement a methodology which allows detailed tracking how much of the data used to take decisions comes from the Official System and how much comes from the Shadow System. As soon as the information from the Shadow System starts to become business critical, should be moved to the Official System.
3.Power Users - they are the ones knowing best the business within a specific department, where they typically activate. They are building department specific reports which are shared with the End Users.
4.End Users - they are mostly using the Standard Reports, with some minimal filtering (ex. The Inventory for a specific day and Store). Either way you put it, in here you will have very detailed data, with minimal or no data aggregation performed. And yes, I do know that this doesn't sound much like Business Intelligence, and more like Operational Reporting, but a lot of times this ends up under the BI department umbrella, so you'll have to deal with it.
And here comes the BIG question. Who do you want to satisfy? The easy answer would be EVERYONE. Really? The only problem is that this is, in practice very, very difficult to do. Mainly because, as you've seen above, you have internal customers working at very difficult level of aggregation. The top managers want the see highly aggregated information’s like Profitability, while the end users want to see the information enriched with the Data Warehouse business rules, but detailed at the invoice line level. And either way you put it, they are very incompatible needs.
Well then, if it's so difficult to satisfy both extremes, than we should go first for satisfying the Top Managers. They are the one's paying our salaries at least, so that alone it's sure to put them high on the priority list. But then, what you'll do if the structure of your users looks like this:
·Management - 5%
·Analysts - 10%
·Power Users & End Users - 85%
If you design your system to provide very fast response, for highly aggregated queries, you will:
1.Satisfy the Management needs. They'll be very happy to have the data they want very fast, and displayed in a very pretty graphical format. As long as they are able to interact very intuitively with the data, by using something like just hand gestures captured with something like KINECT or by using IPAD.Thought control is next. And when they'll need some new reports, well, no problem, they'll send requests to the Analysts, and expect it back in no time.
2.You'll let the Analysts somehow in the gray area of satisfaction, since they need both highly aggregated and detailed data. They'll be delighted at first by the performance on the aggregated reports. But, since we are humans, they'll quickly forget the part which works very well and get really frustrated by the part that's very slow, which is the detailed data part.
3.You'll have the Power Users & End Users very frustrated by the performance of the BI Reporting System for their detailed reports, which contain a lot of detailed data. They are paid to do their job, and they cannot do their job when, each time they change a filter value, takes ages to refresh their reports. And you can really afford to have between 85 and 95% of your users dissatisfied by the BI Department work, without having any consequences? How much time you'll think it will take for those complaints to reach the management?
If you design your system to provide good response for detailed data, you will typically:
1.Have a big performance problem on the aggregated queries, since these aggregates will have to be calculated on the fly. Which, when processing hundreds of millions on rows and dozens of gigabytes of data, can take a VERY BIG amount of time
2.Upset the main recipients of highly aggregated reports, which is the Top Management. They are the ones paying your salaries, remember?
3.Find it's very difficult to optimize a very detailed query for performance since, either way you put it, involves massive disk I/O, RAM and big network bandwidth. Either way, the question which should be answered here is if the users really need reports with hundreds of thousands or millions of lines of data. Is it really possible to take decisions when using such large result sets?
And now you understand why the simplest questions are typically the hardest to answer.
In the second part of the article we will focus more on the advantages and disadvantages of the MOLAP, ROLAP and HOLAP storages.
Until then, back to you. Which are the BI Platforms used in your organization? Based on your experience, which storage mode would you chose for a BI project? Is there a storage mode which magically fixes all the problems or you prefer to choose the storage made based on the particularities of each project?