At the crossroads: MOLAP vs HOLAP vs ROLAP - Part 2

by Nicolae Guse 20. November 2011 23:01

I've been doing some research lately on MOLAP, ROLAP and HOLAP storage modes in Microsoft Analysis Services, and I've found two very interesting articles on the Microsoft site that I highly recommend.

Now, regarding the 3 major storage modes, let's go into detail about the scenarios for which each of them is best suited, based on their implementation in Microsoft Analysis Services. Most of the characteristics listed here are generic and match most vendor implementations, while others, like the management of ROLAP aggregations, are missing from other vendors' products:

MOLAP

Advantages:

  1. Very good query performance, by pre-aggregating data into the multi-dimensional structure during cube processing.
  2. Lower I/O, due to data compression applied when data is transferred from the RDBMS to MOLAP storage.
  3. The Analysis Services engine can use the Partition Slice to pre-determine, during the execution plan phase, which cube partitions need to be queried in order to satisfy the user query.
  4. High independence from the relational database which holds the fact and dimension tables. Since the relational database is only touched during cube processing, the entire fact table can be re-created using minimally logged operations like SELECT INTO, in order to speed up ETL processes, without any impact on user queries running against the cube.
  5. The cube partitioning feature works excellently in tandem with the SQL Server table partitioning feature, which allows the design of high-performance and elegant ETL solutions. Gone are the days when you had to create a FACT_SALES table for each day; now you can have a single FACT_SALES table, with a partition for each day.
  6. Incremental processing can limit the performance problems that occur during a full process of a large cube, by processing only the data which has changed.
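As a rough illustration of point 4, here is a minimal sketch of a fact table rebuild using a minimally logged SELECT INTO followed by a table swap. All object names (staging database, FACT_SALES columns) are assumptions for illustration, not taken from any specific system:

```sql
-- Hypothetical example: rebuild the fact table via minimally logged
-- SELECT INTO (under SIMPLE or BULK_LOGGED recovery), while cube users
-- keep querying the MOLAP storage, untouched until the next process
IF OBJECT_ID('dbo.FACT_SALES_NEW') IS NOT NULL
    DROP TABLE dbo.FACT_SALES_NEW;

SELECT s.StoreId, s.MonthId, s.Amount
INTO dbo.FACT_SALES_NEW
FROM Staging.dbo.SALES AS s;

-- Swap the rebuilt table in; the cube only reads it at the next process
EXEC sp_rename 'dbo.FACT_SALES', 'FACT_SALES_OLD';
EXEC sp_rename 'dbo.FACT_SALES_NEW', 'FACT_SALES';
DROP TABLE dbo.FACT_SALES_OLD;
```

The point of the sketch is only that, with MOLAP, this entire relational-side rebuild is invisible to users querying the cube.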

More...

Tags:

Data Warehouse Best Practices

At the crossroads: MOLAP vs HOLAP vs ROLAP - Part 1

by Nicolae Guse 13. November 2011 22:55

Before going for a deep dive into the MOLAP vs HOLAP vs ROLAP discussion, you should be aware that not all BI platforms are created equal. And, when talking about a BI platform, I mean both the Backend and the Frontend applications. There are very few BI platforms which allow you to choose the storage mode, since most of them are designed around a single one.

In fact, if we think of the major OLAP vendors, we have the following situation:

1. Microsoft Analysis Services - supports all major storage modes: MOLAP, HOLAP and ROLAP

2. SAP Business Objects - by default supports only ROLAP mode, implemented in the BO Universe Designer. Using the Voyager tool, you could connect, at least in theory, to OLAP cubes from both Microsoft and Oracle Hyperion. In practice, I've tried a few times and miserably failed to connect to a Microsoft Analysis Services 2005 64-bit version from BO XI 3.1. And, also in practice, even if you do connect to an existing cube, you can only access it in read-only mode, which means that no modifications performed in the universe can be reflected back on the cube.

3. IBM Cognos TM1 / PowerPlay - MOLAP

4. MicroStrategy - ROLAP

5. Oracle:

a. Oracle BI Enterprise Edition - ROLAP

b. Oracle Hyperion Essbase - MOLAP

The reality is that, when deciding to implement a new BI tool within the company, or to replace an existing one, the focus is rarely on the Backend Server and much more often on the Frontend application, because, in the end, this is what users see. Recent studies in the BI domain have consistently shown that users want tools which are intuitive and very easy to use. Also, consistently, the majority of BI users use parameterizable Standard Reports most of the time and rarely use Ad-hoc reporting. This shows that usability, together with price, is often one of the strongest arguments for choosing a specific BI solution.

More...

Tags:

Data Warehouse Best Practices

Choosing the right ETL

by Nicolae Guse 6. November 2011 19:51

Choosing your ETL solution is one of the most important decisions you'll make when starting a new Data Warehouse. Why is it so important? Because it's like the horse-and-carriage example I talked about in the Data Warehouse Dictionary article, where the ETL is the horse and the Data Warehouse is the carriage.

Following this logic, you'll have to ask yourself: what kind of horse do I want? Well, that's an easy answer: the best one... But what does that actually mean?

1. The Fairytale Kind: the fastest horse, who can run like the wind, no matter the distance and no matter how heavy the carriage is. And yes, of course, it's as cheap as you can afford, typically a gift from the Fairy. And now you know why it's called the Fairytale Kind.

2. The Real Life Kind: here you'll see that you have many more options. But wait, isn't having more options a good thing? I have a feeling that we are going to find out:

· The Pure Racer Kind: the fastest runner you're going to find. It will integrate your data in the blink of an eye... Well, a bit more, but you get the picture. The problem with this type of horse is that it's a specialized one. It will get you there very fast, but it will work only in a handful of scenarios. Which typically means it's more likely to carry a light chariot than a heavy carriage. And, let's not forget, racing always sounds expensive...

· The Mule Kind: but wait, this isn't quite a horse, right? You've guessed it, it's halfway between a donkey and a horse. And why would you want to go there? Well, first of all, because mules are very good at carrying heavy loads, over long distances and up steep slopes. It's definitely the slowest of all, and also the most stubborn one when you need to add something new to the carriage, but, at the end of the day, it will get you there. It's also typically the cheapest of all.

· The Workhorse Kind: typically the one you'll want pulling your carriage. I'd say it's the best compromise between the Pure Racer and the Mule. It's neither the fastest nor the one that can carry the heaviest load, but it will get you there in a fair amount of time, with a pretty heavy load. It's also the most versatile of all, and it can be adapted fairly quickly to tow very different loads. And versatility is definitely what you'll want in a Data Warehouse. If you can afford the price, of course.

More...

Tags:

Data Warehouse Best Practices

Implementing a Business Rule Engine

by Nicolae Guse 30. October 2011 23:26


While working on any Data Warehouse related project, one of the things you're sure to bump into is Business Rules. Data Warehouses are built for Decision Support, and those decisions are based on indicators calculated from a multitude of Business Rules, provided, surprise surprise, by the business people from within your company.

Following this logic, Business Rules are very important, and therefore it is also very important how they are actually implemented at Data Warehouse level. Here is a likely scenario:

1. Your company's business evolves and the business people realize that they need to define a new set of business rules, which will give them a better level of granularity and help them make better decisions. There follows a number of never-ending meetings, but, in the end, everyone agrees on them, and now it's time to apply them at Data Warehouse level. And yes, they want these rules applied yesterday, because, you know the drill, it's business critical.

2. But wait, they cannot apply them by themselves; they need the BI Department to do it, since they are the ones who have created and are maintaining the Company's Data Warehouse. So the business people contact the BI Department, which dispatches a number of BI functional consultants who will analyze the business requirements and business rules and come up with a Functional BI specification. Which does take some time.

More...

Tags:

Data Warehouse Best Practices

The Name of the Game - Table Partitioning - Part 2

by Nicolae Guse 23. October 2011 19:17

Now that we've cleared up the main concepts associated with Table Partitioning in Part 1 of the article, let's make a list of the prerequisites necessary to apply the Partitioning Mechanism:

· Partitioning Function - contains the range of values which will be used for partitioning

· Partitioning Scheme - created based on the Partitioning Function, and used for creating partitioned tables. And obviously you can create multiple tables using the same partitioning scheme

· The table is created based on the Partitioning Scheme:

CREATE TABLE SALES
(
    PartitionId int NOT NULL,
    Store varchar(50),
    MonthId datetime,
    Amount money
) ON PSCHEME_SALES(PartitionId)
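For completeness, here is a minimal sketch of the Partitioning Function and Scheme that such a table would depend on. The scheme name PSCHEME_SALES comes from the example above; the function name PFUNC_SALES and the boundary values are assumptions for illustration only:

```sql
-- Assumed for illustration: one partition per month, keyed on PartitionId
CREATE PARTITION FUNCTION PFUNC_SALES (int)
AS RANGE LEFT FOR VALUES (201101, 201102, 201103);

-- All partitions mapped to PRIMARY for simplicity; in production each
-- partition would typically be mapped to its own filegroup
CREATE PARTITION SCHEME PSCHEME_SALES
AS PARTITION PFUNC_SALES ALL TO ([PRIMARY]);
```

With RANGE LEFT, each boundary value belongs to the partition on its left, so the four values above actually define five partitions.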

 

And now, let’s talk a bit about the logic implemented by the Partitioning Mechanism:

More...

Tags:

Data Warehouse Best Practices

The Name of the Game - Table Partitioning - Part 1

by Nicolae Guse 16. October 2011 22:12

 

First things first: a pretty good presentation of the problems to be addressed by Partitioning can be found in the article To Have or Not to Have a Physical data mart. You should read it, since it will come in quite handy during this article.

As I’ve said in the previous article, one of the main challenges for a Data Warehouse which has a ROLAP Presentation Layer on top, is to be able to refresh the data while users are running their reports, and to also maintaining the data consistency.

Starting with SQL Server 2005, Microsoft introduced the Table Partitioning feature, on which we will be focusing during this article. Other database engines have implemented similar functionality, so I'm pretty sure that the concepts laid out here are fairly generic.

So what's all the fuss about table partitioning? Is it a magical cure or a royal pain in the ...? Well, as always, I'll say it's a little bit of both.

More...

Tags:

Data Warehouse Best Practices

To Have or Not to Have a Physical data mart

by Nicolae Guse 16. October 2011 21:51

Some time ago I was involved in a quite heated debate with a BO Universe Designer regarding the presence of a Physical data mart. It really gave me the idea for this article.

First of all, let's compile a list of arguments against having a Physical data mart, when the Presentation Layer is using ROLAP technology:

  1. It adds an additional layer on top of the Data Warehouse, which slows down the development process when you need to add a few new details at Presentation Layer level.
    1. Normally, as long as you don't have a Physical data mart, you end up with a growing snowflake design, where you can add all the tables you need, from all Data Warehouse layers, and, after performing the necessary joins, get the details you want very fast. This approach removes the dependency between the Data Warehouse development team and the Reporting team for every small development, and increases the speed at which user requests are solved.
  2. It creates unnecessary data duplication by de-normalizing the data from the Actual Data Warehouse, mostly in order to generate the flat dimension tables required for development.
  3. It moves the design of the Presentation Layer away from the Conformed Dimension principle, in which dimensions are shared between Data Marts, by focusing instead on de-normalized dimension tables designed around each Data Mart's specific needs. By contrast, the same REF table can be used within different Data Marts, since each one encapsulates an atomic business concept, as a result of the Normalization process.

Well, then we're in a pickle, since these are pretty strong arguments. Except that they are not the full picture of the problem. Which really made me think: what are the arguments in favour of a Physical data mart?

More...

Tags:

Data Warehouse Best Practices

Back to basics - A Data Warehouse Dictionary

by Nicolae Guse 16. October 2011 21:19

One of the things that I've learned so far is that we all speak different dialects, even when we share the same language. One of the strongest challenges that I've met so far was to make sure that, when we talk about a concept, we are all talking about the same thing. And this happens because we are different people, we learn things in different ways, and we have different experiences.

Try to launch a discussion between a Data Warehouse developer, a BI Business Analyst, a Project Manager and a BI Director and you will quickly see that, while talking about the same thing, such as the Data Warehouse architecture, each of the ones above will understand something different. Therefore, in order to make any kind of progress, you'll have to define a kind of dictionary that (very unlikely) everyone will agree on.

I have no intention of giving by-the-book definitions as accurate as those of Kimball or Inmon, who are the parents of the Data Warehouse revolution. Their work should be read and appreciated, since without it we would have nothing to discuss now. They had a vision and they fought for it, when everyone else was in the dark.

What I intend to do is to give definitions more related to the business processes that generate them, more centered on the real-life implementation of these concepts that we see all around us.

More...

Tags:

Data Warehouse Dictionary

Starting with Classics - Why DWHBP.com

by Nicolae Guse 16. October 2011 21:06

 First of all, let's start with the classic questions:

 

What does this DWHBP.com stand for?

First answer would be "Delusional Writers Hope For Better Pay". Is this it? Well, tempting, but no. It stands for "Data WareHouse Best Practices". It's a place for gathering and sharing ideas, for people passionate about Data Warehousing and Business Intelligence in general. It's also about a personal contribution to making the world around us a better place.

Who am I and what am I doing here?

The more generic answer would be that I'm, like you, one of the almost 7 billion people inhabiting this planet :) The more down-to-earth answer is that my name is Nicolae Guse, 32 years old and counting, and I'm a programmer. And yes, it does sound like an addiction. I've been a programmer for the last 10 years, and I've spent more than 7 of them working in Business Intelligence, developing Data Warehouse related projects for one of the largest computer game manufacturers in the world.

During this period, I've gathered quite some information about how Data Warehouses should and should not be developed. I've also noticed that, as in other programming areas, having a good grasp of the concepts counts much more than actually writing code. Not that I don't like writing code, I actually love it, but I've noticed that writing a brilliant piece of code doesn't help you a lot if you don't get the bigger picture and understand where the system is evolving. Otherwise it's just a piece of code.

I’ve taken the initiative of creating this site in order to share the things I’ve learned so far. The site focus will be more on BI and DWH concepts, and less on the actual technical solutions implemented. And this is because many similar concepts, can have very different technical solutions implemented, according to each company needs, and this is the natural way to go. And here comes the FIRST DISCLAMER:

This site is NOT a Bible for DWH Best Practices. It's a place of sharing DWH experiences and trying to find TOGETHER the best way of doing DWH projects. If you're searching for a Bible, and for eternal truths, then go to church.

What you will find on this site is a series of articles on DWH topics, based on my experience so far, which is mostly, but not exclusively, related to the Microsoft ecosystem. Which doesn't mean that your experience on other systems is not welcome, quite the opposite. As I've said, it's a place for sharing concepts, and I think that the best solutions are found when people confront different experiences in order to solve the same problems. And here comes the SECOND DISCLAIMER:

I don't believe in the "One ring to rule them all" approach. I think that different problems have different solutions, and that not all software solutions must come from a single vendor, since I don't know a single vendor who's best at everything. I consider that in the real world, different BI solutions like Microsoft Analysis Services, Oracle Essbase and Business Objects can happily coexist within the same company. If you're such a big fan of the "One ring to rule them all" approach, reading J.R.R. Tolkien's "Lord of the Rings" might help you reconsider.

And, for the moment, I'll say that's just about enough introduction. I hope you'll enjoy this article series as much as I did when I wrote it.

I hope you will debate them with the same passion as I did when debating them with my colleagues and friends during the projects I've been involved in.

As a last word before closing, I owe a big debt of gratitude to the authors of the Rework book, Jason Fried and David Heinemeier Hansson, for providing me the inspiration and energy to go ahead with this adventure.

Cheers,

Nicolae

 

Tags:

General

Disclaimer

The ideas you'll find on this site represent the opinions of their owner. They are neither about, nor endorsed by, any specific company. This site is a place for sharing ideas and trying to find, together, best practices for Data Warehouse and Business Intelligence projects.

Some of the technical approaches will be praised, some will be criticized, based on the concepts presented by their owners. You shouldn't feel offended if some of the ideas which either you or your company consider best practices are criticized here. You should also realize that what some people consider best practices on a particular system and business scenario can be worst practices on another. If you don't agree with a particular idea, feel free to comment on it in a civilized manner.

Only by clashing different ideas and experiences will we find a better way of doing Data Warehouse and Business Intelligence projects.