Custom Row Versioning - Part 2

by Nicolae Guse 15. August 2014 18:16

In Custom Row Versioning - Part 1 we discovered together the main challenges we were facing in addressing the very hot topic of concurrent data access. Now we will focus on the technical solution we've implemented for the SQL Server 2008 R2 Database Engine. However, as you will see, the solution is pretty straightforward, and a similar approach could be applied under different database engines.

What I've realized while working on this data concurrency problem is that the final solution would involve a Row Versioning mechanism that works with the Partition Switch approach. This meant we should be able to handle any INSERT/UPDATE/DELETE operations reflected in the final table, as a result of the table Partition Switch, without:

- being blocked by an ongoing user query on that FACT / DIMENSION table

- blocking any new user report launched while the refresh operation was in progress or completing
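The Partition Switch approach described above can be sketched roughly as follows in T-SQL. All table and partition names (FACT_SALES, FACT_SALES_STAGING, FACT_SALES_OLD, partition 3) are hypothetical, and the sketch deliberately shows the locking problem the Row Versioning solution has to solve:

```sql
-- Hypothetical names; assumes FACT_SALES is partitioned and that
-- FACT_SALES_STAGING / FACT_SALES_OLD live on the same filegroup as
-- partition 3, with identical structure, indexes and constraints.

-- 1. Prepare the refreshed data in the staging table.
INSERT INTO dbo.FACT_SALES_STAGING (DATE_ID, PRODUCT_ID, AMOUNT)
SELECT DATE_ID, PRODUCT_ID, AMOUNT
FROM dbo.STG_SALES_REFRESH;

-- 2. Switch the old data out and the refreshed data in.
--    Both operations are metadata-only, so the INSERT/UPDATE/DELETE
--    results become visible in the final table almost instantly...
ALTER TABLE dbo.FACT_SALES SWITCH PARTITION 3 TO dbo.FACT_SALES_OLD;
ALTER TABLE dbo.FACT_SALES_STAGING SWITCH TO dbo.FACT_SALES PARTITION 3;

-- ...but each SWITCH needs a schema modification (Sch-M) lock, which is
-- blocked by any ongoing user query on FACT_SALES, and which in turn
-- blocks any new query - exactly the two situations listed above.
```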



Data Warehouse Best Practices

Custom Row Versioning - Part 1

by Nicolae Guse 13. August 2014 14:30


I'll never stop being amazed by the importance of stepping outside of your comfort zone. Even if most of us don't like doing this, often without realizing it, once you start having that warm feeling of comfort, technical progress starts slowing down and, sooner or later, you're bound to be surprised by what's just around the corner. But let's dig into this and you'll better understand what I mean.


A few years back we were using Microsoft Analysis Services as a core backbone component of our Business Intelligence system, on top of a Data Warehouse built on the Microsoft SQL Server Database Engine. We were using MOLAP cubes containing aggregated data, built especially for decision making. And for those users needing to perform operational reporting, we were building a SQL Server database containing all the details they needed. On top of both worlds we were using an in-house built reporting solution, based on Office Web Components. Perhaps not the most versatile reporting solution, but it was getting the job done.


And I would have said that everyone lived happily ever after, but it wasn't quite so: some of the users were finding the custom reporting solution subpar, compared to other reporting solutions on the market. As time went by, and while using the tool in more and more complex ways, some of the customers found the lack of a more advanced user interface a very serious limitation. In addition to this, Microsoft stopped their support for Office Web Components, which was mainly designed for Analysis Services 2000. Since by then we had migrated to Analysis Services 2005, some of the more advanced features were already not supported.





Microsoft Analysis Services Tips and Tricks

by Nicolae Guse 10. February 2013 15:02

Recently I've encountered some pretty nasty errors in Microsoft Analysis Services 2008, which gave me the inspiration for this article. It's about "Operation has been cancelled due to memory pressure" and sometimes its close companion "The lock operation ended unsuccessfully because of deadlock". And they are exactly the kind of errors which really make you think "how deep the rabbit hole goes".

Well, you'd say, what's more to dig into? Isn't it obvious that the server doesn't have enough memory? Indeed, but before rushing to buy some more memory, and hoping that everyone lives happily ever after, I'll say that it's pretty useful to understand the various scenarios which may lead there, since buying more RAM won't always solve your problems.

First things first. These errors are usually the last symptoms of more complex problems occurring in the background, so you should really spend some time identifying the actual scenarios which lead to these problems. For example, it's one thing to start getting them during User Querying and a totally different thing when you are performing a dimension processing using the ProcessUpdate option. And here is why:




Data Warehouse Atomic Level

by Nicolae Guse 1. September 2012 11:43

There is an old saying which states "You can't unscramble scrambled eggs". Surprisingly, the same thing also applies to the Data Warehouse Atomic level.

Before taking a deep dive into the subject, we should clarify what this Atomic Level stands for: it means adding to the fact table the finest level of granularity available in the Source application, in order to allow full tracing from the Data Warehouse back to the originating data source.

I think this is the moment where we should take a step back, and start thinking about how the Data Warehouse appeared in the first place.

Usually, it all starts with a business request coming from one department, which asks for a report containing certain information. Slowly these report requests start to pile up, and they form the foundation for what will become the company Data Warehouse. From a technical point of view, very different paths are available:

1.      Design the fact tables to contain information just for the dimensions that the business user requested. Everything related to having the atomic level in the fact table is additional work, which in the end translates into additional costs that the users won't necessarily agree to pay, especially if the disadvantages aren't clearly stated or if there is huge business pressure to deliver the end result very fast. You typically end up in this scenario when the IT department starting the Data Warehouse doesn't have enough "weight" or experience to really influence the end solution, and you end up having the customer dictating the technical solution. Which may come back to haunt you later on.




Data Warehouse Coding Guidelines and More - Part 2

by Nicolae Guse 15. January 2012 11:57

In the first part of the article we've talked quite a bit about Data Warehouse Coding Guidelines. We all go through a love and hate relationship with rules in general, and Coding Guidelines are no exception. But the whole point is that they are really necessary, if you want to have a consistent Data Warehouse. Having consistent coding rules is just a part of the solution, since, in order to ensure consistency, elements like ETL (in our case SSIS) development rules, Source Control and an overall BI Department Development Strategy come into play. Always remember that a chain is only as strong as its weakest link, and Data Warehouses are no exception. And now, let's go into detail about each one of the above components:


1.       Implement extensive usage of Configuration Files and Tables. As long as you have different environments to manage, one of the worst ideas is to have the package connection strings contain hard-coded values, since this will require manual modifications before each deployment to a new environment. Instead, you should go for the following approach:

a.       Use a configuration file or an environment variable which contains the connection string to the server hosting the SSIS Configuration table corresponding to that environment. This configuration will be used when you launch the SSIS package.

b.      In order to initialize all connection managers declared at package level, we should use the SSIS configuration table, which contains the values corresponding to that environment. In this way, each time we need to recover the value of a connection manager or variable, we just add a new line to this table. The SSIS configuration table can hold values for different SSIS packages, and therefore greatly improves the flexibility of the solution.
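As a sketch of points a and b above: the configuration table generated by the standard SSIS package configuration wizard has the structure below. The sample row (filter name, server and database names) is purely illustrative:

```sql
-- Default structure created by the SSIS package configuration wizard.
CREATE TABLE [dbo].[SSIS Configurations] (
    ConfigurationFilter NVARCHAR(255) NOT NULL,
    ConfiguredValue     NVARCHAR(255) NULL,
    PackagePath         NVARCHAR(255) NOT NULL,
    ConfiguredValueType NVARCHAR(20)  NOT NULL
);

-- Hypothetical row: initializes the DWH connection manager for the
-- DEV environment; one such line per connection manager or variable.
INSERT INTO [dbo].[SSIS Configurations]
    (ConfigurationFilter, ConfiguredValue, PackagePath, ConfiguredValueType)
VALUES
    (N'DEV_DWH',
     N'Data Source=DEVSQL01;Initial Catalog=DWH;Integrated Security=SSPI;',
     N'\Package.Connections[DWH].Properties[ConnectionString]',
     N'String');
```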




Data Warehouse Coding Guidelines and More - Part 1

by Nicolae Guse 6. January 2012 16:19

 Since I’ve started the series of articles dedicated to Data Warehouse Best Practices, we've been talking a lot about focusing on reusable components like Partitioning Mechanism, Implementing a Business Rule Engine or a Generic Import Mechanism. And since the most reusable component is the code itself, I thought it's a good idea to prepare an article dedicated to the Coding Guidelines to be used for Data Warehouse solutions based on Microsoft SQL Server.

We will start by discussing SQL Server developments in Part 1 of the article, continue in Part 2 with SSIS developments and Source Control, and end with the Development Strategy for a Business Intelligence department.


First things first. There are quite a few SQL Server Coding Guidelines out there, and one of the best I've come across is the SQLServerGuidelines, written by Pinal Dave. I highly suggest you read it, since it contains a lot of valuable information for a developer.

Besides these excellent guidelines, which are applicable to any SQL Server development, I've tried to add some of my own, based on my experience so far, and more specific to Data Warehouse developments based on Microsoft SQL Server.

1.       Normalize the database structure to the 3rd normal form, at least for the actual Data Warehouse database. This approach allows greater flexibility when you need to regroup data at Data Mart level. For the Data Mart level, the key to obtaining better performance is de-normalization, since it greatly reduces the number of joins to be performed, especially when using a ROLAP Presentation Layer.


2.       Avoid using INSERT INTO / SELECT * in queries. Instead, use specific column names in the SELECT statement. In this way we avoid any unintended impact caused by adding new columns to the source table.


3.       Avoid using the INSERT statement without a column list, since it could cause significant problems when new columns are added or the order of the existing columns is changed in the table.
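A minimal illustration of points 2 and 3, with made-up table and column names:

```sql
-- Made-up tables, for illustration only.
-- Fragile: both statements silently depend on column order and count,
-- so adding a column to DIM_PRODUCT breaks or corrupts the load.
INSERT INTO dbo.DIM_PRODUCT_ARCHIVE
SELECT * FROM dbo.DIM_PRODUCT;

-- Robust: explicit column lists keep the statement valid (or make it
-- fail loudly) when a new column is added to either table.
INSERT INTO dbo.DIM_PRODUCT_ARCHIVE (PRODUCT_ID, PRODUCT_NAME, CATEGORY_ID)
SELECT PRODUCT_ID, PRODUCT_NAME, CATEGORY_ID
FROM dbo.DIM_PRODUCT;
```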




What is Master Data Management?

by Nicolae Guse 18. December 2011 22:54

Before going into the Master Data Management topic in detail, I think we should take a step back and try to understand how the concept of Master Data appeared, and where it fits inside a company's IT infrastructure.

Since I’ve always liked a good story, let’s think a bit about the following one:

·         There was once a small company, driven by a few enthusiastic people who really believed they could make a great product which could really have an impact. At the beginning, the IT system for the company was very simple, covering just their basic needs, and allowed them to move fast on the market, since it had to cover just a few scenarios.

Since money was a scarce commodity at the beginning, part of the IT tools used were freeware products, part of them bought and part of them built by the very people that put the company together. It's that stage where everyone is doing everything, in order to kick-start the whole business. This means that, at this initial stage, the IT system doesn't have to be perfect, but only good enough: just the bare necessities which you strictly need in order to build and sell a great product.




Who needs a Data Warehouse Administrative Dashboard?

by Nicolae Guse 11. December 2011 22:31

Having a perfectly running Data Warehouse is not an easy job. It’s a mix of hardware and software challenges to address, spiced with a lot of data verifications in order to make sure that the system is not only up and running, but it also delivers the expected level of performance and data quality. 

On the hardware and software side, many of the challenges are related to keeping the system operational no matter what. Let’s try to think about some of the scenarios a Data Warehouse has to handle on a daily basis, and why it is so hard to have it running perfectly:

1.       Because the whole system is usually very complex, and integrates data from multiple sources. If one of these data sources, which are spread over a wide geographical area, has a problem, you will have a problem at Data Warehouse level. And there are a lot of things that can go wrong here, from power and hardware failures and networking up to software problems and crashed billing processes. None of them are under the BI Department's control, but any of them will have an impact on the Data Warehouse, when they are used as active Data Sources.

2.       Going back to the Data Warehouse, it can also be affected by the same problems which affect the source applications. And this is why the whole Data Warehouse system must be built with redundancy in mind:

a.       Implement the use of Network Aliases for each main component of the Data Warehouse. These aliases will point to either the Main or the Secondary server, and will allow the applications based on them to continue operating even if one of the servers goes down. You should define such aliases for each main component of the Data Warehouse:

·         The servers which are hosting the Data Warehouse databases

·         The ETL servers

·         The Presentation Layer servers

All processes should take place on the Main server for each component. A Secondary server should always wait on standby, synchronized with the Main one by using features like Mirroring, Replication and Log Shipping. If something goes wrong with the Main server, and it cannot be contacted anymore by a Witness server, then the Network Alias will automatically be re-routed to the Secondary one for the affected component.
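For the Mirroring flavour of this setup, the SQL Server side can be sketched as below. The server names and the database name DWH are hypothetical, and the statements assume the mirror has already been initialized from a backup restored WITH NORECOVERY and that mirroring endpoints exist on all three servers:

```sql
-- Run on the principal server, after the mirror database and the
-- endpoints have been prepared.
ALTER DATABASE DWH SET PARTNER = N'TCP://secondary.corp.local:5022';

-- Adding a witness enables automatic failover: if the witness and the
-- mirror lose sight of the principal, the mirror takes over, and the
-- Network Alias can then be re-pointed to it.
ALTER DATABASE DWH SET WITNESS = N'TCP://witness.corp.local:5022';
```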

3.       Hardware resources are slowly overwhelmed by increasing volumes of data. You should be able both to predict and to trace the evolution of such trends before they turn into a problem. And this is where monitoring and verification come into play.

4.       And the list can continue....




Implementing a Generic Import Solution

by Nicolae Guse 4. December 2011 18:16

Building a Generic Import Solution in order to feed your Data Warehouse is an essential step towards providing a solid foundation on which current and future Business Intelligence projects are built. Such a component will add to other generic components like the Business Rule Engine and Partitioning Mechanism, and will allow the Development phase of a Business Intelligence project to focus on the specificities of each project, rather than spending massive amounts of time, each time, re-inventing the wheel.

Developing a generic component will always take more time than implementing a quick fix, but, in the end, such a component offers a much higher Return on Investment by:

·         Saving a lot of  time in future developments

·         Streamlining the whole Data Warehouse architecture

The only catch of this approach based on Generic Components is that it involves a Data Warehouse Architecture which is outside the scope of a regular Data Warehouse business project. Such a project is always business driven, and users will never ask for things like a Partitioning Mechanism, for example. That means that it's the direct responsibility of the Business Intelligence department to take such initiatives, and to get the support of upper management in order to build the technical foundation which will form the future backbone of the Data Warehouse. This means that, in the short and mid-term, you won't be able to deliver as many business projects as the business owners would like, which can make the Business Intelligence department somewhat unpopular.

But I’m pretty sure that having a solid technical foundation will greatly benefit all Data Warehouse projects in the mid and long term. As long as you are able to explain your intentions and the upcoming advantages to the business users, I think you'll have a win here. And this is necessary because the Data Warehouse Architecture scope stretches far beyond a regular Data Warehouse business project, which means that, in the end, a Data Warehouse is something more than the sum of all its Data Marts.




Real Time Data Warehouse - Is it really possible and do we really need it?

by Nicolae Guse 27. November 2011 15:49

The Real Time Data Warehouse is one of the concepts that's been bugging me for quite some time. Partially because I keep hearing it from various software vendors praising the performance of their Database, ETL or Presentation Layer tool. And partially because I’m really wondering if, in the real and complex IT infrastructure of a company, it is really possible. And also if it's really necessary.

Before going forward with this concept, I think it's very necessary to understand what real time and critical mean. I'll take as an example the Electronic Stability Program (ESP), implemented as an active safety system for cars, where active means it will help you prevent a crash. The ESP uses sensors coupled to each wheel and the steering wheel, plus a gyroscope, in order to determine if the car is moving in the direction the driver intends. If it detects that the car is skidding out of control, it applies individual braking on each wheel and reduces the engine power, in order to bring the car back to the trajectory the driver intended. In this way, the crash is prevented. And this is where you need Real Time decisions, because anything less would mean a crash, which would result in a life and death scenario. And it works so well because it uses just a small amount of data, collected in real time, by using components designed only for this purpose.





The ideas you'll find on this site represent the opinion of their owner. They are neither about nor endorsed by a specific company. This site is a place for sharing ideas and trying to find together best practices for Data Warehouse and Business Intelligence projects.

Some of the technical approaches will be praised, some of them will be criticized, based on the concepts presented by their owners. You shouldn't feel offended if some of the ideas which either you or your company consider best practices are being criticized here. You should also realize that what some people consider best practices in a particular system and business scenario can be worst practices in another. If you don't agree with a particular idea, feel free to comment on it in a civilized manner.

Only by clashing different ideas and experiences will we find a better way of doing Data Warehouse and Business Intelligence projects.