Custom Row Versioning - Part 1

by Nicolae Guse 13. August 2014 14:30

 

I'll never stop being amazed by the importance of stepping outside of your comfort zone. Even if most of us don't like doing it, often without realizing it, once that warm feeling of comfort settles in, technical progress starts slowing down and, sooner or later, you're bound to be surprised by what's just around the corner. But let's dig into this and you'll better understand what I mean.

 

A few years back we were using Microsoft Analysis Services as a core backbone component of our Business Intelligence system, on top of a Data Warehouse built on the Microsoft SQL Server Database Engine. We were using MOLAP cubes containing aggregated data, built especially for decision making. And for those users needing to perform operational reporting, we were building a SQL Server database containing all the details they needed. On top of both worlds we were using an in-house-built reporting solution, based on Office Web Components. Perhaps not the most versatile reporting solution, but it was getting the job done.

 

And I would have said that everyone lived happily ever after, but it wasn't quite so: some of the users were finding the custom reporting solution subpar compared to other reporting solutions on the market. As time went by, and as the tool was used in more and more complex ways, some of the customers found the lack of a more advanced user interface a very serious limitation. In addition to this, Microsoft stopped their support for Office Web Components, which was mainly designed for Analysis Services 2000. Since back then we had already migrated to Analysis Services 2005, some of the more advanced features were no longer supported.

 

 

And this is how the stage was set for the selection of a new BI reporting tool, and that tool was Business Objects. Much friendlier than our existing reporting solution, the new tool was perceived by our users as a big step forward, which invited them to perform even more data exploration. But, on the back-end side, this also meant that we were moving away from Microsoft Analysis Services, which had pretty poor support in Business Objects, at least back in 2009.

 

The technical solution to be applied, which offered the desired level of flexibility for the users, was to start building fully blown Business Objects universes, based on a semantic layer querying a physical database directly. And here is where our problems started to arise:

 

1.     While using a MOLAP cube built in Microsoft Analysis Services, we were in fact using an intermediate layer dedicated to reporting, sitting between the user and the Data Warehouse. When the cube was processed, data was loaded from the Data Warehouse into dedicated storage and aggregations were created, in order to speed up user queries.

 

2.     On the performance side, using various aggregation patterns based on actual usage, coupled with the automated creation of new partitions whose definitions included filters on the most used search criteria (e.g. time), ensured a very good level of performance during user reporting.

 

3.     As long as you had a MOLAP cube on top, any failure occurring during the ETL load of the Data Warehouse had no impact on the user data, since cube processing was performed only in case of success. A simple but effective approach.

 

4.     Now, taking one step further, we quickly started realizing that our existing best practices for Analysis Services were no longer a match for Business Objects:

 

a.     With Business Objects being primarily a ROLAP solution, any failure occurring during the Data Mart database load meant that the users' data was impacted, which didn't happen before. On the technical side, this meant a lot more focus on error handling.

 

b.     Moreover, on the ETL side, we were traditionally using a DROP / SELECT INTO approach with SQL Server 2005, in order to quickly load data without the logging overhead (a sketch of this pattern follows after this list). Obviously, this couldn't be used anymore once you constantly had users querying your Fact table via Business Objects.

 

5.     Additionally, we had increasing pressure from the business side to have ever smaller intervals for refreshing the data in the Data Warehouse, for critical business modules. For some of them we already had scheduled jobs refreshing the data every 15 minutes, and the prospect was of being required to go even lower than this. Overall, an increasingly tough nut to crack, even before having Business Objects on top.

 

6.    Having quite a high number of users able to perform ad-hoc reporting was starting to severely limit our options to refresh the data, since you couldn't have a guaranteed data refresh window. And killing every Business Objects connection on the Data Mart during refresh was out of the question with refreshes as frequent as every 15 minutes.
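
To make the old pattern from point 4.b concrete, here is roughly what a DROP / SELECT INTO refresh looked like. The table and column names are invented for illustration. It is fast and minimally logged, but it leaves a window in which the fact table simply doesn't exist, which becomes unacceptable the moment users query that table directly:

    -- Old-style refresh: rebuild the reporting table from scratch
    IF OBJECT_ID('dbo.FACT_SALES', 'U') IS NOT NULL
        DROP TABLE dbo.FACT_SALES;

    SELECT PERIOD_ID, PRODUCT_ID, AMOUNT
    INTO   dbo.FACT_SALES          -- minimally logged under SIMPLE / BULK_LOGGED recovery
    FROM   dbo.STG_SALES;

    -- Indexes then have to be recreated from scratch on the new table
    CREATE CLUSTERED INDEX IX_FACT_SALES ON dbo.FACT_SALES (PERIOD_ID, PRODUCT_ID);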

 

As seen above, as developers we were suddenly facing new and complex challenges that we needed to address. This definitely pushed us out of the comfort zone and forced us to identify proper solutions to these challenges:

 

1.     We knew that we would have to give up the approach based on DROP / SELECT INTO. And we knew that SQL Server 2008 was coming with Bulk Insert options that could help us here. However, due to our existing indexing policy, this option could not be directly applied on the Data Mart tables. More details about the limitations of the SELECT INTO approach can be found in the article Data Warehouse Coding Guidelines and More - Part 1.
 

 

2.     We considered using explicit transactions to encapsulate the INSERT / UPDATE / DELETE operations that needed to be performed on the final FACT table. Only to discover that it's a performance killer, especially when you need to process millions of records in a single transaction while users on top are actively querying the data you are trying to modify. So this approach was also a NO GO.

 

3.      After building various prototypes, we concluded that the best approach for refreshing the data was to use the Partition Switching method. Basically, we were creating a TMP table as an empty clone of the Final table, which meant that we could refresh the data on the ETL side without any interference from the users during this critical part of the process (a sketch of the idea follows after this list). This whole new approach based on Partition Switching put us back to the drawing board, and resulted in the creation of a reusable mechanism called the Partitioning Mechanism. Coupled with a TRY / CATCH approach for error handling, this mechanism ensured all the flexibility we needed during data refresh. Full details on this topic are in the article series The Name of the Game - Table Partitioning.

 

4.      Still, even with the Partitioning Mechanism in place, we started noticing an increase in the number of database blockages occurring during the critical Partition Switch phase. These were caused by long-running queries coming from the ad-hoc reporting area, over which we had little control and limited possibility of intervening through indexing, since our indexing policy is designed to address standard reporting needs.

 

5.      The bright side of Partition Switching is that it is a metadata operation, which needs just milliseconds to be performed. Compare this with an explicit transaction, which encapsulates the whole DML operation, and Partition Switching has a definitive advantage here. And here comes the dark side: this whole advantage applies ONLY if the table is NOT in use when the Partition Switch needs to be performed. Because if the table IS IN USE, then the Partition Switching command will be put in a waiting queue by the database engine, as it needs an exclusive schema modification lock on the table definition. This also means that every new user query will be put on hold, and will in fact be blocked, by the Partition Switch operation. Which is definitely BAD (the diagnostic query after this list shows how such blocking chains can be spotted).

 

6.   More or less every other option, like relying on explicit transactions (for INSERT / UPDATE) or implicit transactions (for MERGE), will reach the point where the transaction needs to be committed, and it will be blocked in a similar way by the ongoing user reporting. What is different is the severity of the locking that occurs, with Partition Switching holding the most restrictive lock. But we started noticing that sometimes making the change from Partition Switching to MERGE either alleviated or solved some of the blockages we were encountering (a simplified MERGE sketch also follows after this list). Definitely good news, and you'll hear more on this later on. But this change still didn't fix ALL blockages.
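
To make point 3 more concrete, here is a minimal sketch of the idea behind the Partitioning Mechanism. The table, staging and partition function names (FACT_SALES, FACT_SALES_TMP, FACT_SALES_OLD, STG_SALES, PF_FACT_SALES) are invented for illustration, and the real mechanism obviously handles far more than this:

    -- Assumed setup: FACT_SALES is partitioned by PF_FACT_SALES on PERIOD_ID;
    -- FACT_SALES_TMP and FACT_SALES_OLD are empty clones with identical structure,
    -- indexes and partitioning, created on the same filegroups.

    -- 1. Load the refreshed data for the target period into the TMP clone,
    --    completely out of sight of the reporting users.
    INSERT INTO dbo.FACT_SALES_TMP (PERIOD_ID, PRODUCT_ID, AMOUNT)
    SELECT PERIOD_ID, PRODUCT_ID, AMOUNT
    FROM   dbo.STG_SALES
    WHERE  PERIOD_ID = 201408;

    -- 2. Swap the freshly loaded partition into the final fact table.
    --    Both statements are metadata-only, but they need a schema modification
    --    lock on the tables involved, which is where the blocking described
    --    in points 4 and 5 can occur.
    DECLARE @p int;
    SET @p = $PARTITION.PF_FACT_SALES(201408);

    ALTER TABLE dbo.FACT_SALES     SWITCH PARTITION @p TO dbo.FACT_SALES_OLD PARTITION @p;
    ALTER TABLE dbo.FACT_SALES_TMP SWITCH PARTITION @p TO dbo.FACT_SALES     PARTITION @p;

    -- 3. FACT_SALES_OLD now holds the stale data and can be cleaned up at leisure.
    TRUNCATE TABLE dbo.FACT_SALES_OLD;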
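
And for point 5, a simple way to see whether a SWITCH is queued behind ad-hoc reporting queries is to look at the waiting requests. This is just an illustrative diagnostic query using the standard SQL Server DMVs, not part of the mechanism itself:

    -- Requests currently waiting on schema locks, together with who is blocking them.
    -- A Partition Switch queued behind long-running reports typically shows up
    -- with a wait type of LCK_M_SCH_M on the fact table.
    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time,
           t.text AS sql_text
    FROM   sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE  r.wait_type LIKE 'LCK_M_SCH%';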
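
Finally, for point 6, this is roughly how the in-place MERGE alternative looks, on the same hypothetical tables. Locks start at row and page level and only escalate if needed, which in some scenarios turned out to be less disruptive than the table-level lock required by the SWITCH:

    -- Refresh one period of the final fact table in place, from the staging table.
    MERGE dbo.FACT_SALES AS tgt
    USING (SELECT PERIOD_ID, PRODUCT_ID, AMOUNT
           FROM   dbo.STG_SALES
           WHERE  PERIOD_ID = 201408) AS src
       ON  tgt.PERIOD_ID  = src.PERIOD_ID
       AND tgt.PRODUCT_ID = src.PRODUCT_ID
    WHEN MATCHED AND tgt.AMOUNT <> src.AMOUNT THEN
         UPDATE SET tgt.AMOUNT = src.AMOUNT
    WHEN NOT MATCHED BY TARGET THEN
         INSERT (PERIOD_ID, PRODUCT_ID, AMOUNT)
         VALUES (src.PERIOD_ID, src.PRODUCT_ID, src.AMOUNT)
    WHEN NOT MATCHED BY SOURCE AND tgt.PERIOD_ID = 201408 THEN
         DELETE;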

 

So this is the long introduction to the problem that we needed to address. And something tells me that if you are using a ROLAP reporting solution, these problems should sound familiar. After seeing this escalation in blockages, I knew that we needed to do something different in order to permanently solve the problem. I also knew that SQL Server offers row versioning through options like the Snapshot isolation level. The trick is that this doesn't work with Partition Switching. Basically, every approach based on tracking the DML changes on the final FACT table could not be used with Partition Switching, since there were no DML changes to be traced on the final table, with all processing done in a separate TMP table. The same goes for options like Change Data Capture and Change Tracking, which rely on tracking DML changes on the table and therefore don't work with Partition Switching either.
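
For reference, the built-in row versioning route would look something like the snippet below (database and table names are placeholders). It works well for in-place DML, but since a Partition Switch is a DDL operation that replaces whole partitions without touching any rows, there are no row versions for the readers to fall back on:

    -- Enable optimistic row versioning at database level.
    ALTER DATABASE DataMartDB SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- A reporting session can then read a consistent snapshot while DML is running...
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    SELECT SUM(AMOUNT)
    FROM   dbo.FACT_SALES
    WHERE  PERIOD_ID = 201408;
    -- ...but an ALTER TABLE ... SWITCH on dbo.FACT_SALES still needs its schema
    -- modification lock, and generates no row versions the snapshot readers could use.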

 

More details on the implemented technical solution can be found in Custom Row Versioning - Part 2.

 

And now, back to you: how much are Data Concurrency issues part of your daily life as a Business Intelligence developer? What solutions have you found to overcome these problems?

 

Enjoy,

 

Nicu

Tags:

Data Warehouse Best Practices

