Recently I've encountered some pretty nasty errors in Microsoft Analysis Services 2008, which gave me the inspiration for this article. It's about "Operation has been cancelled due to memory pressure" and, sometimes, its close companion "The lock operation ended unsuccessfully because of deadlock". And they are exactly the kind of errors that really make you wonder how deep the rabbit hole goes.
Well, you might say, what more is there to dig into? Isn't it obvious that the server doesn't have enough memory? Indeed, but before rushing to buy more memory and hoping that everyone lives happily ever after, I'll say that it's pretty useful to understand the various scenarios that can take you there, since buying more RAM will not always solve your problems.
First things first. These errors are usually the last symptoms of more complex problems occurring in the background, so you should really spend some time identifying the actual scenarios that lead to them. For example, it is one thing to start getting them during user querying and a totally different thing when you are performing dimension processing using the ProcessUpdate option. And here is why:
1 Using the ProcessUpdate option generates a lock on all partitions of the cubes that include the dimension you are processing. That means that each of these partitions is scanned in order to apply the changes that occurred at dimension level, and all of them are loaded into memory. The more partitions the cubes have, the more memory pressure they will generate during processing.
2 Processing is always performed within a transaction, which allows the changes to be rolled back if any error occurs. This also allows users to continue browsing the data while a dimension or partition is being processed. However, it also means that the more dimensions you process in a single batch, the more partitions are scanned and therefore the more memory is used. And, on top of that, you have a huge transaction being held open, waiting to be committed, and having a huge transaction floating over your head doesn't do much good for memory usage either.
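To make this concrete, here is a minimal sketch of what such a processing command looks like when sent to the server as an XMLA batch. The database and dimension IDs ("MyOlapDB", "Dim Customer") are placeholders for illustration, not names from a real project; the Transaction="true" attribute is what wraps the whole batch into the single transaction discussed above.

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
           Transaction="true">
      <!-- ProcessUpdate of one dimension; the whole batch is committed as one transaction -->
      <Process>
        <Object>
          <DatabaseID>MyOlapDB</DatabaseID>
          <DimensionID>Dim Customer</DimensionID>
        </Object>
        <Type>ProcessUpdate</Type>
      </Process>
    </Batch>

You can run a script like this from an XMLA query window in SQL Server Management Studio, or schedule it as a SQL Server Analysis Services Command step in a SQL Server Agent job.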
3 You could reduce the memory pressure by reducing the number of dimensions being processed in a single batch, since this will result in a smaller transaction size. However, there are several things you should be aware of:
a The number of members the dimension has, together with its number of attributes and hierarchies, influences the amount of memory being used
b The number of rows returned by the dimension's underlying SELECT also matters. Since this result set is loaded into Microsoft Analysis Services memory during processing, you are very interested in keeping it as small as possible. This means you should make sure the rows returned contain no duplicates, by using a DISTINCT. Otherwise, depending on your dimension design, the de-duplication will probably still be done by Microsoft Analysis Services, but on a much larger result set, which by itself takes a considerable amount of memory
c Reducing the number of dimensions processed in a single batch can solve the memory pressure issue, but it may also generate a totally new set of problems. We have a saying here that goes: "You run away from the devil and run into his father". And here is why:
i It may seem like a very good idea to create separate batches that group different dimensions for processing. You can launch a few of these batches in parallel and, as soon as one finishes, launch the next one. In this way you reduce the memory pressure by processing fewer dimensions in parallel and by reducing the size of each transaction. And all is fine until you notice that you start getting strange errors such as "The lock operation ended unsuccessfully because of deadlock", which cause the transaction to be cancelled. There are several reasons why these errors occur:
1 Processing a dimension with ProcessUpdate involves acquiring a lock on the cube and database definition, which is held until the transaction is committed. When you process more than one dimension in parallel in different batches, one dimension processing command acquires the lock, while all the other dimension processing commands from the other batches enter a waiting queue and wait for that lock on the cube/database definition to be released. This means that what you thought was going to work in parallel in fact runs sequentially, and it is one of the causes of the lock timeout and deadlock errors we've been talking about.
2 Processing a dimension generates locks on all partitions defined in the cubes that use this dimension. This means that when you process, in different batches, dimensions that are used in the same cube, they all try to acquire a lock on the same partitions, which leads to the same waiting queue and the same timeout and locking errors.
d In conclusion, if you really want to reduce the memory pressure successfully, having smaller batches with different dimensions processed in parallel is really NOT the way to go (a possible middle ground is sketched at the end of this point).
e Aha, you might say, then let's run these batches sequentially, which causes neither of the above problems. Indeed, this approach will work and really has the potential to reduce the memory pressure. But running these batches sequentially has the adverse effect of significantly increasing the processing time, and here is why:
i Each dimension process involves scanning all partitions from the cubes that use it. The partitions are loaded into memory, the changes are applied and, when processing ends and the transaction is committed (or even sooner, depending on the Microsoft Analysis Services server settings), the partitions are unloaded from memory. Which is great, but the next batch may well process a dimension from the same cube and load the same partitions all over again, and so on. You end up reading the same partition from disk many times, with the disk still being the weakest link in the chain, and this is how you end up with big delays before the data finally becomes available to the users
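If you still want to cap how much memory dimension processing uses, without splitting the work into separate transactions, one middle ground worth considering is to keep everything in a single batch but limit the degree of parallelism through the MaxParallel attribute of the Parallel element. The sketch below uses placeholder IDs; it will not make the transaction itself any smaller, but at most two dimensions are scanned at the same time, and because everything stays in one batch the cross-batch lock queue described above never appears.

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
           Transaction="true">
      <!-- One transaction, but at most two processing jobs run at the same time -->
      <Parallel MaxParallel="2">
        <Process>
          <Object>
            <DatabaseID>MyOlapDB</DatabaseID>
            <DimensionID>Dim Customer</DimensionID>
          </Object>
          <Type>ProcessUpdate</Type>
        </Process>
        <Process>
          <Object>
            <DatabaseID>MyOlapDB</DatabaseID>
            <DimensionID>Dim Product</DimensionID>
          </Object>
          <Type>ProcessUpdate</Type>
        </Process>
        <Process>
          <Object>
            <DatabaseID>MyOlapDB</DatabaseID>
            <DimensionID>Dim Date</DimensionID>
          </Object>
          <Type>ProcessUpdate</Type>
        </Process>
      </Parallel>
    </Batch>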
4 You should always avoid running dimension processing and partition processing on the same database in different batches that run in parallel. And here are the reasons why:
a Processing a dimension involves acquiring a lock on all partitions of the cubes that dimension is included in
b Processing a partition involves acquiring a lock on all dimensions of the cube that partition belongs to
c This scenario creates a vicious circle that only makes things worse and leads to our nasty friend "The lock operation ended unsuccessfully because of deadlock" (a safer way of combining the two kinds of processing is sketched below)
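If dimensions and partitions really do have to be processed in the same window, one commonly used way to stay out of this vicious circle is to submit both kinds of Process commands in one and the same batch, so that the server coordinates the locks within a single transaction instead of letting two separate transactions fight each other. This is only a sketch under that assumption, with placeholder IDs for the database, dimension, cube, measure group and partition.

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
           Transaction="true">
      <!-- Dimension and partition processing submitted together, in one transaction -->
      <Parallel>
        <Process>
          <Object>
            <DatabaseID>MyOlapDB</DatabaseID>
            <DimensionID>Dim Customer</DimensionID>
          </Object>
          <Type>ProcessUpdate</Type>
        </Process>
        <Process>
          <Object>
            <DatabaseID>MyOlapDB</DatabaseID>
            <CubeID>Sales</CubeID>
            <MeasureGroupID>Fact Sales</MeasureGroupID>
            <PartitionID>Fact Sales 2009</PartitionID>
          </Object>
          <Type>ProcessFull</Type>
        </Process>
      </Parallel>
    </Batch>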
5 If you are processing either dimensions or partitions during user working hours, then you should be aware of the additional pressure generated by running such operations on top of the usual user reporting:
a When users run their reports, MDX statements are executed against the OLAP database / cubes, and these generate read locks on the objects used in the query. The more complex the query, the longer the lock is held
b When you process an object that is used in a user query, the processing runs in a transaction, in parallel with the user query. However, when the processing is finished, an exclusive lock must be acquired on the affected object in order for the transaction to be committed.
c The more queries run against the processed object, and the longer they take, the more they delay the moment when the transaction can be committed, and the more they increase the likelihood of the error "The lock operation ended unsuccessfully because of deadlock"
d Additionally, having complex user queries running in parallel generates significant memory requirements in itself, and can on its own lead to the error "Operation has been cancelled due to memory pressure" if not enough resources are available
6 Overall, having all dimensions processed in a single batch is always best from a performance point of view, since you let Microsoft Analysis Services manage the parallelism and you avoid "The lock operation ended unsuccessfully because of deadlock" (a minimal sketch of such a batch is shown right after this list). You also have the advantage that each partition scanned during dimension processing is loaded from disk just once, and then the changes for all dimensions included in the batch are applied. And here we end up back where we started, with the infamous "Operation has been cancelled due to memory pressure". What can we really do to prevent it?
a Buy more RAM and ensure you have enough virtual memory allocated (the 2x RAM rule is an excellent starting point). This should, however, be regarded more as a quick fix since, once you start getting the infamous error, you will probably also experience some of the other problems that accompany it, one of the most frequent being an increase in the time needed for dimension processing. And buying more RAM will really not help you there.
b Define a strategy for archiving older partitions. If you have partitions created on a daily basis, and each partition contains, for example, the data from the beginning of the year to the current day (the daily extract concept), you should really consider phasing out the older partitions to an Archive cube. The main challenge is to implement this mechanism in a way that is totally transparent to the user. Depending on the front-end application used, some customizations may be required so that the application chooses either the Latest or the corresponding Archived database, based on the daily extract the user chose for reporting. But the big advantage is that you will always have the same number of partitions in the Latest cube, and therefore a controllable number of partitions being scanned during dimension processing. This approach can significantly reduce the time needed for dimension processing.
c Consider optimizing the SELECT statements used for dimension processing. If a complex Named Query takes a long time to build and is also reused by several dimensions during processing, you should really consider materializing it into the underlying DataMart.
d Ensure each SELECT run during dimension processing returns distinct records, since this result set needs to be stored in Microsoft Analysis Services memory. The smaller the better.
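For reference, here is the skeleton of the single-batch approach that point 6 refers to: every dimension is processed with ProcessUpdate inside one transactional batch, and the Parallel element is left without a MaxParallel limit so that Microsoft Analysis Services decides the degree of parallelism on its own. As in the earlier sketches, the database and dimension IDs are placeholders; list one Process element per dimension of your database.

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
           Transaction="true">
      <!-- All dimensions in one transaction; the server chooses how many to process in parallel -->
      <Parallel>
        <Process>
          <Object>
            <DatabaseID>MyOlapDB</DatabaseID>
            <DimensionID>Dim Customer</DimensionID>
          </Object>
          <Type>ProcessUpdate</Type>
        </Process>
        <Process>
          <Object>
            <DatabaseID>MyOlapDB</DatabaseID>
            <DimensionID>Dim Product</DimensionID>
          </Object>
          <Type>ProcessUpdate</Type>
        </Process>
        <!-- ...one Process element for each remaining dimension... -->
      </Parallel>
    </Batch>

If even this batch pushes the server over its memory limits, the MaxParallel variant shown earlier is the first knob worth turning before buying hardware.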
In the end, I would say that what really counts is to understand that sometimes the architecture may be the real cause of the problems, rather than the lack of resources. And yes, that is sometimes not easy to fix, but, in the long run, addressing the source of the disease is far more beneficial to the patient than just treating the symptoms.
And now, back to you: How often have you encountered memory problems while working with Microsoft Analysis Services? What solutions have you found in order to address these problems?