Having a perfectly running Data Warehouse is not an easy job. It’s a mix of hardware and software challenges to address, spiced with a lot of data verifications in order to make sure that the system is not only up and running, but it also delivers the expected level of performance and data quality.
On the hardware and software challenges, many of them are related on having the system operational no matter what. Let’s try to think at some of the scenarios a Data Warehouse has to handle on a daily basis, and why is it so hard to have it running perfectly:
1. Because the whole system is usually very complex, and integrates data from multiple sources. If one of these data sources, which are spread over a wide geographical area, has a problem, you will have a problem at Data Warehouse level. And there are a lot of things to go wrong here, from power and hardware failures, networking up to software problems and crashed billing processes. None of them are under the control of the BI Department control, but any of them will have an impact over the Data Warehouse, when they are used as active Data Sources
2. Going back to the Data Warehouse, it can also be affected by the same problems which affect the source applications. And this is why the whole Data Warehouse system must be build with redundancy in mind:
a. Implement the use of Network Aliases, for each main component of the Data Warehouse. These aliases will point to either the Main or the Secondary servers, and will allow the applications which are based on them to continue operation, even if one of them goes down. Your should define such aliases for each main component of the Data Warehouse:
· The servers which are hosting the Data Warehouse databases
· The ETL servers
· The Presentation Layer servers
All processes should take place on the Main server for each component. A Secondary server should always wait on standby, synchronized with the Main one by using features like Mirroring, Replication and Log Shipping. If something goes wrong with the Main server, and it cannot be contacted anymore by a Witness server, than the Network Alias will automatically be re-routed to the Secondary one for the affected component.
3. Hardware resources are slowly overwhelmed by increased volumes of data. You should be able both predict and to trace the evolution of such trends before they turn into a problem. And this is where the term monitoring and verifications comes into place.
4. And the list can continue....
More...