Much has already been written about data warehouse appliances and their advantages. In most articles, query performance improvement is described as the primary advantage. In this article, we take a slightly different viewpoint. Data warehouse appliances can definitely improve performance, but query performance improvement is not really an advantage – it’s a property. However, this performance improvement property is the basis for certain advantages, and this articles focuses on those advantages.
Because of the lack of a generally accepted definition of the term data warehouse appliance, we present here a somewhat informal description: A data warehouse appliance is a server that runs a pre-installed database server; the combination is tuned and optimized from top to bottom to run typical data warehouse queries very efficiently.
Three Properties of an Appliance
Analyst Richard Hackathorn gave an inspiring presentation at the IBM Information On Demand conference in Las Vegas in 2008 where he talked about data warehouse appliances; see also his research report entitled Data Warehouse Appliances: Achieving the Business Value. In general he compared the properties of data warehouse appliances with those of appliances, such as toasters, refrigerators, and irons. He indicated that a common appliance has three properties: it has one function only, is easy to use, and the user should not need to understand the internal workings of the appliance. Let’s explain these properties in a little bit more detail.
The one function property means that the device is designed and developed to do one thing and one thing only. For example, a traditional toaster toasts, that’s its only function. When I was young I did try to boil some soup with ours, but that experiment was not a big success, nor did my mother appreciate my investigative mind. The same should apply for a data warehouse appliance: it’s a machine designed for one thing, namely for running complex queries, and it should do that really well and fast.
The easy to use property definitely applies to most toasters. Normally you don’t have to read a 500-page manual to install and use a toaster. It’s normally quite obvious how it works. Again, the same should apply to a data warehouse appliance: easy to install, easy to manage, easy to tune, and so on.
With respect to the internals not relevant property, most people really don’t care about how an appliance internally works, as long as it does what it’s supposed to do, although I have heard that some people did open up their toasters to see how they worked (let’s assume that’s a slight aberration). Again, the internals not relevant property applies to data warehouse appliances as well. For the developers and users, it shouldn’t matter how the appliance improves the query performance, whether it uses a parallel architecture, keeps as much data in memory as possible, uses the memory cache, partitions the tables, or all of these. It just doesn’t matter. The only important thing is that the performance is as expected. In an ideal situation, a data warehouse appliance should offer really good performance out of the box; no tuning and optimization should be needed.
Note that approximately a year later, Hackathorn did mention that there are toasters on the market that have so many functions and are so complex that you do need a manual. But we will leave that beside the discussion.
The Advantages of a Data Warehouse Appliance
This article focuses on data warehouse appliances that adhere to these three properties. Thus, if a server with a pre-installed database server requires a lot of tuning before it can be used, it will not be considered a data warehouse appliance. Note that the article does not discuss a specific product, but focuses on data warehouse appliances in general.
If this is what we mean with a data warehouse appliance, then what are the real advantages of a data warehouse appliance, or to be precise, what are the advantages if the database server currently in use is replaced by a data warehouse appliance?
More reporting and analytical capabilities: If a data warehouse appliance executes queries, it should be able to handle a bigger query workload, which means more queries, and more complex queries. Unplanned queries might even become an option. In addition, those queries that were once forbidden (too much resource consumption), can now be executed. This means that the organization can run more reports, it can do more complex data analysis, it can work with bigger datasets, and it can analyze more detailed data. In short, a data warehouse appliance offers more reporting and analytical capabilities, and that might improve the quality of the decision-making process.
Cost reductions: In this article, we assume that a data warehouse appliance requires a minimal amount of tuning and optimization of the server, of the database server, and of the database design. Out of the box, the data warehouse appliance should be able to run most queries very fast. In fact, this is what a number of organizations experienced when they migrated their data warehouses, implemented with a classic database server that they had been optimizing for years, to a data warehouse appliance. Most of them experienced a performance improvement without any tuning and optimization.
What’s the advantage? Less time (maybe no time at all) has to be spent on tuning and optimization of the server, the database server, and the database design. The value of this should not be underestimated. A classic database server requires a lot of optimization and tuning before an acceptable performance is achieved. If an organization would measure the amount of time spent by all specialists that deal with tuning and optimization, they would probably be unpleasantly surprised. And this calculation should include all the time spent on discussing where to create indexes, how to partition tables, what the values of the tablespaces parameter should be, how to set the bufferpool parameters, and so on. And don’t forget all those hours discussing whether to denormalize the tables, or change from a star to snowflake design or vice versa. It’s surprising how much time we have to devote to those issues. So, the data warehouse appliance could reduce costs considerably. And these costs can seriously outweigh the costs of buying a data warehouse appliance.
Flexibility: An additional advantage is that if less tuning and optimization is needed, it will be easier to implement new user requests. With other database servers, a new query might lead to quite a number of technical changes, such as creating and dropping indexes, repartitioning tables, changing bufferpool parameters, and so on. Sometimes this is so much work, and has so much impact on other reports, that the decision is made to not implement the new request at all. The need for these additional technical changes is less with a data warehouse appliance. It’s almost as if the data warehouse appliance is more forgiving. If a design mistake was made somewhere, the penalty would not always be that the queries become really slow. Even a weak design could still lead to an acceptable performance.
The Extra Advantages of a Data Warehouse Appliance
But is this it? Are these the big advantages of a data warehouse appliance? In my opinion it isn’t. The extra advantages of a data warehouse appliance come from somewhere else.
By using an appliance instead of a classic database server, the entire business intelligence (BI) architecture can be simplified. Many organizations work with an architecture that consists of (for example) a staging area, a central data warehouse, a set of data marts, and some additional cubes. In other words, data is stored and copied in several data stores; see Figure 1. The main reason why most data marts and cubes have been added is because of query performance. The usual assumption is that a central data warehouse alone is probably not able to handle the whole query workload all by itself. Therefore, queries are offloaded to other derived data stores (such as a data mart).
The negative consequence of such an architecture with derived data stores is that ETL jobs are needed to keep the derived data stores up to date. And that makes the whole architecture more complex. It also means it will take longer to get the data from the operational databases to the reports; the more copying, the longer it takes.
If a data warehouse appliance really delivers excellent performance, it should be possible to simplify the architecture by removing all or most of the derived data stores. The query workload would have to be redirected to the central data warehouse that has been implemented on a data warehouse appliance.
Something similar applies to a bus architecture where there is no central data warehouse but only data marts. In this case, a lot of duplicate data is stored in the dimensions of the various data marts. Here the simplification could be to store all the fact and dimension tables once in one big data warehouse, again implemented with a data warehouse appliance. If a reporting tool requires star schema structures, they could be simulated with views.
But what are the extra advantages of simplifying the business intelligence architecture?
Flexibility: Because there are less data stores and therefore less ETL jobs, it will be easier to change the reports and it will be easier to develop new ones. There is just less code to change or write. In other words, the time to market for a new report will improve. The rule for business intelligence architectures is: less is better, less is more!
Cost reductions: There are two areas where costs might go down. The first form of cost reduction comes from less time spent. Less time will be required to write ETL jobs, less time is needed to tune and optimize the derived data stores, less time is needed to tune the server and database engine, and so on. In addition, costs will go down on the hardware and software side: less licenses for database servers and ETL tools, and less hardware servers.
Operational BI: The faster data can be moved through the chain of data stores, the faster data will be available for reporting and analytical purposes. And if the speed of copying data through the chain is so fast, the data warehouse might become a real-time or near-real-time data warehouse. And that would make operational BI on a data warehouse a possibility.
BI focus: The simpler the structure of the business intelligence architecture, the less time needed for technical issues such as tuning, data copying, and optimization. But more importantly, it also makes more time available for talking to users, analyzing their needs, developing new reports, and improving the current reports. And is that not what it’s all about? The focus will shift from data warehousing to reporting and analytics.
Of course, there are disadvantages of data warehouse appliances. For example, the SQL dialect of the data warehouse appliance is probably slightly different from the one supported by the current database server, which means that some migration work has to be done. And some more disadvantages apply, but still the advantages outweigh the disadvantages.
To summarize, in most articles and presentations, improved query performance is presented as the biggest advantage of a data warehouse appliance . It’s definitely not a disadvantage, but other advantages derived from the performance improvement are probably the more important advantages.
If a database server does not conform to the three properties mentioned, meaning it’s not considered a data warehouse appliance in this article, it could still be a very useful product. It’s just that there will be less advantages.
Finally, it’s a little unfortunate that most vendors of appliances focus primarily on customers with really large warehouses. The effect is that customers with medium-sized warehouses don’t even consider a data warehouse appliance . This is unfortunate because customers with just a few terabytes of data could also benefit from appliances because they will experience the above advantages even better. So, even if your data warehouse is not 1 petabyte large, I would still recommend you to check out how data warehouse appliances can simplify your business intelligence architecture, reduce complexity, improve flexibility, and reduce costs.