1) Reporting bias
Maybe the main reason everyone thinks data warehouses are expensive is that they’ve heard about a really expensive one. You might expect to hear all about the successful data warehouse projects, but in reality the spectacular failures catch our attention more and therefore get reported and discussed more widely.
2) The need to buy huge servers at the start
In the past, before the wide availability of crazy cheap storage and on-demand computing, building a data warehouse meant finding physical space in your data center and filling a rack with super expensive servers and storage arrays sized for future growth. This put a huge upfront cost on every data warehouse project, and often made it a non-starter.
3) The crazy prices of data warehouse-like modules for analytics tools
The prices that certain vendors charge for their more advanced analytics modules are also a pretty clear influence. I’m not mentioning any names, but we all know who I’m talking about.
4) Fear, Uncertainty, and Doubt
The other reason the risks and cost overruns of data warehouses are “well known” is that a small but vocal minority of consultants who provide data warehouse implementation services have a vested interest in making data warehouses as scary as possible. The material they publish shows up in searches like “data warehouse risks” or “data warehouse failures”. It’s FUD, but unfortunately it can be effective.
The reality? Three things used together can make data warehouses affordable:
- Powerful, practical tools
- Pragmatic, focused methodology
- Cheap, elastic computing
Powerful new tools
Web analytics platforms are building better and better APIs. There are more options than ever, and the large, expensive, IT-focused ETL tools from the mega-vendors are no longer the only way to move and transform data the way you need to.
Pragmatic, focused methodology
Data warehousing has been around for twenty years. If you apply the lessons learned, along with the architectures and data models that have been proven to work, it’s not a research and development project anymore. Take small, bite-sized steps. Validate value at every point. Find someone who has been there and done that.
Cheap, elastic computing
While the hype around cloud computing can get overdone, in the data warehouse area it can be huge. Instead of paying a huge cost upfront, you pay only for the computing time you actually use, by the hour. Because data warehouses are often loaded daily, the load on the ETL servers is very intermittent, and this is where elastic computing offers such huge savings.
Do the math. If your data warehouse loads in less than 2 hours, the ETL servers you buy will be idle more than 90% of the time. What would it cost to buy that compute power and have it sitting in a rack idle, compared to a monthly subscription that you pay only as you need it?
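Here is a rough sketch of that math in Python. The 2-hour daily load comes from the example above; the purchase price, hardware lifetime, and hourly rate are made-up placeholders, not quotes from any vendor, so plug in your own numbers. It also ignores power, rack space, and admin time for owned hardware, and storage and data transfer charges for the cloud option.

```python
HOURS_PER_DAY = 24


def idle_fraction(load_hours_per_day: float) -> float:
    """Share of each day a dedicated ETL server sits idle."""
    return 1 - load_hours_per_day / HOURS_PER_DAY


def monthly_on_demand_cost(load_hours_per_day: float,
                           hourly_rate: float,
                           days_per_month: int = 30) -> float:
    """Monthly cost when you pay only for the hours the load actually runs."""
    return load_hours_per_day * hourly_rate * days_per_month


def monthly_owned_cost(purchase_price: float, lifetime_months: int) -> float:
    """Purchase price of owned hardware spread evenly over its useful life."""
    return purchase_price / lifetime_months


# Hypothetical inputs for illustration only.
load_hours = 2            # daily load window, from the example above
hourly_rate = 1.00        # assumed on-demand price per server-hour
purchase_price = 36_000   # assumed cost of a comparable owned server
lifetime_months = 36      # assumed useful life of that server

print(f"Idle share of the day: {idle_fraction(load_hours):.1%}")
print(f"On-demand, per month:  {monthly_on_demand_cost(load_hours, hourly_rate):,.2f}")
print(f"Owned, per month:      {monthly_owned_cost(purchase_price, lifetime_months):,.2f}")
```

With a 2-hour load, the idle share works out to about 91.7%, which is where the "more than 90%" figure comes from. The cost comparison depends entirely on the rates you enter.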
Conclusion? The cost of data warehousing is changing, and we have to reconsider its role in our analytics strategy.
More and more data warehouse needs can be met with easier-to-use and less expensive tools. Sure, if you are Walmart or Amazon, you still need to spend serious money on lots of sophisticated software and hardware. But for what used to be considered a good-sized data warehouse, more and more is possible for less and less.
We’re building a whole new framework for data warehousing, particularly suited to web analytics. If you are interested in learning more, and being involved in the next stages, contact us at firstname.lastname@example.org