Hadoop
Hadoop is the poster child for Big Data, so much so that the open source data platform has become practically synonymous with the wildly popular term for storing and analyzing huge sets of information.
While Hadoop is not the only Big Data game in town, the software has had a remarkable impact. But exactly why has Hadoop been such a major force in Big Data? What makes this software so damn special – and so important?
Sometimes the reasons behind something's success can be staring you right in the face. For Hadoop, the biggest motivator in the market is simple: Before Hadoop, data storage was expensive.
Hadoop, however, lets you store as much data as you want in whatever form you need, simply by adding more servers to a Hadoop cluster. Each new server (which can be a commodity x86 machine with a relatively small price tag) adds more storage and more processing power to the overall cluster. That makes storing data with Hadoop far less costly than prior approaches.
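To make the "store anything, just add servers" point concrete, here is a minimal sketch (not from the article) of writing a file into HDFS using Hadoop's Java FileSystem API. The NameNode address and target path are assumptions for illustration; in a real deployment they come from the cluster's configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally picked up from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            // HDFS spreads the file's blocks across whatever DataNodes are in the
            // cluster, so adding commodity servers transparently adds capacity.
            Path target = new Path("/data/events/clickstream.log");
            try (FSDataOutputStream out = fs.create(target, true /* overwrite */)) {
                out.writeBytes("example record\n");
            }
        }
    }
}
```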
Spendy Storage Created The Need For Hadoop
We’re not talking about data storage in terms of archiving… that’s just putting data onto tape. Companies need to store increasingly large amounts of data and be able to easily get to it for a wide variety of purposes. That kind of data storage was, in the days before Hadoop, pricey.
Hadoop, then, allows companies to store data much more cheaply. How much more cheaply? In 2012, Rainstor estimated that running a 75-node, 300TB Hadoop cluster would cost $1.05 million over three years. In 2008, Oracle sold a database with a little over half that storage (168TB) for $2.33 million – and that's before operating costs. Throw in the salary of an Oracle admin at around $95,000 per year, and you're looking at a total of roughly $2.62 million over three years – 2.5 times the cost, for just over half the storage capacity.
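As a quick sanity check on that comparison, here's a small sketch that reproduces the arithmetic from the figures quoted above; the class and variable names are just for illustration.

```java
public class StorageCostComparison {
    public static void main(String[] args) {
        double hadoopThreeYear = 1_050_000; // 75-node, 300TB Hadoop cluster (Rainstor, 2012)
        double oracleLicense   = 2_330_000; // 168TB Oracle database (2008), before operating costs
        double adminSalary     =    95_000; // Oracle admin, per year

        // $2.33M purchase + three years of admin salary ≈ $2.62M
        double oracleThreeYear = oracleLicense + 3 * adminSalary;

        System.out.printf("Oracle over 3 years: $%,.0f%n", oracleThreeYear);
        // ≈ 2.5x the Hadoop figure, for just over half the storage
        System.out.printf("Cost ratio (Oracle / Hadoop): %.1fx%n",
                oracleThreeYear / hadoopThreeYear);
    }
}
```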