In August our blog, “Why Proprietary Software Can be More Cost-Effective Than Commercial Open Source,” enlightened readers about the potential high costs of choosing commercial open source (COS) over proprietary software. In this blog we build on that theme by discussing the pitfalls of adopting Hadoop as a data integration solution.
In case you’re wondering, Hadoop is a project being built, used, and maintained by a global community of contributors and users. It is an open-source software framework developed for storage and large-scale processing of data sets.
In response to customers showing increasing interest in potentially using Hadoop to assist in data integration processes to support data warehousing and analytics requirements, Gartner analysts Merv Adrian and Ted Friedman defined their position in an article published last year that Hadoop is not a data integration solution.
There is a difference between a platform and a solution. While Hadoop may offer some robust data capabilities, it is not a complete, out-of-the-box data integration solution and can cost far more than the upfront cost of a commercial data integration package.
Here are some key reasons why Hadoop is not a good choice if you are looking for a complete data integration solution:
- Development time tradeoffs
Because Hadoop is not a complete solution, you’ll be investing significant project time and development resources to write custom code that enables Hadoop to perform basic data integration functions.
- Availability of Experienced Hadoop Developers
Developing custom code for Hadoop requires deep expertise in MapReduce coding, a skill set only a small number of developers possess. The skills your existing developers have invested in learning other data integration coding are not transferable for Hadoop, so there will be a steep learning curve.
- Data Reliability
Today’s complex data integration processes must be reliable, with robust monitoring, error handling, quality assessment, and administrative capabilities. Support for these capabilities in Hadoop is limited and there is no functionality for data profiling and quality. You’ll need to invest in third-party tools and custom coding to ensure your data reliability and quality.
As with other open source platforms, because Hadoop is a community project driven by the contributions of users, getting support when and where you need it is not guaranteed. Support is provided with your commercial data integration solutions, but for Hadoop you will need to rely on finding an answer within the community. Your data integration project completion time could be seriously compromised while you wait for help.
- Integration with legacy systems
Proprietary data integration solutions have developed broad connectivity capabilities to enable integration with legacy systems for data migration purposes. There is little support for integration with other tools in Hadoop, so if you need access to legacy data you’ll have to write custom code and implement complicated ETL processes, adding to your time and development costs.
As data becomes more complex, standards are becoming increasingly important. Unlike proprietary data integration solutions that provide sophisticated metadata management, Hadoop has no metadata management, which limits its ability to comply with standards.
Hadoop has strong capabilities for storing and managing vast amounts of data cheaply and efficiently, but it is a platform, not a data integration solution. If you are thinking of adopting Hadoop for your data integration needs, you should be prepared to hire developers experienced in writing Hadoop code, to invest significant amounts of time and money for these people to turn the Hadoop platform into something resembling a data integration solution, and to allow for long project implementation and completion schedules.
On the other hand, investing in a complete data integration solution like Centerprise Data Integrator will have you up and running with all the technologies and capabilities you need to meet your data integration needs quickly and easily.