Is Hadoop still in demand in 2019?

DataWorks Summit Europe 2019: Enterprise Data Cloud as a vision

"In about six months, the joint platform from Hortonworks and Cloudera should be available for use in the public cloud," announced Wolfgang Huber, Cloudera's Senior Regional Sales Director in Central Europe, at the DataWorks Summit Europe 2019 in Barcelona. Initially, the Cloudera Data Platform (CDP) will be available on Amazon Web Services and Microsoft Azure, followed a little later by Google Cloud Platform (GCP) and the IBM Cloud. "We support the IBM Cloud primarily because Hortonworks and IBM have a close partnership," added Huber. Customers will only be able to use CDP on premises in a private cloud in the second half of the year. There should be one or two updates per month.

Cloudera and Hortonworks merge

Following the merger, which was sealed at the beginning of the year, the new Cloudera Data Platform (CDP) combines the Cloudera Distribution of Hadoop (CDH) and the Hortonworks Data Platform (HDP). CDP consists of 32 components, including the Data Science Workbench, Cloudera Data Warehouse and Cloudera DataFlow. It covers analytics functions such as data flow, data engineering, data warehousing, operational database and machine learning. It also provides a common layer for identity, orchestration, management, and operations.

As the reason for the merger, which had apparently been under discussion for three and a half years, Huber cites the overlap between the two products: "Cloudera and Hortonworks have a code base that is 70 percent identical. Therefore, they fit together perfectly and complement each other with the rest." Customers of one distribution often asked for additions from the other. Cloudera's unique selling point in CDH lies in the field of AI, namely in the form of the Data Science Workbench (DSW). As a portal, it supports classic IT using a Hadoop data lake on the one hand, and developers and their collaboration on the other. In 2018, Hortonworks strengthened its governance and administration capabilities and also expanded its support for IoT and edge computing.

The outlook is also positive on the business side: "Both companies," continues Huber, "have the same target clientele, so that the merger represents a doubling of the customer base and sales capacity." January 3, 2019, when the merger came into force, therefore marked the start of up-selling and cross-selling by the joint sales organization. Cloudera has over 2,000 enterprise customers worldwide. With this clientele, the new company wants to continue to grow and gain new customers. At least for the time being, nothing is expected to change in existing contracts.

Companies lack data maturity

For Hadoop developers, this could mean a significant expansion of their market, because according to a Gartner study, Hadoop-based analytics are only just beginning to mature. "According to Gartner, more than 87 percent of companies do not yet have data maturity," says Huber. Although companies collect large amounts of data in data lakes, they still need models that can be trained with that data, for example to implement autonomous driving. A security framework from Cloudera called Shared Data Experience (SDX) is intended to ensure the necessary data security.

Since both companies use open source software, the open source developer community plays a major role, explained Huber. The distributions, with their numerous components, expose APIs that the community can use. "For us, it is API first. Cloudera will provide corresponding connectors for many partners," said Vikram Makhija, Vice President and General Manager Cloud at Cloudera. "We support agile development with an API for every CI/CD toolchain." In practice, developers can integrate CDP components such as Cloudera DataFlow into their environments via API.