Cloudera Manager (CDH) vs. Ambari (HDP)

CDH (Cloudera's Distribution, including Apache Hadoop) is one of the many Hadoop distributions. It is maintained by Cloudera, built on a stable release of Apache Hadoop, and integrates many patches, so it can be used directly in production environments.

Cloudera Manager is a tool for installing, monitoring, and managing Hadoop and other big data services in a cluster. It greatly simplifies installation and configuration management for the hosts and for Hadoop, Hive, Spark, and other services in the cluster.

https://www.cloudera.com/

HDP (Hortonworks Data Platform) is a 100% open-source Hadoop distribution released by Hortonworks. It is built around YARN and includes a large number of components such as Pig, Hive, Phoenix, HBase, Storm, and Spark.

CDH has a free edition (usable commercially) and a paid edition, and the free edition is sufficient for most needs.

For production environments, prefer CDH.

HDP tends to have inexplicable problems: sometimes individual services go down, and sometimes Ambari itself goes down. (Written in 2019.)

Original link: https://blog.csdn.net/cloudmq/article/details/100706966


Anyone who has operated and maintained Hadoop clusters knows that the Hadoop ecosystem is hard to work with at every stage, from installation and configuration through day-to-day operation and maintenance. Installing Hadoop by hand can take several days, and even a small cluster may need several people to run it. Ambari and Cloudera Manager both aim to simplify the installation and configuration of Hadoop ecosystem clusters, make Hadoop operations more efficient, and monitor the cluster.

Ambari is a top-level project of the Apache Software Foundation. It is a web-based tool for installing, configuring, managing, and monitoring Apache Hadoop clusters, and supports Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also provides cluster health dashboards, such as heatmaps, and a friendly user interface for viewing MapReduce, Pig, and Hive applications and diagnosing their performance characteristics.

Cloudera Manager is a Cloudera product focused on helping you manage your own CDH clusters. Through its unified web UI, it configures and deploys CDH and related components quickly and automatically. Cloudera Manager also provides customizable monitoring, diagnostics, and reporting; unified log management across the cluster; centralized configuration management with live configuration changes; multi-tenancy; high-availability and disaster-recovery deployment; and automatic recovery, all of which make it easier to manage and maintain your own data center. Cloudera Manager is also the main product we install and introduce here. It comes in a free Express edition and a paid Enterprise edition with full functionality and many value-added services.

Original link: https://blog.csdn.net/liuxiao723846/article/details/79649506


1. What are CDH and Ambari?


Introduction to CDH

Cloudera's Distribution, including Apache Hadoop

One of the many Hadoop distributions, maintained by Cloudera and built on a stable release of Apache Hadoop

Provides Hadoop's core capabilities: scalable storage and distributed computing

Web-based user interface

Advantages of CDH

1) Clearly organized versions

2) Fast release cadence

3) Supports Kerberos authentication

4) Clear documentation

5) Supports multiple installation methods (including installation through Cloudera Manager)

2. Why are they needed?

1) On 1,000 servers, what is the minimum time needed to build a Hadoop cluster that includes Hive, HBase, Flume, Kafka, Spark, and so on?

2) What if you are given only one day to do it?

3) To upgrade the Hadoop version on that cluster, what upgrade plan would you choose, and what is the minimum time it would take?

4) Will the new Hadoop version be compatible with the installed Hive, HBase, Flume, Kafka, Spark, and so on?

Big data clusters can be managed either manually (plain Apache Hadoop) or with a tool (Ambari + HDP, or Cloudera Manager + CDH).

Manual deployment requires configuring a great many parameters, but it makes the underlying principles easy to understand. It is recommended for beginners, who can learn a lot this way. Everything must be done by the user, with many details to handle, and when multiple components are involved, the user must resolve version compatibility between them on their own.

Tool-based deployment uses Ambari or Cloudera Manager, currently the two most mainstream cluster management tools (the former from Hortonworks, the latter from Cloudera). With a tool, operations are practically one-click; the difficulty lies in deploying Ambari or Cloudera Manager itself.

Aspect                Manual method                                  Tool method
Difficulty            Difficult; almost impossible to succeed        Simple and easy
Compatibility         Resolve component compatibility yourself       Automatically installs compatible components
Components supported  All components                                 Commonly used components
Pros                  Deep understanding of components and clusters  Simple, easy, doable
Cons                  Too complicated; almost impossible to succeed  Hides too many details, hindering understanding of components

Tool              Organization  Open source  Community support  Ease of use / stability            Market share
Cloudera Manager  Cloudera      Commercial   No                 Easy to use, stable                High
Ambari            Hortonworks   Open source  Yes                Fairly easy to use, fairly stable  Fairly high

Publisher:

Hortonworks developed the Ambari + HDP integrated big data analytics platform.

Cloudera developed the Cloudera Manager + CDH integrated big data analytics platform.

Stability:

Cloudera is relatively stable

Ambari is relatively unstable (pages load slowly)

Memory footprint:

Cloudera Manager: the server's Xmx is 2 GB and the agent's is 1 GB, plus the Host Monitor and Service Monitor, which together use about 1 GB.

Ambari: the server's Xmx is 2 GB, and the Ambari Metrics System (AMS) with its HBase instance uses about 2 GB.
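Adding up the figures quoted above shows that both management stacks need roughly the same memory. A minimal sketch of that arithmetic, assuming the numbers are in GB and that each component is counted once (on a real cluster an agent runs on every managed host, so the agent's share would grow with the host count):

```python
# Memory figures quoted in the text, in GB (approximate).
# Assumption: each component is counted once, as on a single management host.
CLOUDERA_MANAGER_GB = {
    "server Xmx": 2,
    "agent Xmx": 1,
    "Host Monitor + Service Monitor": 1,
}
AMBARI_GB = {
    "server Xmx": 2,
    "Ambari Metrics (AMS) + its HBase": 2,
}


def total_gb(components: dict[str, int]) -> int:
    """Return the summed memory budget of one management stack."""
    return sum(components.values())


if __name__ == "__main__":
    for name, parts in (("Cloudera Manager", CLOUDERA_MANAGER_GB), ("Ambari", AMBARI_GB)):
        print(f"{name}: ~{total_gb(parts)} GB")
```

Both come to about 4 GB, so memory overhead is not a deciding factor between the two.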

Cluster restart:

Cloudera supports rolling restarts (HDFS must be configured for HA before it can be restarted in a rolling fashion)

Ambari supports rolling restarts (HDFS must be configured for HA before it can be restarted in a rolling fashion)
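Why HA is a precondition becomes clearer when a rolling restart is written out as an ordering of hosts. The sketch below is a hypothetical, simplified plan: the function name, the assumption that the first NameNode listed is active, and the batch size are all illustrative, not either tool's API or actual algorithm. It refuses to run without at least two NameNodes, restarts standbys before the active one, and then restarts DataNodes a few at a time.

```python
def plan_rolling_restart(namenodes: list[str], datanodes: list[str], batch_size: int) -> list[list[str]]:
    """Return restart batches in order: NameNodes one at a time, then DataNodes in batches.

    Hypothetical illustration only; not the algorithm Cloudera Manager or Ambari actually runs.
    Assumes namenodes[0] is the currently active NameNode.
    """
    if len(namenodes) < 2:
        # Without HA there is a single NameNode, and restarting it takes all of HDFS offline.
        raise ValueError("rolling restart needs HDFS HA (at least two NameNodes)")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Standbys first; the active NameNode goes last (the failover before it is not modeled).
    batches = [[nn] for nn in namenodes[1:]] + [[namenodes[0]]]
    # DataNodes in small batches so only a few are down at any moment.
    for start in range(0, len(datanodes), batch_size):
        batches.append(datanodes[start:start + batch_size])
    return batches


if __name__ == "__main__":
    print(plan_rolling_restart(["nn1", "nn2"], ["dn1", "dn2", "dn3"], batch_size=2))
    # -> [['nn2'], ['nn1'], ['dn1', 'dn2'], ['dn3']]
```

With only one NameNode the function raises immediately, which mirrors the caveat above: without HA there is no host that can keep serving HDFS while the NameNode restarts.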

Cluster upgrade (as a rule, avoid upgrading a cluster unless necessary):

Cloudera does not support rolling upgrades of services

Ambari supports rolling upgrades of services (an advantage of Ambari; HDFS must be HA)

Secondary development:

Cloudera does not support it

Ambari supports it

Service versions:

Cloudera ships older versions

Ambari ships newer versions

Service integration:

Cloudera is weak

Ambari is strong and supports Elasticsearch, Redis, Presto, Kylin, etc.

User experience:

Cloudera is good

Ambari is relatively poor

Installation process:

Cloudera is complex

Ambari is simple

Email alerts:

Cloudera's support is poor

Ambari's support is good

Package format:

Cloudera uses parcels

Ambari uses RPM packages

Summary:

Avoid upgrading component versions unless necessary.

If you need strong integration and can accept somewhat weaker stability, choose Ambari.

If you need high stability and can accept weaker integration, choose Cloudera.

I plan to use a management tool for a newly built Hadoop cluster. The main differences are listed below:

Difference                     Apache Ambari                        Cloudera Manager Express (free edition)
Config versioning and history  Supported                            Not supported
Secondary development          Supported                            Not supported
Integration                    Supported                            No (no Redis, Kylin, or Elasticsearch)
Maintenance                    Relies on the community              Cloudera-specific customizations; maintaining or patching it yourself drifts ever further from the community
Access control                 Ranger (relatively simple)           Sentry (complex)
View customization             Custom views and services supported  Not supported

Taken together, the new cluster needs to integrate Elasticsearch, Kylin, and other technologies, and must allow in-house maintenance and secondary development, so we decided to use Ambari.

Reposted from: Comparison between CDH and ambari


Origin blog.csdn.net/fuhanghang/article/details/132185072