The difference between Cloudera's CDH and Apache Hadoop

At present, there are three main Hadoop distributions (all from foreign vendors): the free Apache version, the Cloudera version (Cloudera's Distribution Including Apache Hadoop, "CDH"), and the Hortonworks version (Hortonworks Data Platform, "HDP"). Domestically, the vast majority choose CDH. The main differences between CDH and the Apache version are as follows:

(1) CDH's versioning is very clear. There are only two series, CDH3 and CDH4, corresponding to the first generation of Hadoop (Hadoop 1.0) and the second generation of Hadoop (Hadoop 2.0). By comparison, Apache's version numbering is much more confusing. CDH also offers enhanced compatibility, security, and stability over Apache Hadoop.

(2) CDH3 is based on Apache Hadoop 0.20.2 with the latest patches incorporated; CDH4 is based on Apache Hadoop 2.x. CDH continually applies the latest bug fixes and feature patches, often ships a given feature earlier than the corresponding Apache release, and updates faster than the official Apache project.

(3) In terms of security, CDH supports Kerberos authentication, while Apache Hadoop by default uses simple username-matching authentication.
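For reference, switching Hadoop from the simple username-matching mode to Kerberos hinges on a core-site.xml setting; the sketch below shows the standard properties (the default value is "simple", and a working Kerberos setup additionally requires a KDC and principal/keytab configuration not shown here):

```xml
<!-- core-site.xml: switch authentication from the default "simple"
     (username matching) to Kerberos. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<!-- Enable service-level authorization checks as well. -->
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```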

(4) CDH's documentation is clear and complete; many users of the Apache version consult the documentation provided by CDH, including the installation and upgrade guides.

(5) CDH supports four installation methods: Yum/Apt packages, tarball, RPM packages, and Cloudera Manager. Apache Hadoop supports only tarball installation.

Note: Installing CDH with the recommended Yum/Apt packages has the following benefits:

1. Installation and upgrade over the network are very convenient.

2. Dependent packages are downloaded automatically.

3. Hadoop ecosystem packages are matched automatically. You do not need to hunt for versions of HBase, Flume, Hive, and other software that match your current Hadoop; Yum/Apt finds compatible package versions based on the installed Hadoop version and ensures compatibility.

4. Relevant directories are created automatically and soft-linked to appropriate places (such as conf and logs), and the hdfs and mapred users are created automatically. The hdfs user is the highest-privileged user of HDFS, and the mapred user owns the relevant directories during MapReduce execution.
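The benefits above can be sketched as a Yum-based workflow; the commands are illustrative only, and the exact package names depend on the CDH release and the repositories configured on the host:

```shell
# Illustrative sketch: assumes a CDH Yum repository is already
# configured; package names may vary between CDH releases.

# Install Hadoop daemons; dependent packages are pulled in automatically.
sudo yum install -y hadoop-hdfs-namenode hadoop-hdfs-datanode

# Ecosystem components are resolved to versions matching the
# installed Hadoop.
sudo yum install -y hive hbase flume-ng

# The packages create the hdfs and mapred users and the standard
# directories/symlinks (e.g. conf and log locations).
id hdfs
id mapred
ls -l /etc/hadoop/conf
```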
