Kudu Learning, Part 2: Installation

1. Installation methods

There are currently five main ways to install and use Kudu:

  1. Use Kudu Quickstart VM (a virtual machine with Kudu pre-installed)
  2. Automatic installation using Cloudera Manager on an existing cluster
  3. Manual installation using packages
  4. Build from source
  5. Use Cloudera Quickstart VM

For the latest information on installation options, see Kudu's official website: http://kudu.apache.org/

2. Use Kudu Quickstart VM

Kudu Quickstart VM is the easiest and lowest-cost way to learn Kudu.

The advantage of using the Quickstart VM is that you do not need a complete cluster, and if something goes wrong during installation you can easily start again from scratch. You can use Kudu Quickstart VM to get familiar with Kudu's API and with tools and frameworks that integrate with Kudu, such as Impala. The disadvantage is that Kudu runs in a single virtual machine instead of on a dedicated cluster, so this setup is only suitable for development and practice.

Kudu's official website provides complete instructions on how to use Kudu Quickstart VM. There are two steps to install Kudu Quickstart VM:

  1. Download and run Oracle VirtualBox.
  2. Download and run the bootstrap script, which will download the image of Kudu Quickstart VM and import it into VirtualBox.

When this is done, you will have a single-node virtual machine with Kudu and Impala running on it.

Kudu Quickstart VM does not have all Hadoop tools pre-installed. If you need to test end-to-end examples that use Spark, Spark Streaming, or Kafka, you have to install Spark, Kafka, and ZooKeeper manually, or move to a different environment.

3. Use Cloudera Manager

If you want to try Kudu's full feature set and scalability, or deploy it to a production environment, you need to deploy Kudu on a cluster. The most common way to do this is to use Cloudera Manager together with Cloudera's Hadoop distribution. Cloudera Manager automatically performs pre-installation cluster verification, Kudu cluster installation, configuration, and monitoring.

Rather than using traditional Linux packages such as RPM, most Cloudera users install Kudu from a binary distribution format called a parcel. Cloudera uses parcels to simplify packaging and installing the components of its distribution. Since CDH 5.10, Kudu has been available as a parcel, and it can be added to a cluster simply through the "Add Service" option.

Using Cloudera Manager and parcels is the easiest way to install and manage Kudu, and it is the recommended approach for production environments. Because the installation steps may differ slightly between Cloudera releases, refer to the Cloudera documentation for details: https://www.cloudera.com/documentation.html

Cloudera Manager configures Kudu well according to best-practice guidelines, but there are still many details to consider, including hardware selection, capacity planning, host and role assignment (master server and tablet server), and the storage locations for Kudu's tablet data and write-ahead log (WAL).

4. Install using packages

Packages are available for most mainstream Linux distributions, such as RHEL, CentOS, SLES, Ubuntu, and Debian. Although a package-based installation takes more work than the automatic installation through Cloudera Manager, the process is not complicated. Going through it gives you a better understanding of Kudu's components, and it does not depend on applications such as Cloudera Manager or VirtualBox.

4.1 Go to Kudu's official website for installation steps

Enter the installation documentation page of the corresponding release version:

https://kudu.apache.org/releases/1.6.0/docs/installation.html

The "Install Using Packages" section describes two ways to install the packages: downloading and installing from a repository, and offline installation.

4.2 Download and install from Repository

Each type of system uses its own package tool for installation:

  • RHEL or CentOS: install with yum
  • SLES: install with zypper
  • Ubuntu or Debian: install with apt-get
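
The mapping above can be written down as a small shell helper. This is only a sketch: the function names installer_for and current_distro_id are made up for illustration and are not part of Kudu, and the distribution IDs are assumed to be the ID= values found in /etc/os-release.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kudu): map a distribution ID, as found in
# the ID= field of /etc/os-release, to the package tool listed above.
installer_for() {
  case "$1" in
    rhel|centos)   echo "yum" ;;
    sles)          echo "zypper" ;;
    ubuntu|debian) echo "apt-get" ;;
    *)             echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# Print the ID of the current system; prints nothing if the file is missing.
current_distro_id() {
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . /etc/os-release 2>/dev/null && echo "$ID" )
}
```

For example, on an Ubuntu host, installer_for "$(current_distro_id)" should print apt-get, which you can then use to run the install commands shown further down.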

Before installing, you need to configure the Cloudera repository. The official website provides a download address for the cloudera-kudu.repo file, but because of network access restrictions in China you may need a VPN to fetch it. Since setting up a VPN on a server can be inconvenient, you can install Kudu offline instead.

After configuring the repository, you can install the packages.

RHEL or CentOS

sudo yum install kudu                         # Base Kudu files
sudo yum install kudu-master                  # Kudu master init.d service script and default configuration
sudo yum install kudu-tserver                 # Kudu tablet server init.d service script and default configuration
sudo yum install kudu-client0                 # Kudu C++ client shared library
sudo yum install kudu-client-devel            # Kudu C++ client SDK

SLES

sudo zypper install kudu                      # Base Kudu files
sudo zypper install kudu-master               # Kudu master init.d service script and default configuration
sudo zypper install kudu-tserver              # Kudu tablet server init.d service script and default configuration
sudo zypper install kudu-client0              # Kudu C++ client shared library
sudo zypper install kudu-client-devel         # Kudu C++ client SDK

Ubuntu or Debian

sudo apt-get install kudu                     # Base Kudu files
sudo apt-get install kudu-master              # Service scripts for managing kudu-master
sudo apt-get install kudu-tserver             # Service scripts for managing kudu-tserver
sudo apt-get install libkuduclient0           # Kudu C++ client shared library
sudo apt-get install libkuduclient-dev        # Kudu C++ client SDK

4.3 Offline installation

4.3.1 Installation steps

The official website provides offline installation packages for each type of system. As with the repository file, these download addresses are on Cloudera's website and may require a VPN to access. Taking Ubuntu as an example, the following packages need to be downloaded. Make sure that all of them have the same version number.

  • kudu
  • kudu-master
  • kudu-tserver
  • libkuduclient0
  • libkuduclient-dev

Install the downloaded packages with dpkg:

sudo dpkg -i kudu_1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8_xenial-kudu5.12.2_amd64.deb
sudo dpkg -i kudu-master_1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8_xenial-kudu5.12.2_amd64.deb
sudo dpkg -i kudu-tserver_1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8_xenial-kudu5.12.2_amd64.deb
sudo dpkg -i libkuduclient0_1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8_xenial-kudu5.12.2_amd64.deb
sudo dpkg -i libkuduclient-dev_1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8_xenial-kudu5.12.2_amd64.deb
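
Since all packages must have the same version, it can help to check the file names before running dpkg. The following is a rough sketch: deb_version and same_version are hypothetical helper names, and the check assumes file names of the form <package>_<version>_<arch>.deb, as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version part of a package file name of the
# form <package>_<version>_<arch>.deb (the pattern of the files above).
deb_version() {
  local name
  name=$(basename "$1")
  name=${name#*_}   # drop everything up to the first "_" (the package name)
  name=${name%_*}   # drop everything from the last "_" (the "_<arch>.deb" part)
  echo "$name"
}

# Succeed only if every file passed in carries the same version string.
same_version() {
  local first="" v f
  for f in "$@"; do
    v=$(deb_version "$f")
    if [ -z "$first" ]; then
      first=$v
    elif [ "$v" != "$first" ]; then
      echo "version mismatch: $f has $v, expected $first" >&2
      return 1
    fi
  done
}
```

With the downloaded files in the current directory, running same_version ./*.deb first reports a mismatch before any package is installed.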

4.3.2 Troubleshooting

Problem: the Kudu package installation fails because dependencies such as lsb are missing.

Solution: install lsb using apt-get:

sudo apt update
sudo apt-get install lsb

If lsb cannot be installed because of unmet dependencies, run:

sudo apt-get --fix-broken install

4.4 Access the Kudu web user interface

Master: http://hostname:8051

Tablet server: http://hostname:8050
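
To check quickly from the command line whether the two interfaces respond, something like the following could be used. It is a sketch under a few assumptions: kudu_ui_url and check_kudu_ui are hypothetical names, the ports are the ones listed above, and curl is assumed to be installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the web UI address for a role, using the ports
# listed above (8051 for the master, 8050 for the tablet server).
kudu_ui_url() {
  local host=$1 role=$2
  case "$role" in
    master)  echo "http://${host}:8051" ;;
    tserver) echo "http://${host}:8050" ;;
    *)       echo "unknown role: $role" >&2; return 1 ;;
  esac
}

# Report whether a web UI answers within 5 seconds; requires curl.
check_kudu_ui() {
  local url
  url=$(kudu_ui_url "$1" "$2") || return 1
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    echo "reachable: $url"
  else
    echo "not reachable: $url"
    return 1
  fi
}
```

For example, check_kudu_ui localhost master probes the master UI on the local machine; replace localhost with the real host name of your node.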

5. Build from source

If you want to work on Kudu's own development, or want the flexibility to pick the latest upstream Kudu code, you can build and install Kudu directly from source.

Building from source takes more steps, you have to solve any problems that come up during the build yourself, and integrating Kudu with other applications in the Hadoop ecosystem is more troublesome.

6. Use Cloudera Quickstart VM

If you want to try Kudu within the Hadoop ecosystem but cannot deploy it on a real cluster, a simple and effective alternative is to run Kudu in the Cloudera Quickstart VM. This virtual machine comes with the entire CDH distribution installed. In addition to Kudu, it includes HDFS, Impala, Hive, Spark, and more. You can freely choose which applications to run and which to stop, and you can try out the integration of different components. Because it is a virtual machine that runs all services in a single environment, it requires a lot of memory and CPU resources, so its performance does not represent that of a real environment. Compared with Kudu Quickstart VM, Cloudera Quickstart VM takes longer to start and uses more disk space. On the other hand, all of your testing stays inside one self-contained environment.

This virtual machine supports multiple environments: it can run in VirtualBox, VMware, or KVM, or as a Docker image. The latest version can be downloaded from Cloudera's official website.

Origin blog.csdn.net/yym373872996/article/details/105682165