OpenTSDB Configuration and Installation


Getting Started

This page will walk you through the setup process to get OpenTSDB running. It assumes you've read and understood the overview. With no prior experience, it should take about 15 minutes to get OpenTSDB running, including the time needed to set up HBase on a single node.

Setting up OpenTSDB

OpenTSDB comes pre-packaged with all the necessary dependencies except the JDK and Gnuplot.
The runtime dependencies for OpenTSDB are:

Additional compile-time dependencies:

  • GWT 2.4 (ASLv2)

Additional unit test dependencies:

You need to have Gnuplot (custom open-source license) installed in your PATH (version 4.2 minimum, 4.4 recommended).

Before getting started, you need an instance of HBase 0.92 (ASLv2) up and running. If you don't already have one, you can get started quickly with a single-node HBase instance.

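If you need that single-node instance, here is a minimal standalone-setup sketch; the release number and mirror URL are assumptions, so substitute whichever HBase 0.92 tarball you actually use:

wget http://archive.apache.org/dist/hbase/hbase-0.92.1/hbase-0.92.1.tar.gz
tar xzf hbase-0.92.1.tar.gz
cd hbase-0.92.1
./bin/start-hbase.sh   # starts a local standalone HBase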

Almost all the following instructions can be copy-pasted directly into a terminal on a Linux or Mac OS X (or otherwise POSIXy) machine. You will need to edit the placeholders which are typeset like-this. A Bourne shell (such as bash or zsh) is assumed. No special privileges are required.

Checkout, compile & start OpenTSDB

OpenTSDB uses the usual build process that consists of running ./bootstrap (only once, when you first check out the code), followed by ./configure and make. There is a handy shell script named build.sh that will take care of all of that for you, and build OpenTSDB in a new subdirectory named build:

git clone git://github.com/OpenTSDB/opentsdb.git
cd opentsdb
./build.sh

From there on, you can use the command-line tool by invoking ./build/tsdb or you can run make install to install OpenTSDB on your system. Should you ever change your mind, there is also make uninstall, so there are no strings attached.

If it's the first time you run OpenTSDB with your HBase instance, you first need to create the necessary HBase tables:

env COMPRESSION=none HBASE_HOME=path/to/hbase-0.92.X ./src/create_table.sh

This will create two tables: tsdb and tsdb-uid. If you're just evaluating OpenTSDB, don't worry about compression for now. In production / at scale, make sure you use COMPRESSION=lzo and have LZO enabled.

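For production, the same table-creation call with LZO would look like this (path placeholder as above):

env COMPRESSION=lzo HBASE_HOME=path/to/hbase-0.92.X ./src/create_table.sh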

Now start a TSD (Time Series Daemon):

tsdtmp=${TMPDIR-'/tmp'}/tsd    # For best performance, make sure
mkdir -p "$tsdtmp"             # your temporary directory uses tmpfs
./build/tsdb tsd --port=4242 --staticroot=build/staticroot --cachedir="$tsdtmp"

If you're using a real HBase cluster, you will also need to pass the --zkquorum flag to specify the comma-separated list of hosts serving your ZooKeeper quorum. The --cachedir can be purged periodically, e.g. by a cron job.

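For example, with a three-host quorum (the host names below are placeholders):

./build/tsdb tsd --port=4242 --staticroot=build/staticroot --cachedir="$tsdtmp" \
  --zkquorum=zk1.example.com,zk2.example.com,zk3.example.com

And one possible cron entry to purge cache files older than a day (purely illustrative):

0 3 * * * find /tmp/tsd -type f -mtime +1 -delete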

At this point you can access the TSD's web interface through 127.0.0.1:4242 (if it's running on your local machine).

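A quick way to verify the TSD is answering, assuming the /version endpoint is available in your build:

curl http://127.0.0.1:4242/version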

Using OpenTSDB

Create your first metrics

Metrics need to be registered before you can start storing data points for them.

./tsdb mkmetric mysql.bytes_received mysql.bytes_sent

This will create 2 metrics: mysql.bytes_received and mysql.bytes_sent.

New tags, on the other hand, are automatically registered whenever they're used for the first time. Right now OpenTSDB only allows you to have up to 2^24 = 16,777,216 different metrics, 16,777,216 different tag names and 16,777,216 different tag values. This is because each one of those is assigned a UID on 3 bytes. Metric names, tag names and tag values have their own UID spaces, which is why you can have 16,777,216 of each kind. The size of each space is configurable but there is no knob that exposes this configuration parameter right now. So bear in mind that using user ID or event ID as a tag value will not work right now if you have a large site.

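A trivial shell sanity check of that limit:

echo $((2 ** 24))   # prints 16777216, the size of one 3-byte UID space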

Start collecting data

So now that we have our 2 metrics, we can start sending data to the TSD. Let's write a little shell script to collect some data off of MySQL and send it to the TSD (note: this is just an example; in practice you can use tcollector's MySQL collector):

cat >mysql-collector.sh <<\EOF
#!/bin/bash
set -e
while true; do
  mysql -u USER -pPASS --batch -N --execute "SHOW STATUS LIKE 'bytes%'" \
  | awk -F"\t" -v now=`date +%s` -v host=`hostname` \
    '{ print "put mysql." tolower($1) " " now " " $2 " host=" host }'
  sleep 15
done | nc -w 30 host.name.of.tsd PORT
EOF
chmod +x mysql-collector.sh
nohup ./mysql-collector.sh &

Every 15 seconds, the script will collect 2 data points from MySQL and send them to the TSD. You can use a smaller sleep interval for more real-time monitoring, but remember you can't have sub-second precision, so you must sleep at least 1 second before producing another data point.

What does the script do? If you're not a big fan of shell and awk scripting, it may not be obvious how this works. But it's simple. The set -e command simply instructs bash to exit with an error if any of the commands fail. This simplifies error handling. The script then enters an infinite loop. In this loop, we query MySQL to retrieve 2 of its status variables:

$ mysql -u USER -pPASS --execute "SHOW STATUS LIKE 'bytes%'"
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Bytes_received | 133   |
| Bytes_sent     | 190   |
+----------------+-------+

The --batch -N flags ask the mysql command to remove the human friendly fluff so we don't have to filter it out ourselves. Then the output is piped to awk, which is told to split fields on tabs (-F"\t") because with the --batch flag that's what mysql will use. We also create a couple of variables, one named now and initialize it to the current timestamp, the other named host and set to the hostname of the local machine. Then, for every line, we print put mysql., followed by the lower-case form of the first word, then by a space, then by the current timestamp, then by the second word (the value), another space, and finally host= and the current hostname. Rinse and repeat every 15 seconds. The -w 30 parameter given to nc simply sets a timeout on the connection to the TSD.

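Put together, each iteration emits lines of this shape (values and hostname are illustrative, taken from the sample output above):

put mysql.bytes_received 1288946927 133 host=foo
put mysql.bytes_sent 1288946927 190 host=foo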

Bear in mind, this is just an example, in practice you can use tcollector's MySQL collector.

If you don't have a MySQL server to monitor, you can try this instead to collect basic load metrics from your Linux servers.

cat >loadavg-collector.sh <<\EOF
#!/bin/bash
set -e
while true; do
  awk -v now=`date +%s` -v host=`hostname` \
    '{ print "put proc.loadavg.1m " now " " $1 " host=" host;
       print "put proc.loadavg.5m " now " " $2 " host=" host }' /proc/loadavg
  sleep 15
done | nc -w 30 host.name.of.tsd PORT
EOF
chmod +x loadavg-collector.sh
nohup ./loadavg-collector.sh &

This will store a reading of the 1-minute and 5-minute load average of your server in OpenTSDB by sending simple "telnet-style commands" to the TSD:

put proc.loadavg.1m 1288946927 0.36 host=foo
put proc.loadavg.5m 1288946927 0.62 host=foo
put proc.loadavg.1m 1288946942 0.43 host=foo
put proc.loadavg.5m 1288946942 0.62 host=foo

Batch imports

Let's imagine that you have a cron job that crunches gigabytes of application logs every day or every hour to extract profiling data. For instance, you could be logging the time taken to process a request, and your cron job would compute an average for every 30 second window. Maybe you're particularly interested in 2 types of requests handled by your application, so you'll compute separate averages for those requests, and another average for every other request type. So your cron job may produce an output file that looks like this:

1288900000 42 foo
1288900000 51 bar
1288900000 69 other
1288900030 40 foo
1288900030 59 bar
1288900030 80 other

The first column is a timestamp, the second the average latency for that 30 second window, and the third the type of request we're talking about. If you run your cron job on a day's worth of logs, you'll end up with 8640 such lines. In order to import those into OpenTSDB, you need to adjust your cron job slightly to produce its output in the following format:

myservice.latency.avg 1288900000 42 reqtype=foo
myservice.latency.avg 1288900000 51 reqtype=bar
myservice.latency.avg 1288900000 69 reqtype=other
myservice.latency.avg 1288900030 40 reqtype=foo
myservice.latency.avg 1288900030 59 reqtype=bar
myservice.latency.avg 1288900030 80 reqtype=other

Notice we're simply associating each data point with the name of a metric (myservice.latency.avg) and naming the tag that represents the request type. If each server has its own logs and you process them separately, you may want to add another tag to each line like the host=foo tag we saw in the previous section. This way you'll be able to plot the latency of each server individually, in addition to your average latency across the board and/or per request type.

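If your cron job already emits the three-column form shown earlier, a minimal conversion sketch (the input file name is illustrative):

awk '{ print "myservice.latency.avg " $1 " " $2 " reqtype=" $3 }' crunched.out > your-file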

In order to import a data file in the format above (metric timestamp value tags) simply run the following command:

./tsdb import your-file

If your data file is large, consider gzip'ing it first. This can be as simple as piping the output of your cron job to gzip -9 >output.gz instead of writing directly to a file. The import command is able to read gzip'ed files and it greatly helps performance for large batch imports.

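Concretely, that can be as simple as (the log-crunching script name is a placeholder):

./crunch-logs.sh | gzip -9 >output.gz
./tsdb import output.gz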

Self monitoring

Each TSD exports some stats about itself through the simple stats command. You can collect those stats and feed them back to the TSD every few seconds. First, create the necessary metrics:

echo stats | nc -w 1 localhost 4242 \
  | awk '{ print $1 }' | sort -u \
  | xargs ./tsdb mkmetric

This requests the stats from the TSD (assuming it's running on the local host and listening on port 4242), extracts the names of the metrics from the stats, and assigns them UIDs.

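The stats output uses the same metric timestamp value tags layout that put expects, which is why the script below only needs to prepend put to each line. An illustrative line (actual metric names and tags vary by version):

tsd.rpc.received 1288946927 5 type=put host=foo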

Then you can use this simple script to collect stats and store them in OpenTSDB:

#!/bin/bash
INTERVAL=15
while :; do
  echo stats || exit
  sleep $INTERVAL
done | nc -w 30 localhost $1 \
  | sed 's/^/put /' \
  | nc -w 30 localhost $1

This way you will collect and store stats from the TSD every 15 seconds.

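Assuming you saved that loop as tsd-stats.sh (the name is illustrative) and your TSD listens on port 4242:

chmod +x tsd-stats.sh
nohup ./tsd-stats.sh 4242 &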
