1 Performance testing in Go

Writing performance tests in Go is convenient: the standard toolchain that ships with Go has full support for them.

1.1 benchmark

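Benchmarks follow a few conventions:

a benchmark is also a test, so it lives in a file whose name ends in _test.go;
the file must import testing;
the function name starts with Benchmark and the function takes a single *testing.B parameter.

*testing.B provides most of the methods that *testing.T does, plus some that are specific to performance testing. It also exposes an integer field N, set by the benchmark runner, which tells the benchmark how many times to execute the measured operation.

As an example, here is a benchmark for a string join operation:

bench_test.go: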

package main

import "testing"
import "strings"

func BenchmarkStringJoin1(b *testing.B) {
    input := []string{"Hello", "World"}
    for i := 0; i < b.N; i++ {
        result := strings.Join(input, " ")
        if result != "Hello World" {
            b.Error("Unexpected result: " + result)
        }
    }
}

Then run the benchmark:

$ go test -bench=. -run=NONE

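-bench selects benchmarks by regular expression; the . here matches every benchmark function;
-run=NONE skips the functional tests, since no test name matches NONE.

The output looks like this: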

goos: linux
goarch: amd64
pkg: hello/bench_test
BenchmarkStringJoin1-8      50000000            36.6 ns/op
PASS
ok      hello/bench_test    1.871s

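BenchmarkStringJoin1-8 names the benchmark that was run; the 8 suffix is the value of GOMAXPROCS, which defaults to the number of logical CPUs;
50000000 is the number of iterations (b.N) in the final run. The benchmark runner does not know in advance how long the operation takes, so it starts with a small N and scales it up based on the measured time until the run is long enough to be timed reliably;
36.6 ns/op means each operation took 36.6 nanoseconds on average.

To report more information, add flags. For example, the following command also shows memory allocation statistics: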
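
The way the runner picks N can also be observed from code. The sketch below uses testing.Benchmark from the standard library, which runs a benchmark function outside of go test and returns the final N and timing. The names measureJoin and describe are hypothetical helpers introduced only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// measureJoin runs the join loop under testing.Benchmark, which grows b.N
// until the run lasts long enough (about one second by default).
func measureJoin() testing.BenchmarkResult {
	input := []string{"Hello", "World"}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = strings.Join(input, " ")
		}
	})
}

// describe formats the iteration count and per-operation time of a result.
func describe(r testing.BenchmarkResult) string {
	return fmt.Sprintf("N=%d, %d ns/op", r.N, r.NsPerOp())
}
```

Calling describe(measureJoin()) yields a string in the same spirit as the go test output line; the actual numbers depend on the machine and the Go version.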

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: hello/bench_test
BenchmarkStringJoin1-8      30000000            37.2 ns/op        16 B/op          1 allocs/op
PASS
ok      hello/bench_test    1.159s

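16 B/op means each operation allocated 16 bytes on average;
1 allocs/op means each operation performed one memory allocation.

The same report can be requested from the benchmark code itself, in which case the -benchmem flag is not needed: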

func BenchmarkStringJoin1(b *testing.B) {
    b.ReportAllocs() // ask the runner to report memory allocation statistics for this benchmark
    input := []string{"Hello", "World"}
    for i := 0; i < b.N; i++ {
        result := strings.Join(input, " ")
        if result != "Hello World" {
            b.Error("Unexpected result: " + result)
        }
    }
}
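
The allocs/op figure can also be checked without a full benchmark run. As a rough sketch, the standard library function testing.AllocsPerRun reports the average number of allocations per call of a function. The names sink and joinAllocs are hypothetical and exist only for this example.

```go
package main

import (
	"strings"
	"testing"
)

// sink keeps the result reachable so the compiler cannot drop the allocation.
var sink string

// joinAllocs reports the average number of heap allocations per call to
// strings.Join for a two-word input.
func joinAllocs() float64 {
	input := []string{"Hello", "World"}
	return testing.AllocsPerRun(1000, func() {
		sink = strings.Join(input, " ")
	})
}
```

The value returned may vary between Go versions, so it is best treated as a quick sanity check rather than a replacement for -benchmem.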

BenchmarkStringJoin1 can be optimized for the two-element case by adding the following benchmark:

func BenchmarkStringJoin2(b *testing.B) {
    b.ReportAllocs()
    input := []string{"Hello", "World"}
    // join handles only the two-element case; any other length yields "".
    join := func(strs []string, delim string) string {
        if len(strs) == 2 {
            return strs[0] + delim + strs[1]
        }
        return ""
    }
    for i := 0; i < b.N; i++ {
        result := join(input, " ")
        if result != "Hello World" {
            b.Error("Unexpected result: " + result)
        }
    }
}

$ go test -bench=.
goos: linux
goarch: amd64
pkg: hello/bench_test
BenchmarkStringJoin1-8      50000000            36.5 ns/op        16 B/op          1 allocs/op
BenchmarkStringJoin2-8      100000000           20.2 ns/op         0 B/op          0 allocs/op
PASS
ok      hello/bench_test    3.909s

The optimized version is markedly faster (20.2 ns/op versus 36.5 ns/op) and no longer allocates memory per operation.
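
When two variants are compared like this, they can also be grouped as sub-benchmarks with b.Run, so that one function produces both result lines. This is only a sketch: joinTwo and BenchmarkJoin are hypothetical names, and joinTwo is the same two-element shortcut used in BenchmarkStringJoin2.

```go
package main

import (
	"strings"
	"testing"
)

// joinTwo is the hand-written fast path from BenchmarkStringJoin2; it only
// handles exactly two elements and returns "" otherwise.
func joinTwo(strs []string, delim string) string {
	if len(strs) == 2 {
		return strs[0] + delim + strs[1]
	}
	return ""
}

// BenchmarkJoin groups both variants so that go test reports them as
// BenchmarkJoin/strings.Join and BenchmarkJoin/concat.
func BenchmarkJoin(b *testing.B) {
	input := []string{"Hello", "World"}
	variants := []struct {
		name string
		join func([]string, string) string
	}{
		{"strings.Join", strings.Join},
		{"concat", joinTwo},
	}
	for _, v := range variants {
		b.Run(v.name, func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				if got := v.join(input, " "); got != "Hello World" {
					b.Fatalf("unexpected result: %q", got)
				}
			}
		})
	}
}
```

Because each variant is called through a function value here, the compiler may not inline it, so the absolute numbers can differ from those of the standalone benchmarks.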

1.2 profiling

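More often, execution time and memory usage are not enough: we want to profile the program to find the hot spots worth optimizing. Go uses the pprof tool for profiling.

Go collects profiling data by sampling the program while it runs. The following kinds of profile are supported: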

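CPU profiling: identifies the functions that consume the most CPU time. The thread running on each CPU is interrupted periodically, every few milliseconds; each interrupt records a profiling event, and then normal execution resumes.
heap profiling: identifies the statements responsible for allocating the most memory.
block profiling: identifies the operations that block goroutines the longest, such as system calls, channel sends and receives, and lock acquisition. The profiling library records an event whenever a goroutine is blocked by one of these operations.

The profiles are collected with the following command: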

$ go test -bench . -cpuprofile=cpu.out -blockprofile=block.out -memprofile=mem.out
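
Outside of go test, a CPU profile can be collected from code with the runtime/pprof package. The following is a minimal sketch under that assumption; captureCPUProfile and joinWork are hypothetical names, and the workload simply repeats the join from the benchmarks.

```go
package main

import (
	"bytes"
	"runtime/pprof"
	"strings"
)

// captureCPUProfile samples CPU usage while work runs and returns the
// profile in the gzip-compressed protobuf format that go tool pprof reads.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err // e.g. a CPU profile is already being collected
	}
	work()
	pprof.StopCPUProfile() // flushes the remaining samples into buf
	return buf.Bytes(), nil
}

// joinWork is a hypothetical workload: it repeats the join from the benchmarks.
func joinWork() {
	input := []string{"Hello", "World"}
	for i := 0; i < 1000000; i++ {
		_ = strings.Join(input, " ")
	}
}
```

Writing the returned bytes to a file such as cpu.out gives an input that go tool pprof can read in the same way as the file produced by -cpuprofile.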

Now it is pprof's turn:

$ go tool pprof -text ./bench_test.test cpu.out
File: bench_test.test
Type: cpu
Time: Jul 26, 2018 at 10:17pm (CST)
Duration: 4.10s, Total samples = 3.94s (96.01%)
Showing nodes accounting for 3.85s, 97.72% of 3.94s total
Dropped 40 nodes (cum <= 0.02s)
      flat  flat%   sum%        cum   cum%
     1.63s 41.37% 41.37%      2.93s 74.37%  runtime.concatstrings
     0.55s 13.96% 55.33%      0.68s 17.26%  runtime.mallocgc
     0.34s  8.63% 63.96%      0.34s  8.63%  runtime.memmove
     0.33s  8.38% 72.34%      2.01s 51.02%  hello/bench_test.BenchmarkStringJoin2
     0.24s  6.09% 78.43%      3.17s 80.46%  runtime.concatstring3
     0.23s  5.84% 84.26%      0.96s 24.37%  runtime.rawstringtmp
     0.21s  5.33% 89.59%      1.82s 46.19%  hello/bench_test.BenchmarkStringJoin1
     0.12s  3.05% 92.64%      1.61s 40.86%  strings.Join
     0.05s  1.27% 93.91%      0.05s  1.27%  runtime.memclrNoHeapPointers
     0.05s  1.27% 95.18%      0.73s 18.53%  runtime.rawstring
     0.02s  0.51% 95.69%      0.02s  0.51%  runtime.(*mspan).nextFreeIndex
     ... ...

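This shows which functions consume the most CPU time.

The profile can also be rendered as an image (this requires GraphViz to be installed first, e.g. sudo apt install graphviz):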

go tool pprof -svg ./bench_test.test cpu.out > cpu.svg


In addition, the uber/go-torch tool can generate flame graphs; it relies on FlameGraph. The setup is as follows:

git clone --depth=1 https://github.com/brendangregg/FlameGraph.git ~/.flamegraph
export PATH=$PATH:~/.flamegraph

go get github.com/uber/go-torch

Then generate the flame graph:

go-torch -b cpu.out -f cpu.torch.svg


2 Profiling fabric

For profiling fabric, see this document: https://github.com/hyperledger-archives/fabric/wiki/Profiling-the-Hyperledger-Fabric#other-profiling

fabric profiling is built on the net/http/pprof package, which exposes pprof through a web interface.

The fabric peer has a built-in profile server. It listens on port 6060 by default and is disabled by default. To enable it, set peer.profile.enabled to true in /etc/hyperledger/fabric/core.yaml, or set the environment variable CORE_PEER_PROFILE_ENABLED=true.
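
For readers unfamiliar with net/http/pprof, the general pattern looks roughly like the sketch below. This is not fabric's actual code; serveProfiles and pprofIndexStatus are hypothetical names, and the address passed to serveProfiles is chosen by the caller.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect: registers /debug/pprof/ on http.DefaultServeMux
)

// serveProfiles blocks and serves the pprof endpoints on addr, e.g. "localhost:6060".
func serveProfiles(addr string) error {
	return http.ListenAndServe(addr, nil) // nil handler means http.DefaultServeMux
}

// pprofIndexStatus requests /debug/pprof/ in-process and returns the HTTP
// status code, which shows whether the handlers are registered.
func pprofIndexStatus() int {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
	rec := httptest.NewRecorder()
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code
}
```

A program would typically start serveProfiles in a goroutine; once it is listening, go tool pprof can fetch profiles from the /debug/pprof/ paths over HTTP.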

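Here we use the caliper project to generate load.

Set the environment variable CORE_PEER_PROFILE_ENABLED=true in the docker-compose.yaml of the caliper test environment. Taking the smallbank benchmark as the example again, add the variable to the environment section of network/fabric/simplenetwork/docker-compose.yaml.

Then start the performance test by running the following from the Caliper project root: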

node benchmark/smallbank/main.js

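Once the environment is up and the test has just started, open another terminal and look up the IP address of the peer node (for example with docker inspect $(docker ps | grep " peer0.org1.example.com" | awk '{print $1}')), then run the following commands to collect profiling data: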

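# collect 30 seconds of data (the default)
go tool pprof http://<peer-ip>:6060/debug/pprof/profile
# set the collection duration explicitly with the seconds query parameter, e.g. 60 seconds
go tool pprof "http://<peer-ip>:6060/debug/pprof/profile?seconds=60"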

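The command collects 30 seconds of data by default and then enters interactive mode. The collected data is saved under ~/pprof by default, in a file named pprof.XXX.samples.cpu.NNN.pb.gz.

In interactive mode, commands such as web, svg and top10 generate reports or images.

Then generate a flame graph, which is easier to read: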

go-torch -b pprof.peer.samples.cpu.001.pb.gz -f peer.cpu.torch.svg

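The flame graph shows that most of the CPU time is spent on cryptographic operations and verification, such as elliptic curve algorithms.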

