erlang节点通信

Eerlang l与节点互连,断开,监控

之前记录过 [Erlang 0005] net_kernel:monitor_nodes 订阅node连接\断开消息 , 魔鬼在于细节(Devils are in the details),这个模块还是有一些细节要注意,特别是官方文档上语焉不详的问题.先从net_kernel的几个小功能开始:

net_kernel小功能

  1. 如果erlang vm启动的时候没有指定name,使用net_kernel可以在运行时指定

Eshell V5.9 (abort with ^G)
1> net_kernel:start([test@nimbus]).
{ok,<0.34.0>}
(test@nimbus)2>

  1. 判断当前是不是longname

([email protected])3> net_kernel:longnames().
true

  1. 文档中提到 net_kernel:connect_node的返回值: returned ignored if the local node is not alive.

“local node is not alive” 说的是net_kernel没有启动的情况:
Eshell V5.9 (abort with ^G)
1> net_kernel:connect_node(a.zen.com).
ignored
2>

  1. In a distributed Erlang system, it is sometimes useful to connect to a node without also connecting to all other nodes. 要实现这个,节点启动的时候添加 -hidden [REF]

  2. 限定指定节点可以连通 net_kernel:allow(Nodes)

下面的几个问题就稍微复杂一些了,都是一些细节的问题,平时不注意可能就成了大坑.

主动断开连接

net_kernel模块导出了一个disconnect方法用来主动断开与指定节点的连接.比如可以 [net_kernel:disconnect(X) || X <- nodes() – [List_of_wanted_nodes]]. 这样批量断开一批连接.

下面的情况是a,b,c连通,然后a节点主动断开与b的连接,可以看到节点之间的关系变成了下图的关系:

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> net_adm:ping([email protected]).
pong
([email protected])2> net_adm:ping([email protected]).
pong
([email protected])3> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])4> net_kernel:disconnect([email protected]).
true
([email protected])5> nodes().
[‘[email protected]’]
([email protected])6>

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> nodes().
[‘[email protected]’]
([email protected])2>

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])2> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])3>

不自动连接

只要是涉及到引用另外的节点,比如rpc:call, 当前节点就会自动连接该节点,比如下面: 

Eshell V5.9 (abort with ^G)
([email protected])1> rpc:call([email protected],erlang,now,[]).
{1356,605316,345986}
([email protected])2> nodes().
[‘[email protected]’]
([email protected])3>

如果要取消这个自动连接的行为可以使用dist_atuo_connect 参数,注意这里文档里面有个错误指定的参数不是false,是nerver.dist_auto_connect这个参数只接收两个值nerver(不自动连接) 和 once (自动连接,默认值).其它的值都是无效的. 如果想要连接到别的节点,就需要主动调用net_kernel:connect_node方法.

下面的测试,我们首先启动一个[email protected]节点,erl -setcookie abc -name [email protected] ,然后启动节点[email protected],注意启动参数中的dist_auto_connect选项:

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected] -kernel dist_auto_connect never
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> rpc:call([email protected],erlang,now,[]).
{badrpc,nodedown}
([email protected])2> net_kernel:connect_node([email protected]).
true
([email protected])3> rpc:call([email protected],erlang,now,[]).
{1356,605925,729498}
([email protected])4> node().
[email protected]
([email protected])5>

下面的情况是:(1) [email protected],[email protected] 两个节点启动连通,(2) 然后启动b节点添加 dist_auto_connect never选项启动. (3) [email protected] 节点主动连接节点[email protected] 看下面的结果会发现,[email protected]会收到一个异常消息: global: ‘[email protected]’ failed to connect to ‘[email protected]’ 在另外[email protected]端也会接收类似的信息. (4)注意:[email protected]在报了上面的错误之后,已经和[email protected]连通了

%% Node [email protected]
Eshell V5.9 (abort with ^G)
([email protected])1> net_adm:ping([email protected]).
pong
([email protected])2> nodes().
[‘[email protected]’]
([email protected])3>

%% Node [email protected]
Eshell V5.9 (abort with ^G)
([email protected])1> nodes().
[‘[email protected]’]
([email protected])2>
=ERROR REPORT==== 27-Dec-2012::19:08:26 ===
global: ‘[email protected]’ failed to connect to ‘[email protected]
([email protected])2> nodes().
[‘[email protected]’,‘[email protected]’]

([email protected])3>

%% Node [email protected]
erl -setcookie abc -name [email protected] -kernel dist_auto_connect never

Eshell V5.9 (abort with ^G)
([email protected])1> net_adm:ping([email protected]).
pang
([email protected])2> net_kernel:connect_node([email protected]).
true
([email protected])3>
=ERROR REPORT==== 27-Dec-2012::23:06:03 ===
global: ‘[email protected]’ failed to connect to ‘[email protected]

([email protected])3> net_adm:ping([email protected]).
pong
([email protected])4> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])5>

节点状态监控

下面让[email protected] 设置 net_kernel:monitor_nodes(true),并连接到[email protected][email protected]节点;通过flush()可以看到[email protected]接受到nodeup的消息,然后我们关闭[email protected],[email protected]接收到了nodedown消息.启动[email protected]之后,a节点重新与c的连接.

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]

Eshell V5.9 (abort with ^G)
([email protected])1> net_kernel:monitor_nodes(true).
ok
([email protected])2> net_kernel:connect_node([email protected]).
true
([email protected])3> flush().
Shell got {nodeup,‘[email protected]’}
ok
([email protected])4> net_kernel:connect_node([email protected]).
true
([email protected])5> flush().
Shell got {nodeup,‘[email protected]’}
ok
([email protected])6> flush().
Shell got {nodedown,‘[email protected]’}
ok
([email protected])7> nodes().
[‘[email protected]’]
([email protected])8> flush().
ok
([email protected])9> net_kernel:connect_node([email protected]).
true
([email protected])10> flush().
Shell got {nodeup,‘[email protected]’}
ok
([email protected])11> flush().
Shell got {nodedown,‘[email protected]’}
ok
([email protected])12>

%%[email protected] 配合重启

erl -setcookie abc -name [email protected]
Eshell V5.9 (abort with ^G)
([email protected])1> q().
ok
([email protected])2>

erl -setcookie abc -name [email protected]
Eshell V5.9 (abort with ^G)
([email protected])1> q().
ok
([email protected])2>

%% [email protected] 节点只是打酱油的

erl -setcookie abc -name [email protected]

Eshell V5.9 (abort with ^G)
([email protected])1> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])2> flush().
ok
([email protected])3>

erlang:monitor_node

在ERTS里面还有一个erlang:monitor_node,这个方法用起来还是有点异常的:

节点a b c 互通, a节点执行了 erlang:monitor_node(‘[email protected]’,true). 监控c节点的状态
c节点关闭 a节点收到nodedown消息
c节点启动,a和c再次通过net_adm:ping连通
再次关闭c节点,a节点没有收到nodedown消息
启动c节点 ,在a节点重复执行 erlang:monitor_node(‘[email protected]’,true).
关闭c节点,a节点收到c的nodedown消息
目前还不清楚这是设计如此,还是bug.请教了一下霸爷他说这里这样设计正常,还是很不太明白,琢磨中.不过看下面的测试,重新连接[email protected]节点之后,该节点的最新信息已经更新到了sys_dist.

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> erlang:monitor_node(‘[email protected]’,true).
true
([email protected])2> erlang:monitor_node(‘[email protected]’,true).
true
([email protected])3> ets:tab2list(sys_dist).
[{connection,‘[email protected]’,up,<0.46.0>,undefined,
{net_address,{{127,0,0,1},51575},“zen.com”,tcp,inet},
[],normal},
{connection,‘[email protected]’,up,<0.40.0>,undefined,
{net_address,{{127,0,0,1},39308},“zen.com”,tcp,inet},
[],normal}]
([email protected])4> flush().
Shell got {nodedown,‘[email protected]’}
ok
([email protected])5> nodes().
[‘[email protected]’]
([email protected])6> net_adm:ping([email protected]).
pong
([email protected])7> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])8> ets:tab2list(sys_dist).
[{connection,‘[email protected]’,up,<0.53.0>,undefined,
{net_address,{{127,0,0,1},60762},“zen.com”,tcp,inet},
[],normal},
{connection,‘[email protected]’,up,<0.46.0>,undefined,
{net_address,{{127,0,0,1},51575},“zen.com”,tcp,inet},
[],normal}]
([email protected])9> flush().
ok
([email protected])10> net_adm:ping([email protected]).
pong
([email protected])11> ets:tab2list(sys_dist).
[{connection,‘[email protected]’,up,<0.60.0>,undefined,
{net_address,{{127,0,0,1},39809},“zen.com”,tcp,inet},
[],normal},
{connection,‘[email protected]’,up,<0.46.0>,undefined,
{net_address,{{127,0,0,1},51575},“zen.com”,tcp,inet},
[],normal}]
([email protected])12> erlang:monitor_node(‘[email protected]’,true).
true
([email protected])13> ets:tab2list(sys_dist).
[{connection,‘[email protected]’,up,<0.60.0>,undefined,
{net_address,{{127,0,0,1},39809},“zen.com”,tcp,inet},
[],normal},
{connection,‘[email protected]’,up,<0.46.0>,undefined,
{net_address,{{127,0,0,1},51575},“zen.com”,tcp,inet},
[],normal}]
([email protected])14> flush().
Shell got {nodedown,‘[email protected]’}
ok
([email protected])15> [Erlang 0098] net_kernel与节点互连,断开,监控

之前记录过 [Erlang 0005] net_kernel:monitor_nodes 订阅node连接\断开消息 , 魔鬼在于细节(Devils are in the details),这个模块还是有一些细节要注意,特别是官方文档上语焉不详的问题.先从net_kernel的几个小功能开始:

net_kernel小功能

  1. 如果erlang vm启动的时候没有指定name,使用net_kernel可以在运行时指定

Eshell V5.9 (abort with ^G)
1> net_kernel:start([test@nimbus]).
{ok,<0.34.0>}
(test@nimbus)2>

  1. 判断当前是不是longname

([email protected])3> net_kernel:longnames().
true

  1. 文档中提到 net_kernel:connect_node的返回值: returned ignored if the local node is not alive.

“local node is not alive” 说的是net_kernel没有启动的情况:
Eshell V5.9 (abort with ^G)
1> net_kernel:connect_node(a.zen.com).
ignored
2>

  1. In a distributed Erlang system, it is sometimes useful to connect to a node without also connecting to all other nodes. 要实现这个,节点启动的时候添加 -hidden [REF]

  2. 限定指定节点可以连通 net_kernel:allow(Nodes)

下面的几个问题就稍微复杂一些了,都是一些细节的问题,平时不注意可能就成了大坑.

主动断开连接

net_kernel模块导出了一个disconnect方法用来主动断开与指定节点的连接.比如可以 [net_kernel:disconnect(X) || X <- nodes() – [List_of_wanted_nodes]]. 这样批量断开一批连接.

下面的情况是a,b,c连通,然后a节点主动断开与b的连接,可以看到节点之间的关系变成了下图的关系:

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> net_adm:ping([email protected]).
pong
([email protected])2> net_adm:ping([email protected]).
pong
([email protected])3> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])4> net_kernel:disconnect([email protected]).
true
([email protected])5> nodes().
[‘[email protected]’]
([email protected])6>

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> nodes().
[‘[email protected]’]
([email protected])2>

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])2> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])3>

不自动连接

只要是涉及到引用另外的节点,比如rpc:call, 当前节点就会自动连接该节点,比如下面: 

Eshell V5.9 (abort with ^G)
([email protected])1> rpc:call([email protected],erlang,now,[]).
{1356,605316,345986}
([email protected])2> nodes().
[‘[email protected]’]
([email protected])3>

如果要取消这个自动连接的行为可以使用dist_atuo_connect 参数,注意这里文档里面有个错误指定的参数不是false,是nerver.dist_auto_connect这个参数只接收两个值nerver(不自动连接) 和 once (自动连接,默认值).其它的值都是无效的. 如果想要连接到别的节点,就需要主动调用net_kernel:connect_node方法.

下面的测试,我们首先启动一个[email protected]节点,erl -setcookie abc -name [email protected] ,然后启动节点[email protected],注意启动参数中的dist_auto_connect选项:

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected] -kernel dist_auto_connect never
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> rpc:call([email protected],erlang,now,[]).
{badrpc,nodedown}
([email protected])2> net_kernel:connect_node([email protected]).
true
([email protected])3> rpc:call([email protected],erlang,now,[]).
{1356,605925,729498}
([email protected])4> node().
[email protected]
([email protected])5>

下面的情况是:(1) [email protected],[email protected] 两个节点启动连通,(2) 然后启动b节点添加 dist_auto_connect never选项启动. (3) [email protected] 节点主动连接节点[email protected] 看下面的结果会发现,[email protected]会收到一个异常消息: global: ‘[email protected]’ failed to connect to ‘[email protected]’ 在另外[email protected]端也会接收类似的信息. (4)注意:[email protected]在报了上面的错误之后,已经和[email protected]连通了

%% Node [email protected]
Eshell V5.9 (abort with ^G)
([email protected])1> net_adm:ping([email protected]).
pong
([email protected])2> nodes().
[‘[email protected]’]
([email protected])3>

%% Node [email protected]
Eshell V5.9 (abort with ^G)
([email protected])1> nodes().
[‘[email protected]’]
([email protected])2>
=ERROR REPORT==== 27-Dec-2012::19:08:26 ===
global: ‘[email protected]’ failed to connect to ‘[email protected]
([email protected])2> nodes().
[‘[email protected]’,‘[email protected]’]

([email protected])3>

%% Node [email protected]
erl -setcookie abc -name [email protected] -kernel dist_auto_connect never

Eshell V5.9 (abort with ^G)
([email protected])1> net_adm:ping([email protected]).
pang
([email protected])2> net_kernel:connect_node([email protected]).
true
([email protected])3>
=ERROR REPORT==== 27-Dec-2012::23:06:03 ===
global: ‘[email protected]’ failed to connect to ‘[email protected]

([email protected])3> net_adm:ping([email protected]).
pong
([email protected])4> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])5>

节点状态监控

下面让[email protected] 设置 net_kernel:monitor_nodes(true),并连接到[email protected][email protected]节点;通过flush()可以看到[email protected]接受到nodeup的消息,然后我们关闭[email protected],[email protected]接收到了nodedown消息.启动[email protected]之后,a节点重新与c的连接.

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]

Eshell V5.9 (abort with ^G)
([email protected])1> net_kernel:monitor_nodes(true).
ok
([email protected])2> net_kernel:connect_node([email protected]).
true
([email protected])3> flush().
Shell got {nodeup,‘[email protected]’}
ok
([email protected])4> net_kernel:connect_node([email protected]).
true
([email protected])5> flush().
Shell got {nodeup,‘[email protected]’}
ok
([email protected])6> flush().
Shell got {nodedown,‘[email protected]’}
ok
([email protected])7> nodes().
[‘[email protected]’]
([email protected])8> flush().
ok
([email protected])9> net_kernel:connect_node([email protected]).
true
([email protected])10> flush().
Shell got {nodeup,‘[email protected]’}
ok
([email protected])11> flush().
Shell got {nodedown,‘[email protected]’}
ok
([email protected])12>

%%[email protected] 配合重启

erl -setcookie abc -name [email protected]
Eshell V5.9 (abort with ^G)
([email protected])1> q().
ok
([email protected])2>

erl -setcookie abc -name [email protected]
Eshell V5.9 (abort with ^G)
([email protected])1> q().
ok
([email protected])2>

%% [email protected] 节点只是打酱油的

erl -setcookie abc -name [email protected]

Eshell V5.9 (abort with ^G)
([email protected])1> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])2> flush().
ok
([email protected])3>

erlang:monitor_node

在ERTS里面还有一个erlang:monitor_node,这个方法用起来还是有点异常的:

节点a b c 互通, a节点执行了 erlang:monitor_node(‘[email protected]’,true). 监控c节点的状态
c节点关闭 a节点收到nodedown消息
c节点启动,a和c再次通过net_adm:ping连通
再次关闭c节点,a节点没有收到nodedown消息
启动c节点 ,在a节点重复执行 erlang:monitor_node(‘[email protected]’,true).
关闭c节点,a节点收到c的nodedown消息
目前还不清楚这是设计如此,还是bug.请教了一下霸爷他说这里这样设计正常,还是很不太明白,琢磨中.不过看下面的测试,重新连接[email protected]节点之后,该节点的最新信息已经更新到了sys_dist.

[root@nimbus ligaoren]# erl -setcookie abc -name [email protected]
Erlang R15B (erts-5.9) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9 (abort with ^G)
([email protected])1> erlang:monitor_node(‘[email protected]’,true).
true
([email protected])2> erlang:monitor_node(‘[email protected]’,true).
true
([email protected])3> ets:tab2list(sys_dist).
[{connection,‘[email protected]’,up,<0.46.0>,undefined,
{net_address,{{127,0,0,1},51575},“zen.com”,tcp,inet},
[],normal},
{connection,‘[email protected]’,up,<0.40.0>,undefined,
{net_address,{{127,0,0,1},39308},“zen.com”,tcp,inet},
[],normal}]
([email protected])4> flush().
Shell got {nodedown,‘[email protected]’}
ok
([email protected])5> nodes().
[‘[email protected]’]
([email protected])6> net_adm:ping([email protected]).
pong
([email protected])7> nodes().
[‘[email protected]’,‘[email protected]’]
([email protected])8> ets:tab2list(sys_dist).
[{connection,‘[email protected]’,up,<0.53.0>,undefined,
{net_address,{{127,0,0,1},60762},“zen.com”,tcp,inet},
[],normal},
{connection,‘[email protected]’,up,<0.46.0>,undefined,
{net_address,{{127,0,0,1},51575},“zen.com”,tcp,inet},
[],normal}]
([email protected])9> flush().
ok
([email protected])10> net_adm:ping([email protected]).
pong
([email protected])11> ets:tab2list(sys_dist).
[{connection,‘[email protected]’,up,<0.60.0>,undefined,
{net_address,{{127,0,0,1},39809},“zen.com”,tcp,inet},
[],normal},
{connection,‘[email protected]’,up,<0.46.0>,undefined,
{net_address,{{127,0,0,1},51575},“zen.com”,tcp,inet},
[],normal}]
([email protected])12> erlang:monitor_node(‘[email protected]’,true).
true
([email protected])13> ets:tab2list(sys_dist).
[{connection,‘[email protected]’,up,<0.60.0>,undefined,
{net_address,{{127,0,0,1},39809},“zen.com”,tcp,inet},
[],normal},
{connection,‘[email protected]’,up,<0.46.0>,undefined,
{net_address,{{127,0,0,1},51575},“zen.com”,tcp,inet},
[],normal}]
([email protected])14> flush().
Shell got {nodedown,‘[email protected]’}
ok
([email protected])15>

猜你喜欢

转载自blog.csdn.net/qq_20656489/article/details/88532569