Mesos agent re-registration leaves Marathon containers stuck in the Staged state

1. Problem description

       1. In Marathon, five containers are stuck in the Staged state, and all five are on the same host.

       2. The mesos-master log shows that the host has re-registered, so the next step is to find out why the mesos-slave re-registered at that point in time.


     3. By default, the mesos-slave logs to /var/log/messages

        The slave node lost network connectivity to the master, and re-registered with the master once the network recovered. However, in Mesos 1.3, after the mesos-slave has been unable to reach the mesos-master, it treats the Marathon framework as failed and shuts it down locally; the tasks that Marathon then retries on this agent are simply ignored. This logic is rather baffling (I never fully understood it myself; pointers from anyone who does would be appreciated). The workaround is simply to restart the mesos-slave.

Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.839169 15410 slave.cpp:4826] No pings from master received within 75secs
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840070 15412 slave.cpp:913] Re-detecting master
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840323 15412 slave.cpp:959] Detecting new master
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840324 15415 status_update_manager.cpp:177] Pausing sending status updates
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840929 15414 status_update_manager.cpp:177] Pausing sending status updates
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.840983 15402 slave.cpp:924] New master detected at [email protected]:5050
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.841060 15402 slave.cpp:948] No credentials provided. Attempting to register without authentication
Apr 18 22:21:55 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:55.841145 15402 slave.cpp:959] Detecting new master
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:56.460063 15413 slave.cpp:1235] Re-registered with master [email protected]:5050
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:56.460983 15411 slave.cpp:3088] Shutting down framework 967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: W0418 22:21:56.462837 15411 slave.cpp:3230] Ignoring info update for framework 967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000 because it is terminating
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: I0418 22:21:56.542073 15407 slave.cpp:1619] Got assigned task 'hps1000000020_5-1000000020_commservice-0.d95b7825-4313-11e8-85c9-024214286ecf' for framework 967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000
Apr 18 22:21:56 JQ-PZ-SER mesos-slave[15371]: W0418 22:21:56.543298 15407 slave.cpp:1793] Ignoring running task 'hps1000000020_5-1000000020_commservice-0.d95b7825-4313-11e8-85c9-024214286ecf' of framework 967c09ae-90ce-4dd1-ba80-2ad2da9fd545-0000 because the framework is terminating

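The symptom pattern above can be checked for in one pass over the syslog. This is a minimal sketch; the helper name scan_mesos_symptoms is ours (not a Mesos tool), and the grep patterns are taken directly from the log excerpt above.

```shell
# Sketch: pull out the three telltale lines of this failure mode from a
# syslog file. Pass the log path as the first argument, e.g.
# /var/log/messages on the host in question.
scan_mesos_symptoms() {
    grep -E "No pings from master received|Shutting down framework|because the framework is terminating" "$1"
}
```

If all three kinds of lines appear in sequence (ping timeout, framework shutdown, retried tasks ignored), the agent is most likely in the state described in this article.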
    4. Based on the symptoms, this matches a known bug in Mesos 1.3:

       https://issues.apache.org/jira/browse/MESOS-7215

    

    5. Restart the mesos-slave, or upgrade Mesos to a version with the fix.
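For the restart workaround, a sketch assuming a systemd-managed agent (the unit name mesos-slave may differ per packaging; adjust for your init system):

```shell
# Restart the agent so it re-registers cleanly with the master
# (assumes a systemd unit named mesos-slave; adjust if yours differs).
systemctl restart mesos-slave

# Afterwards, confirm in the syslog that the agent re-registered:
grep "Re-registered with master" /var/log/messages | tail -n 1
```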
