Common approaches to handling scheduled tasks on multiple machines

Business applications often need scheduled tasks, but in a cluster the same task may be triggered on several machines at the same time, which is usually not what we want. Below are some common approaches, collected for reference.


Problem

First, the problems that scheduled tasks in a cluster need to solve:

1. If every machine in the cluster starts the scheduled task, the same data is likely to be processed more than once.

2. If a switch is used for the scheduled task, turned on for only one machine and off for all the others, duplicate processing is avoided, but a single point of failure is introduced.


Solutions

① Designate a machine to execute the timed task
Select one machine among the cluster to execute the timed task; on every execution, each machine checks whether it is the designated machine, and only the designated machine runs the task.

This avoids duplicate execution, but the most obvious shortcoming is the single point of failure: once the designated machine goes down, the task is no longer executed.
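
For illustration, a minimal sketch of this check (the designated hostname and how it is configured are assumptions for illustration, not from the original):

package main

import (
	"fmt"
	"os"
)

// designatedHost is the machine chosen to run the timed task.
// In practice it would come from configuration; the value here is a placeholder.
const designatedHost = "job-runner-01"

// shouldRun reports whether the current machine is the designated one.
func shouldRun() bool {
	host, err := os.Hostname()
	if err != nil {
		return false
	}
	return host == designatedHost
}

func main() {
	if shouldRun() {
		fmt.Println("this is the designated machine, running the timed task")
	} else {
		fmt.Println("not the designated machine, skipping")
	}
}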


② Dynamically load configuration files
For example, create several timer configuration files and divide the tasks among them. Different machines load different configuration files, so different machines execute different tasks.

The drawback of this approach is that there are at least two different configuration files, which makes maintenance very troublesome.
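
As a sketch of the idea (the TASK_PROFILE environment variable and the JSON layout are hypothetical, chosen only for illustration):

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// taskConfig lists the tasks this machine should run; the layout is an assumption.
type taskConfig struct {
	Tasks []string `json:"tasks"`
}

func main() {
	// each machine points TASK_PROFILE at its own file, e.g. tasks-a.json or tasks-b.json
	data, err := os.ReadFile(os.Getenv("TASK_PROFILE"))
	if err != nil {
		fmt.Println("cannot read task profile:", err)
		return
	}
	var cfg taskConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		fmt.Println("invalid task profile:", err)
		return
	}
	fmt.Println("this machine runs tasks:", cfg.Tasks)
}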


③ Run the timed tasks as a separate program

The code involved in the timed tasks is peeled out of the main program and deployed and executed separately.

The drawback of this approach is the added development and maintenance cost; unless the project is large, it is generally not recommended.


④ Use a listener
Have a monitoring program watch for duplicate executions of the timed task: if the relevant business logic has already been executed, the task is not executed again.


⑤ Read the timed task from the database
Since all machines connect to the same database, keep a mark for each timed task in the database to tell whether a run of the task would be a repeat.

create table timed_task (
type int(3) not null,
status tinyint(1) not null,
exec_timestamp bigint(20) not null,
interval_time int(11) not null,
PRIMARY KEY (type)
);

This table is queried to decide whether the current timed task may be executed. If the project has many timed tasks, the type field distinguishes the different timers, so one common table can serve them all.

type  				# distinguishes the different kinds of timers
status 				# whether the timed task may run: 0 = executable, 1 = not executable
exec_timestamp		# execution time of the last run of the timed task
interval_time		# time threshold, in seconds


The field interval_time deserves special mention: it is used both to detect whether the node executing the timed task has failed or gone down, and to prevent the timed task from being executed again within a short period.

You can set up a checking mechanism: when the executing node fails or goes down, it will not reset the status field to 0 in time to put the timed task back into an executable state. You can then compare the exec_timestamp and interval_time fields against the current time: if the current time is greater than the sum of exec_timestamp and interval_time while status is still 1, a fault or downtime is indicated, and you can go and find where the problem lies.

When a timed task runs in a cluster, the machines' clocks may differ by a few seconds, so the following can happen. The task has already been processed by one machine, taking only a few milliseconds, and the status field has been reset to 0 so that the task is executable again; then, because of the few seconds of skew, the timer on another machine fires, finds the task in the database with status 0 and in an executable state, and executes the same timed task again, and the data gets processed twice. The interval_time field can be used to prevent this repeated execution: if the current time is less than the sum of exec_timestamp and interval_time while status is 0, the timed task has already been executed within the threshold and is not allowed to run again.
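
A minimal sketch of this claim logic (Go with database/sql; the DSN, task type value, and helper names are placeholders, not from the original):

package main

import (
	"database/sql"
	"fmt"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

// tryClaimTask atomically claims the timed task of the given type. The UPDATE
// only succeeds while the task is executable (status = 0) and the interval
// threshold has passed, so a run repeated within interval_time is rejected
// even when the machines' clocks differ by a few seconds.
func tryClaimTask(db *sql.DB, taskType int) (bool, error) {
	now := time.Now().Unix()
	res, err := db.Exec(
		`UPDATE timed_task
		    SET status = 1, exec_timestamp = ?
		  WHERE type = ? AND status = 0 AND ? >= exec_timestamp + interval_time`,
		now, taskType, now)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	return n == 1, err
}

// releaseTask marks the task executable again once the work is done.
func releaseTask(db *sql.DB, taskType int) error {
	_, err := db.Exec(`UPDATE timed_task SET status = 0 WHERE type = ?`, taskType)
	return err
}

func main() {
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/mydb")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer db.Close()

	if ok, err := tryClaimTask(db, 1); err != nil || !ok {
		return // another machine ran the task recently, or an error occurred
	}
	defer releaseTask(db, 1)
	fmt.Println("running the timed task")
}

The same row also supports the fault check described above: a row stuck at status = 1 with exec_timestamp + interval_time already in the past points at a node that died mid-run.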


⑥ Use a Redis distributed lock with expiration
Borrow a Redis distributed lock to solve the duplicate-processing problem; to keep the lock from being held too long, or from never being deleted when a client crashes, add an expiration time with expire.



My own practice

I personally use the sixth approach, a Redis distributed lock, to solve the problem of timed tasks being processed on multiple machines. A simple implementation follows. For brevity all the code is in a single file, which is not a reasonable structure for real development; it is only for reference.

Note: the Redis usage in the code below is for a single Redis instance; implementing a distributed lock on a Redis master-slave or Redis cluster architecture will be covered in another article.

package main

import (
	"crypto/md5"
	"fmt"
	"math/rand"
	"strconv"
	"time"

	"github.com/go-redis/redis"
	"github.com/robfig/cron/v3"
)

// redisLock holds the key and value of one distributed lock
type redisLock struct {
	key   string
	value string
}

var Cron *cron.Cron // a third-party cron library conveniently solves the scheduling itself, avoiding reinventing the wheel
var client = redis.NewClient(&redis.Options{
	Addr:     "127.0.0.1:6379",
	Password: "", // no password set
	DB:       0,  // use default DB
})

func StartCronJob() {
	Cron = cron.New()
	cronJobTime := "* * * * *" // run once per minute

	_, err := Cron.AddFunc(cronJobTime, cronJob)
	if err != nil {
		fmt.Println(err)
		return
	}
	Cron.Start()		
}

func cronJob() {
	r := redisLock{}
	hasLock, err := r.GrabLock(time.Now().Unix(), 101) // grab the lock (101 here is the task type tp)
	if err != nil {
		return
	}
	if !hasLock {
		return
	}
	defer r.ReleaseLock() // only the caller that grabbed the lock is entitled to release it
	fmt.Println("hello world, ", time.Now())
}

// GrabLock tries to grab the distributed lock
func (r *redisLock) GrabLock(dateStamp int64, tp int) (hasLock bool, err error) {
	// Sleep for a random 0-20ms. In a cluster, clock skews of a few milliseconds often
	// mean the timed task is always grabbed by the same few machines, which is very
	// unbalanced; the random sleep gives every machine a chance to win the lock.
	rand.Seed(time.Now().UnixNano())
	time.Sleep(time.Duration(rand.Intn(20)) * time.Millisecond)

	// The key is built this way so that, when there are many timed tasks, the different tasks can be clearly told apart.
	key := "cron_job:" + strconv.FormatInt(dateStamp, 10) + ":" + strconv.Itoa(tp)

	// The value is built this way so that it stays unique among all clients requesting the lock; it is what guarantees the lock can be released safely, which matters because it prevents one client from deleting a lock acquired by another client.
	ds := strconv.FormatInt(dateStamp, 10) + strconv.FormatInt(int64(tp), 10) + strconv.Itoa(rand.Intn(100))
	value := fmt.Sprintf("%x", md5.Sum([]byte(ds)))
	expireTime := 300

	r.key = key
	r.value = value

	// lock: set the key only if it does not already exist
	hasLock, err = client.SetNX(key, value, time.Duration(expireTime)*time.Second).Result()
	if err != nil {
		fmt.Println(err)
		return
	}
	return
}

// ReleaseLock releases the distributed lock
func (r *redisLock) ReleaseLock() {
	// delete the key with a Lua script so the check and the delete stay atomic
	delScript := `if redis.call("get", KEYS[1]) == ARGV[1] then
	return redis.call("del", KEYS[1])
	else return 0
	end`
	keys := []string{r.key}
	result, err := client.Eval(delScript, keys, r.value).Result()
	if err != nil {
		fmt.Println(err)
		return
	}
	if n, ok := result.(int64); ok && n == 0 {
		fmt.Println("lock was not released: it no longer belongs to this client")
	}
}

func main() {
	StartCronJob()
	select {} // block forever so the background cron jobs keep running
}

First, the lock is taken with the Redis SETNX command. SETNX (SET if Not eXists) checks whether the key already exists in Redis: if it does, the command returns 0 and changes nothing; if it does not, the key is set and the command returns 1. So what is the difference between SETNX and a plain SET? SETNX performs the existence check and the assignment as a single atomic step, while a separate GET followed by SET cannot guarantee atomicity.

The expiration time is there to prevent deadlock: if the client that grabbed the lock goes down before releasing it, the lock would never be released and no other client could ever acquire it.

As for why the Eval() method guarantees atomicity, it follows from how Redis works. The official documentation explains the EVAL command like this: when EVAL runs a Lua script, the whole script is treated as one command, and Redis will not execute any other command until the EVAL completes.

The most common mistake when releasing the lock is to delete it directly with DEL. That releases the lock without checking who owns it, and can end with one client releasing another client's lock. For example: client A gets the lock and is then blocked in a long operation; the lock expires and Redis releases it automatically; client A, still believing the lock is its own, goes on to release it, but by now the lock may already be held by client B, so client A releases client B's lock. A bare DEL instruction can therefore delete a lock that belongs to another client.

So, to avoid releasing someone else's lock, the release must first check whether the current value is still the value this client set; if it is not, the lock no longer belongs to this client and must not be released.



