Data synchronization design

Difficulty: coordinating the hand-off point between full synchronization and incremental synchronization.

Solution: Start incremental synchronization first (changed data is mirrored into Kafka, but the consumer threads do not pull it yet), then run full synchronization (data is copied directly with SQL queries, and the full sync is considered finished once a query returns fewer rows than a threshold). When full synchronization is over, the consumer threads are told to start pulling from the message queue, which begins incremental synchronization. The two phases overlap around the hand-off point, but every synchronized write mirrors the source row and is idempotent, so as long as each row's changes are replayed in order, the data still reaches eventual consistency.
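The overlap argument can be checked with a small simulation. This is a minimal sketch, not the author's code: a HashMap stands in for the target store, a List stands in for the Kafka topic, and OverlapDemo, Change and apply are hypothetical names. Each change carries the full new row, so replaying changes that full synchronization already copied may briefly write an older value, but the final state matches the source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch with hypothetical names: a HashMap stands in for the target store and
// an ArrayList stands in for the Kafka topic.
class OverlapDemo {

    // A change event mirrors the full new state of one row.
    record Change(long id, String value) {}

    // Idempotent apply: overwrite the row with the mirrored state.
    static void apply(Map<Long, String> target, Change c) {
        target.put(c.id(), c.value());
    }

    static Map<Long, String> run() {
        Map<Long, String> source = new HashMap<>();
        source.put(1L, "a");
        source.put(2L, "b");

        // 1. Incremental capture starts: every change is queued but not yet consumed.
        List<Change> topic = new ArrayList<>();
        source.put(2L, "b2");
        topic.add(new Change(2L, "b2"));
        source.put(2L, "b3");
        topic.add(new Change(2L, "b3"));
        source.put(3L, "c");
        topic.add(new Change(3L, "c"));

        // 2. Full sync copies the rows as they are now (id 2 is already "b3").
        Map<Long, String> target = new HashMap<>();
        source.forEach((id, value) -> apply(target, new Change(id, value)));

        // 3. Consumers replay the topic in order. Id 2 briefly goes back to "b2",
        //    then "b3" is applied again, so the target ends equal to the source.
        for (Change c : topic) {
            apply(target, c);
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println(run()); // {1=a, 2=b3, 3=c}
    }
}
```

The replay in step 3 only converges because the changes for each row are applied in the order they happened, which is why the incremental design below routes all messages of one model to one partition.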

The first thread pool, which manages all synchronization tasks

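core size: 10
max size: 100
keep-alive time: 10 ms
blocking queue: LinkedBlockingQueue with a capacity of 50
rejection policy: throw an exception
thread factory: implements the ThreadFactory interface to give threads custom names, which makes debugging easier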
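A sketch of how this first pool could be built with java.util.concurrent.ThreadPoolExecutor, using the parameters listed above. SyncTaskPool, NamedThreadFactory and the "sync-task" name prefix are hypothetical; AbortPolicy is the JDK's built-in rejection policy that throws RejectedExecutionException.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SyncTaskPool {

    // Hypothetical factory: names threads "<prefix>-1", "<prefix>-2", ...
    // so they are easy to identify in thread dumps and logs.
    static final class NamedThreadFactory implements ThreadFactory {
        private final String prefix;
        private final AtomicInteger counter = new AtomicInteger(1);

        NamedThreadFactory(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public Thread newThread(Runnable r) {
            return new Thread(r, prefix + "-" + counter.getAndIncrement());
        }
    }

    static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                10,                                  // core size
                100,                                 // max size
                10, TimeUnit.MILLISECONDS,           // keep-alive for idle non-core threads
                new LinkedBlockingQueue<>(50),       // bounded queue, capacity 50
                new NamedThreadFactory("sync-task"),
                new ThreadPoolExecutor.AbortPolicy() // throws RejectedExecutionException
        );
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create();
        pool.execute(() -> System.out.println(Thread.currentThread().getName())); // prints "sync-task-1"
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

ThreadPoolExecutor only creates threads beyond the 10 core threads once the 50-slot queue is full, so up to 150 tasks (100 running, 50 queued) are accepted before AbortPolicy starts rejecting.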

The second thread pool, which manages data consumption threads

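core size: 0
max size: 50
keep-alive time: 0
blocking queue: the capacity must be 0 (the queue type itself does not matter). With a core size of 0, a queue that can hold tasks would start a single thread and park the other long-running consumers in the queue until it fills up. Note that LinkedBlockingQueue does not accept a capacity of 0 (its constructor throws IllegalArgumentException); SynchronousQueue is the JDK's zero-capacity queue.
rejection policy: throw an exception
thread factory: implements the ThreadFactory interface to give threads custom names, which makes debugging easier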
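A sketch of the consumer pool, assuming SynchronousQueue as the zero-capacity queue (a substitution, since LinkedBlockingQueue cannot be created with capacity 0). ConsumerPool and the "sync-consumer" prefix are hypothetical. Because nothing can wait in the queue, each submitted consumer starts on its own thread until the 50-thread maximum is reached, after which AbortPolicy rejects further submissions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConsumerPool {

    static ThreadPoolExecutor create() {
        AtomicInteger counter = new AtomicInteger(1);
        // Hypothetical naming scheme: "sync-consumer-1", "sync-consumer-2", ...
        ThreadFactory factory = r -> new Thread(r, "sync-consumer-" + counter.getAndIncrement());
        return new ThreadPoolExecutor(
                0,                                   // core size: no idle threads are kept
                50,                                  // max size: at most 50 consumer threads
                0, TimeUnit.MILLISECONDS,            // keep-alive 0: a thread exits once its task ends
                new SynchronousQueue<>(),            // zero capacity: tasks are handed to a thread, never queued
                factory,
                new ThreadPoolExecutor.AbortPolicy() // a 51st concurrent consumer is rejected
        );
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create();
        CountDownLatch stop = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            pool.execute(() -> { // stands in for a long-running consumer loop
                try {
                    stop.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println(pool.getPoolSize()); // 3: each consumer got its own thread
        stop.countDown();
        pool.shutdown();
    }
}
```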

Full synchronization

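Build the search model and run full synchronization, recording the id of the last row synchronized. When a query returns fewer rows than a threshold, full synchronization ends, and the corresponding search model is notified that it can now receive data from the message queue.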
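The full-synchronization loop can be sketched as keyset pagination on the id column. This assumes numeric, positive, increasing ids and uses hypothetical names (FullSync, fetchBatch, BATCH_SIZE, END_THRESHOLD); a TreeMap stands in for the source table, and a CountDownLatch stands in for the notification that lets the consumer start pulling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: fetchBatch() plays the role of a query like
//   SELECT ... FROM t WHERE id > ? ORDER BY id LIMIT ?
class FullSync {

    static final int BATCH_SIZE = 100;    // rows requested per query (assumed)
    static final int END_THRESHOLD = 100; // a batch smaller than this ends the full sync (assumed)

    static List<Long> fetchBatch(TreeMap<Long, String> table, long afterId, int limit) {
        List<Long> ids = new ArrayList<>();
        for (Long id : table.tailMap(afterId, false).keySet()) {
            if (ids.size() == limit) {
                break;
            }
            ids.add(id);
        }
        return ids;
    }

    // Copies rows in id order, remembering the last synchronized id, and signals
    // fullSyncDone once a batch comes back smaller than END_THRESHOLD.
    static long run(TreeMap<Long, String> table, List<Long> copied, CountDownLatch fullSyncDone) {
        long lastId = 0; // assumes all ids are positive
        while (true) {
            List<Long> batch = fetchBatch(table, lastId, BATCH_SIZE);
            copied.addAll(batch); // stands in for writing the rows to the search model
            if (!batch.isEmpty()) {
                lastId = batch.get(batch.size() - 1);
            }
            if (batch.size() < END_THRESHOLD) {
                break;
            }
        }
        fullSyncDone.countDown(); // the consumer may now start pulling from Kafka
        return lastId;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> table = new TreeMap<>();
        for (long id = 1; id <= 250; id++) {
            table.put(id, "row-" + id);
        }
        List<Long> copied = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        long last = run(table, copied, done);
        System.out.println(last + " " + copied.size() + " " + done.getCount()); // 250 250 0
    }
}
```

The consumer thread for that model would call fullSyncDone.await() before its first pull, which keeps incremental data parked in Kafka until the full copy is done.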

Incremental synchronization

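One topic with multiple partitions.
The model id is used as the message key to route each message to a partition; each consumer is bound to one partition and actively pulls and consumes its data.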
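A single-process sketch of keyed routing with one reader per partition. KeyedTopic, Message and poll are hypothetical; Kafka's default partitioner hashes the serialized key with murmur2, so Long.hashCode here is only a stand-in that keeps the same property: every message for a given model id lands in the same partition, in send order, so the one consumer reading that partition sees that model's changes in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one topic with several partitions.
class KeyedTopic {

    record Message(long modelId, String payload) {}

    private final List<List<Message>> partitions = new ArrayList<>();

    KeyedTopic(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // The model id is the key: the same id always maps to the same partition.
    // (Kafka's default partitioner uses murmur2 on the key bytes; this is a stand-in.)
    int partitionFor(long modelId) {
        return Math.floorMod(Long.hashCode(modelId), partitions.size());
    }

    // Appends to the end of the chosen partition, so per-key send order is kept.
    synchronized void send(Message m) {
        partitions.get(partitionFor(m.modelId())).add(m);
    }

    // The single consumer of `partition` pulls records starting at its own offset.
    synchronized List<Message> poll(int partition, int offset, int maxRecords) {
        List<Message> log = partitions.get(partition);
        int end = Math.min(log.size(), offset + maxRecords);
        if (offset >= end) {
            return List.of();
        }
        return new ArrayList<>(log.subList(offset, end));
    }

    public static void main(String[] args) {
        KeyedTopic topic = new KeyedTopic(4);
        topic.send(new Message(7, "v1"));
        topic.send(new Message(8, "x"));
        topic.send(new Message(7, "v2"));
        // prints both model-7 messages, v1 before v2
        System.out.println(topic.poll(topic.partitionFor(7), 0, 10));
    }
}
```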

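Question: Why not have a single thread pull the data into a shared queue and let several threads consume from that queue? Also, the scheme above can leave consumer threads repeatedly pulling and getting nothing back, which wastes resources; how can that be solved?
Answer: 1. A shared queue makes it hard to control the order in which data is finally written to the database, and working around that would likely be complex. 2. Empty pulls can be handled with an exponential backoff approach.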
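Point 2 could look like the sketch below: the wait before the next pull doubles after each empty pull, up to a cap, and resets as soon as data arrives. EmptyPollBackoff, afterPoll and consumeLoop are hypothetical names, and the poller is any function that returns a (possibly empty) batch.

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: exponential backoff between pulls that return no data.
class EmptyPollBackoff {
    private final long initialMs; // assumed > 0
    private final long maxMs;
    private long currentMs = 0;   // 0 means "pull again immediately"

    EmptyPollBackoff(long initialMs, long maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
    }

    // Returns how long to wait before the next pull, given the size of the last batch.
    long afterPoll(int recordsReceived) {
        if (recordsReceived > 0) {
            currentMs = 0;                              // data is flowing: no wait
        } else if (currentMs == 0) {
            currentMs = initialMs;                      // first empty pull
        } else {
            currentMs = Math.min(currentMs * 2, maxMs); // double the wait, capped at maxMs
        }
        return currentMs;
    }

    // Consumer loop: pull, handle every record, then sleep for the backoff delay.
    static <T> void consumeLoop(Supplier<List<T>> poller, Consumer<T> handler,
                                BooleanSupplier running, EmptyPollBackoff backoff)
            throws InterruptedException {
        while (running.getAsBoolean()) {
            List<T> records = poller.get();
            records.forEach(handler);
            long delayMs = backoff.afterPoll(records.size());
            if (delayMs > 0) {
                Thread.sleep(delayMs);
            }
        }
    }

    public static void main(String[] args) {
        EmptyPollBackoff b = new EmptyPollBackoff(10, 80);
        int[] batchSizes = {0, 0, 0, 0, 0, 3, 0};
        for (int n : batchSizes) {
            System.out.print(b.afterPoll(n) + " ");
        }
        // prints: 10 20 40 80 80 0 10
    }
}
```

With the real KafkaConsumer, poll(Duration) already blocks for up to the given timeout when no records are available, which by itself cuts down on tight empty loops; a backoff like this mainly matters for pollers that return immediately.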

Problems:

  1. How can the design scale horizontally, and how is message order preserved?
    Implement a custom Partitioner that routes each message to a partition according to the model id interval it falls in.
  2. How can data consumption threads be removed or added dynamically?
    Removing a thread: let the thread end on its own, using a volatile flag that its loop checks (because the pool's core size is 0, the finished thread is not kept alive).
    Adding a thread: submit a new consumer task to the thread pool.
    In addition, the model id interval assigned to each thread must be readjusted. During the adjustment, pause the consumer threads' data distribution (via a shared variable), wait until the data already distributed has been consumed, then apply the new intervals and resume.
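Both answers can be sketched in plain Java. IntervalRouter, ConsumerWorker and their methods are hypothetical names: in a real producer the interval lookup would sit inside a custom implementation of Kafka's org.apache.kafka.clients.producer.Partitioner (its partition(...) method), and pollOnce stands in for pulling and writing one batch from the worker's partition. The worker ends by returning from run() once its volatile running flag is cleared, and idles while paused.

```java
import java.util.TreeMap;
import java.util.concurrent.locks.LockSupport;

// Hypothetical names throughout; plain Java with no Kafka dependency.
class RangeRouting {

    // Routes a model id to the partition whose interval contains it.
    static final class IntervalRouter {
        // lower bound of each interval -> partition number
        private final TreeMap<Long, Integer> lowerBounds = new TreeMap<>();

        // bounds must be ascending; bounds[i] is the first model id of partition i + 1,
        // and every id below bounds[0] goes to partition 0.
        IntervalRouter(long... bounds) {
            lowerBounds.put(Long.MIN_VALUE, 0);
            for (int i = 0; i < bounds.length; i++) {
                lowerBounds.put(bounds[i], i + 1);
            }
        }

        int partitionFor(long modelId) {
            return lowerBounds.floorEntry(modelId).getValue();
        }
    }

    // A consumer task that stops and pauses through volatile flags checked in its loop.
    static final class ConsumerWorker implements Runnable {
        private final Runnable pollOnce;          // pulls and writes one batch (stand-in)
        private volatile boolean running = true;  // cleared to end the thread
        private volatile boolean paused = false;  // set while intervals are re-assigned

        ConsumerWorker(Runnable pollOnce) {
            this.pollOnce = pollOnce;
        }

        @Override
        public void run() {
            while (running) {
                if (paused) {
                    // The current batch has already finished; idle ~1 ms, then re-check.
                    LockSupport.parkNanos(1_000_000L);
                    continue;
                }
                pollOnce.run();
            }
            // Returning lets the pool thread finish; with core size 0 and keep-alive 0
            // the pool does not keep it around.
        }

        void stop() { running = false; }
        void pause() { paused = true; }
        void resume() { paused = false; }
    }
}
```

To add a consumer, submit a new ConsumerWorker to the consumer pool; to re-assign intervals, pause the workers, build a new IntervalRouter with new bounds, publish it through a volatile field or AtomicReference, and resume.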

How to control which partition a message is sent to

When one consumer reads from several partitions, consumption order is guaranteed only within each partition, not across them

How Kafka keeps the messages within a partition in order

Kafka's message reliability and its configuration

Kafka consumers pull messages; the advantages and disadvantages of push and pull modes in message queues

Origin blog.csdn.net/qq_41634872/article/details/111947636