Flink: aligned and unaligned checkpoints, exactly-once and at-least-once

  1. How is exactly-once guaranteed? It can be configured in one of two ways:
    1. Aligned checkpoints
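      1. When the barrier on one input channel arrives first, that channel's input buffer is blocked. Records arriving after the barrier are held back instead of being processed. Only when the barriers from all other channels have also arrived does the operator snapshot (back up) its state and forward the barrier, so no record is applied twice.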
      2. Advantages: no data duplication.
      3. Disadvantages: the held-back records pile up into a backlog (backpressure) and, in the worst case, cause memory pressure leading to OOM.
    2. Unaligned checkpoints
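      1. When the first barrier reaches an operator's input buffer, it overtakes all in-flight data and is placed directly at the end of the output buffer, so it is forwarded downstream immediately. The operator then snapshots its state together with all in-flight data in its input and output buffers that the barrier overtook, plus any records arriving on the slower channels before their barriers. The slower barriers do not trigger another snapshot; they only mark where recording of in-flight data for their channel stops. Once the checkpoint completes, the results are committed to Kafka.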
      2. Advantages: faster checkpoints, because barriers no longer wait behind backlogged data.
      3. Disadvantages: a large amount of in-flight data is stored with every checkpoint, which puts heavy pressure on I/O and disk storage.
  2. How is at-least-once guaranteed?
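    1. Alignment without blocking
      1. When the barrier on one channel arrives first, that channel's input is not blocked. Its records keep flowing downstream while the operator waits for the barriers from the other channels before taking the snapshot. Suppose a checkpoint succeeds, the JobManager then triggers a new one (new barriers are injected at the sources), and the job fails halfway through it. The job restarts from the last completed checkpoint (for example, one stored on HDFS), and the Kafka offsets are rolled back to those saved in it. Because the fast channel was never blocked, records that arrived after its barrier had already been applied to the state in that checkpoint. After the restart they are read from Kafka and processed a second time, so the data is duplicated. This is at-least-once semantics.
      2. Advantages: no blocking, so alignment causes no data backlog and no OOM.
      3. Disadvantages: data can be duplicated after recovery.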
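The three behaviours can be compared with a toy model. The sketch below is plain Python, not Flink code, and every name in it (`BARRIER`, `Snapshot`, `take_checkpoint`, `recover_and_replay`) is hypothetical. It assumes a single summing operator with two input channels, records that are processed as soon as they arrive, no output buffers, and a single checkpoint. The "checkpoint" is just the operator state plus, in unaligned mode, the in-flight records it persisted.

```python
from dataclasses import dataclass, field

BARRIER = "BARRIER"  # hypothetical marker standing in for a checkpoint barrier


@dataclass
class Snapshot:
    state: int  # operator state: running sum of the records it has processed
    channel_state: list = field(default_factory=list)  # persisted in-flight records (unaligned only)

    def restored_total(self) -> int:
        # On recovery the operator state is restored, then the persisted
        # in-flight records are replayed into the operator.
        return self.state + sum(self.channel_state)


def take_checkpoint(events, num_channels, mode):
    """Feed (channel, item) events, in arrival order, to a summing operator.

    mode is "aligned", "unaligned" or "at_least_once" (toy labels, not Flink
    API names). Models a single checkpoint; returns (snapshot, final_state).
    """
    state = 0
    seen = set()  # channels whose barrier has already arrived
    held_back = []  # records from blocked channels (aligned mode only)
    snapshot = None
    for channel, item in events:
        if item == BARRIER:
            seen.add(channel)
            if mode == "unaligned" and snapshot is None:
                # First barrier: snapshot at once; the barrier overtakes in-flight data.
                snapshot = Snapshot(state)
            if len(seen) == num_channels:
                if mode != "unaligned":
                    # Aligned and at-least-once snapshot once every barrier is in.
                    snapshot = Snapshot(state)
                for record in held_back:  # unblock and work off the backlog
                    state += record
                held_back.clear()
            continue
        if mode == "aligned" and channel in seen and len(seen) < num_channels:
            held_back.append(item)  # this channel stays blocked until alignment
            continue
        if mode == "unaligned" and snapshot is not None and channel not in seen:
            snapshot.channel_state.append(item)  # slow channel's pre-barrier data
        # Process the record; in at-least-once mode this includes records that
        # arrived after their channel's barrier but before the snapshot.
        state += item
    return snapshot, state


def recover_and_replay(snapshot, events):
    """Restore the snapshot, then re-read every record that followed its
    channel's barrier (the source offsets stored in the checkpoint)."""
    total = snapshot.restored_total()
    past_barrier = set()
    for channel, item in events:
        if item == BARRIER:
            past_barrier.add(channel)
        elif channel in past_barrier:
            total += item
    return total


# Channel 0 is fast (its barrier arrives first), channel 1 is slow.
events = [(0, 1), (1, 10), (0, BARRIER), (0, 2), (1, 20), (0, 3), (1, BARRIER), (1, 30)]
for mode in ("aligned", "unaligned", "at_least_once"):
    snap, final = take_checkpoint(events, 2, mode)
    print(mode, snap.restored_total(), recover_and_replay(snap, events), final)
# aligned 31 66 66
# unaligned 31 66 66
# at_least_once 36 71 66
```

In this run, aligned and unaligned mode both restore 31, the sum of every record before the barriers, so replaying from the stored offsets yields the correct 66. Unaligned mode reaches 31 by persisting the slow channel's record 20 next to a smaller operator state, which is the extra storage cost described above. At-least-once mode snapshots 36 because records 2 and 3 on the fast channel were processed before the slow barrier arrived. After recovery they are read again and counted twice, giving 71.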

Origin blog.csdn.net/qq_40382400/article/details/132249054