Business essentials knowledge
-
Kafka’s transaction control principle
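Main principle: the outcome of a transaction is recorded in the log as a ControlBatch (control marker) message.
Begin the transaction --> no ControlBatch is written; beginTransaction() only changes the producer's local state, and the transaction coordinator registers each partition when the first message is sent to it
Commit the transaction --> a ControlBatch message (COMMIT) is written to every partition involved
Abort the transaction --> a ControlBatch message (ABORT) is written to every partition involved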
-
Configuration parameters required to use transactions (Kafka cannot roll back data already written to the log, but it can guarantee all-or-nothing behavior: either every message in the transaction takes effect, or none does)
Properties props = new Properties();
props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"doit01:9092");
props.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
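// acks: wait for all in-sync replicas to acknowledge
props.setProperty(ProducerConfig.ACKS_CONFIG, "-1");
// Number of producer retries
props.setProperty(ProducerConfig.RETRIES_CONFIG, "3");
// Maximum number of in-flight (unacknowledged) requests per connection; must not exceed 5 when idempotence is enabled
props.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "3");
// Enable idempotence
props.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// Set the transactional id
props.setProperty(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "trans_001");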
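The way these settings depend on each other can be sketched as a small check. This is only an illustration, assuming the rules the Kafka producer applies when idempotence is on (acks of all or -1, retries greater than 0, at most 5 in-flight requests per connection) and that a transactional id requires idempotence. TxConfigCheck and problems are hypothetical names, not part of the Kafka client, and the sketch uses plain string keys so it needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class TxConfigCheck {
    /**
     * Returns the violated constraints; an empty list means the settings are consistent.
     * Missing keys are treated as compliant (an assumption of this sketch).
     */
    static List<String> problems(Properties props) {
        List<String> out = new ArrayList<>();
        String acks = props.getProperty("acks", "all");
        if (!acks.equals("all") && !acks.equals("-1")) {
            out.add("acks must be all (-1)");
        }
        if (Integer.parseInt(props.getProperty("retries", "1")) <= 0) {
            out.add("retries must be greater than 0");
        }
        if (Integer.parseInt(props.getProperty("max.in.flight.requests.per.connection", "5")) > 5) {
            out.add("max.in.flight.requests.per.connection must be at most 5");
        }
        boolean idempotent = Boolean.parseBoolean(props.getProperty("enable.idempotence", "true"));
        if (props.getProperty("transactional.id") != null && !idempotent) {
            out.add("transactional.id requires enable.idempotence=true");
        }
        return out;
    }

    public static void main(String[] args) {
        // Same values as the configuration listed above
        Properties props = new Properties();
        props.setProperty("acks", "-1");
        props.setProperty("retries", "3");
        props.setProperty("max.in.flight.requests.per.connection", "3");
        props.setProperty("enable.idempotence", "true");
        props.setProperty("transactional.id", "trans_001");
        System.out.println(problems(props)); // prints []
    }
}
```

In a real application the producer itself rejects inconsistent settings at construction time; a check like this only makes the rules visible.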
Code template for transaction control
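// Initialize transactions (call once per producer instance)
producer.initTransactions();
try {
    // Begin a transaction
    producer.beginTransaction();
    // Do the work: producer.send(...)
    // Commit the transaction
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Fatal: another producer with the same transactional.id has taken over; abort is not allowed here
    producer.close();
} catch (KafkaException e) {
    // On any other failure, roll back by aborting the transaction
    producer.abortTransaction();
}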
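Messages sent inside a transaction are appended to the log before the transaction is committed or aborted, so the consumer API can fetch data from uncommitted transactions; what you choose is whether the application gets to see it.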
Whether to allow users to see the data of uncommitted transactions can be configured through consumer parameters:
isolation.level=read_uncommitted (default)
isolation.level=read_committed
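A minimal sketch of a consumer configuration that hides uncommitted data. It uses plain string keys so it needs only the JDK. ReadCommittedConsumerConfig and build are hypothetical names, and the group id "demo-group" is a placeholder that does not come from the notes above.

```java
import java.util.Properties;

class ReadCommittedConsumerConfig {
    /** Builds consumer properties that only expose messages from committed transactions. */
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Override the default (read_uncommitted)
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("doit01:9092", "demo-group");
        System.out.println(props.getProperty("isolation.level")); // prints read_committed
    }
}
```

The resulting Properties object can be passed to the KafkaConsumer constructor in the usual way.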
-
Kafka also offers an "advanced" form of transaction control that targets exactly one scenario:
the application reads its source data from Kafka and writes its processing results back to Kafka.
In that scenario Kafka can provide end-to-end transaction control. Compared with the "basic" transaction above it adds one capability: the consumer's offsets can be committed through the producer as part of the transaction.
producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata()); // older clients take the consumer group id string as the second argument
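One detail that is easy to get wrong is the value of the offsets argument: for each partition it should be the offset of the next record to read, that is, the last processed offset plus one. The sketch below shows only that computation with JDK types. Rec and the "topic-partition" string keys are hypothetical stand-ins for ConsumerRecord and TopicPartition; the real call takes a Map<TopicPartition, OffsetAndMetadata>.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OffsetsToCommit {
    /** One consumed record, reduced to the fields needed here (stand-in for ConsumerRecord). */
    static final class Rec {
        final String topic;
        final int partition;
        final long offset;

        Rec(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    /** For each partition, returns the highest consumed offset + 1, i.e. the next offset to read. */
    static Map<String, Long> nextOffsets(Iterable<Rec> batch) {
        Map<String, Long> next = new HashMap<>();
        for (Rec r : batch) {
            String key = r.topic + "-" + r.partition;
            next.merge(key, r.offset + 1, Long::max);
        }
        return next;
    }

    public static void main(String[] args) {
        List<Rec> batch = Arrays.asList(
                new Rec("eventlog", 0, 41),
                new Rec("eventlog", 0, 42),
                new Rec("eventlog", 1, 7));
        Map<String, Long> next = nextOffsets(batch);
        System.out.println(next.get("eventlog-0")); // prints 43
        System.out.println(next.get("eventlog-1")); // prints 8
    }
}
```

Committing the last processed offset itself, without the plus one, would cause that record to be read again after a restart.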
Transaction API example
To use transactions, the application must provide a unique transactional.id and enable idempotence on the producer:
properties.put("transactional.id", "transactionid00001");
properties.put("enable.idempotence", true);
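The transaction methods provided by the Kafka producer are as follows:
void initTransactions();
void beginTransaction();
void sendOffsetsToTransaction(Map<TopicPartition, OffsetAndMetadata> offsets, ConsumerGroupMetadata groupMetadata);
void commitTransaction();
void abortTransaction();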
Example code structure for the typical "consume from Kafka, process, produce results to Kafka" scenario:
package com.doit.day04;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class Exercise_kafka2kafka {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Consumer settings
        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "linux01:9092");
        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "shouwei");
        // Disable auto-commit: offsets are committed through the producer's transaction
        props.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Producer settings
        props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "linux01:9092");
        props.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks, retries, and max in-flight requests: the three settings idempotence depends on
        props.setProperty(ProducerConfig.ACKS_CONFIG, "-1");
        props.setProperty(ProducerConfig.RETRIES_CONFIG, "3");
        props.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "3");
        // Enable idempotence
        props.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // Enable transactions by setting a transactional.id
        props.setProperty(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "doit40");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Initialize transactions
        producer.initTransactions();
        // Subscribe to the source topic
        consumer.subscribe(Arrays.asList("eventlog"));
        while (true) {
            // Pull a batch of records
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(Integer.MAX_VALUE));
            if (records.isEmpty()) {
                continue;
            }
            try {
                // Begin the transaction
                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> record : records) {
                    // Write the value to another topic
                    producer.send(new ProducerRecord<String, String>("k2k", record.value()));
                    // The offset to commit is that of the next record to read
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                }
                // Bind the consumed offsets to the transaction (replaces consumer.commitAsync(),
                // which would commit them outside the transaction)
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                // Commit the transaction; this also flushes any unsent records
                producer.commitTransaction();
            } catch (ProducerFencedException e) {
                // Fatal: another producer with the same transactional.id took over; abort is not allowed
                producer.close();
                consumer.close();
                return;
            } catch (KafkaException e) {
                // Abort the transaction
                producer.abortTransaction();
                // Rewind to the start of the aborted batch so it is polled again
                for (TopicPartition tp : records.partitions()) {
                    consumer.seek(tp, records.records(tp).get(0).offset());
                }
            }
        }
    }
}
Business practical cases
In real-world data processing, consume-transform-produce is a common and typical scenario.
In this scenario, the whole pipeline, from reading the source data, through business processing, to writing the results to Kafka, often has to be atomic:
either the whole process succeeds, or all of it fails!
(The consumer offsets are committed only after processing and output both succeed; if either fails, the offsets are not committed.)
Kafka's transaction mechanism meets this requirement:
it lets an application treat consuming messages, producing messages, and committing consumer offsets as a single atomic operation, even when production or consumption spans multiple topic partitions.
The consumer-side parameter isolation.level is closely tied to transactions. Its default value is "read_uncommitted", which means the consumer application can see (consume) messages from uncommitted transactions as well as from committed ones. It can also be set to "read_committed", in which case the consumer application cannot see messages from uncommitted transactions.
A control message (ControlBatch: COMMIT/ABORT) records whether the transaction was committed or aborted.
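How control messages and isolation.level interact can be illustrated with a toy model of a single partition log. This is a simplified sketch, not the broker's implementation: ControlMarkerDemo, Entry, and visible are hypothetical names, each transaction is identified by a made-up string id, and a read_committed reader is assumed to stop at the first message of a transaction that is still open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ControlMarkerDemo {
    enum Kind { DATA, COMMIT, ABORT }

    /** One log entry: a data message or a control marker, tagged with its transaction. */
    static final class Entry {
        final Kind kind;
        final String txId;
        final String value; // null for control markers

        Entry(Kind kind, String txId, String value) {
            this.kind = kind;
            this.txId = txId;
            this.value = value;
        }
    }

    /** Returns the values the application would receive under the given isolation level. */
    static List<String> visible(List<Entry> log, boolean readCommitted) {
        // Look up the outcome recorded by each transaction's control marker, if any.
        Map<String, Kind> outcome = new HashMap<>();
        for (Entry e : log) {
            if (e.kind != Kind.DATA) {
                outcome.put(e.txId, e.kind);
            }
        }
        List<String> out = new ArrayList<>();
        for (Entry e : log) {
            if (e.kind != Kind.DATA) {
                continue; // control markers are never handed to the application
            }
            if (!readCommitted) {
                out.add(e.value); // read_uncommitted: every data message, even aborted ones
                continue;
            }
            Kind result = outcome.get(e.txId);
            if (result == null) {
                break; // first message of a still-open transaction: stop here
            }
            if (result == Kind.COMMIT) {
                out.add(e.value); // messages of aborted transactions are skipped
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> log = Arrays.asList(
                new Entry(Kind.DATA, "tx-A", "a1"),
                new Entry(Kind.DATA, "tx-A", "a2"),
                new Entry(Kind.COMMIT, "tx-A", null),
                new Entry(Kind.DATA, "tx-B", "b1"),
                new Entry(Kind.ABORT, "tx-B", null),
                new Entry(Kind.DATA, "tx-C", "c1")); // tx-C has no marker yet
        System.out.println(visible(log, false)); // prints [a1, a2, b1, c1]
        System.out.println(visible(log, true));  // prints [a1, a2]
    }
}
```

The aborted message b1 stays in the log in both cases; the marker only changes whether a read_committed consumer delivers it.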