Detailed explanation of the relationship between MySQL's binlog timestamp and exec_time


Author: Li Xichao, DBA at Jiangsu Commercial Bank, responsible for the operation, maintenance and construction of databases and middleware. Proficient in MySQL, Python and Oracle; enjoys cycling, technical research and sharing.

Produced by the ActionSky open source community. Original content may not be used without authorization; for reprints, please contact the editor and credit the source.

This article is about 2,000 words and is expected to take 8 minutes to read.

Overview

Recently, while testing a system, we found a delay in master-slave synchronization and used the binlog to confirm its cause. After parsing the binlog with the mysqlbinlog command, I realized that I only half understood the information in it.

For example, for the following binlog snippet:

# at 449880
#240430 18:38:49 server id 345  end_log_pos 449967 CRC32 0xb3e8a02a     GTID    last_committed=13       sequence_number=14      rbr_only=yes    original_committed_timestamp=1714473533138376   immediate_commit_timestamp=1714473539246294     transaction_length=446792
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1714473533138376 (2024-04-30 18:38:53.138376 CST)
# immediate_commit_timestamp=1714473539246294 (2024-04-30 18:38:59.246294 CST)
/*!80001 SET @@session.original_commit_timestamp=1714473533138376*//*!*/;
/*!80014 SET @@session.original_server_version=80027*//*!*/;
/*!80014 SET @@session.immediate_server_version=80027*//*!*/;
SET @@SESSION.GTID_NEXT= 'c0ac4587-6046-11ee-9fa7-001c42c92a7b:44'/*!*/;
# at 449967
#240430 18:38:16 server id 345  end_log_pos 450039 CRC32 0x0c7cb74e     Query   thread_id=16    exec_time=37    error_code=0
SET TIMESTAMP=1714473496/*!*/;
BEGIN
/*!*/;
/*!*/;
# at 450039
#240430 18:38:16 server id 345  end_log_pos 450098 CRC32 0xf9a84808     Table_map: `testdb`.`tb3` mapped to number 110
# at 450098
#240430 18:38:16 server id 345  end_log_pos 458309 CRC32 0xad84e9b0     Write_rows: table id 110
...
# at 896439
#240430 18:38:46 server id 345  end_log_pos 896498 CRC32 0x5cd7cd3b     Table_map: `testdb`.`tb3` mapped to number 110
# at 896498
#240430 18:38:46 server id 345  end_log_pos 896540 CRC32 0x21b77031     Write_rows: table id 110 flags: STMT_END_F
...
### INSERT INTO `testdb`.`tb3`
### SET
###   @1=131060 /* INT meta=0 nullable=0 is_null=0 */
###   @2='c' /* VARSTRING(80) meta=80 nullable=1 is_null=0 */
# at 896540
#240430 18:38:49 server id 345  end_log_pos 896599 CRC32 0x6d6bf911     Table_map: `testdb`.`tb3` mapped to number 110
# at 896599
#240430 18:38:49 server id 345  end_log_pos 896641 CRC32 0xccd2fbb1     Write_rows: table id 110 flags: STMT_END_F
...
### INSERT INTO `testdb`.`tb3`
### SET
###   @1=131061 /* INT meta=0 nullable=0 is_null=0 */
###   @2='c' /* VARSTRING(80) meta=80 nullable=1 is_null=0 */
# at 896641
#240430 18:38:49 server id 345  end_log_pos 896672 CRC32 0xadb14b9d     Xid = 85

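From the binlog above, we can see (P1):

#240430 18:38:16 BEGIN was executed and the transaction started (for ease of reference, this time field is called timestamp below)
#240430 18:38:16 an INSERT into tb3 was executed
#240430 18:38:46 an INSERT into tb3 was executed
#240430 18:38:49 an INSERT into tb3 was executed
#240430 18:38:49 COMMIT was executed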
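The per-event timestamps listed in P1 can also be extracted from the mysqlbinlog text output mechanically. The following is a minimal sketch, not part of the original analysis: the regular expression and the function name parse_event_headers are illustrative assumptions, and the pattern is only checked against header lines shaped like the ones in the snippet above.

```python
import re
from datetime import datetime

# Matches event header lines such as:
# #240430 18:38:16 server id 345  end_log_pos 450039 CRC32 0x0c7cb74e     Query ...
# Lines like "# at 449967" or "### INSERT INTO ..." do not match.
HEADER_RE = re.compile(
    r"^#(?P<date>\d{6})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+server id\s+(?P<server_id>\d+)"
    r"\s+end_log_pos\s+(?P<end_log_pos>\d+)\s+(?:CRC32\s+0x[0-9a-fA-F]+\s+)?"
    r"(?P<event>[A-Za-z_]+)"
)

def parse_event_headers(text):
    """Return a list of (timestamp, event_type, end_log_pos) tuples."""
    events = []
    for line in text.splitlines():
        m = HEADER_RE.match(line)
        if m is None:
            continue
        when = datetime.strptime(m["date"] + " " + m["time"], "%y%m%d %H:%M:%S")
        events.append((when, m["event"], int(m["end_log_pos"])))
    return events

# Three header lines copied from the snippet above.
sample = """\
# at 449967
#240430 18:38:16 server id 345  end_log_pos 450039 CRC32 0x0c7cb74e     Query   thread_id=16    exec_time=37    error_code=0
# at 450039
#240430 18:38:16 server id 345  end_log_pos 450098 CRC32 0xf9a84808     Table_map: `testdb`.`tb3` mapped to number 110
# at 896641
#240430 18:38:49 server id 345  end_log_pos 896672 CRC32 0xadb14b9d     Xid = 85
"""

events = parse_event_headers(sample)
for when, event, pos in events:
    print(when.strftime("%H:%M:%S"), event, pos)
```

The header time is printed by mysqlbinlog in the local time zone of the machine running it, so the parsed values are naive datetimes and should only be compared with each other.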

Additionally (P2):

original_commit_timestamp=2024-04-30 18:38:53
immediate_commit_timestamp=2024-04-30 18:38:59
exec_time=37
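The two commit timestamps in P2 are stored as microseconds since the Unix epoch. As a quick cross-check of the human-readable values, the raw numbers from the snippet can be converted as sketched below. The helper name micros_to_cst is an assumption for illustration, and CST is taken to mean China Standard Time (UTC+8), as in the mysqlbinlog comments above.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8), "CST")  # China Standard Time, UTC+8

def micros_to_cst(micros):
    """Convert microseconds since the Unix epoch to an aware datetime in CST."""
    seconds, remainder = divmod(micros, 1_000_000)
    return datetime.fromtimestamp(seconds, tz=CST) + timedelta(microseconds=remainder)

# Raw values copied from the GTID event in the snippet.
original_commit_timestamp = 1714473533138376
immediate_commit_timestamp = 1714473539246294

print(micros_to_cst(original_commit_timestamp))
print(micros_to_cst(immediate_commit_timestamp))
```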

The information in P2 raises the following questions:

  • Q1: What do the fields in P2 mean, and how are they calculated?
  • Q2: What is the relationship between the fields in P2 and the timestamps seen in P1?
  • Q3: How are the timestamps in P1 obtained, especially in a master-slave environment?

To answer them, this article uses test verification and source code analysis to trace where the timestamps of common events and exec_time in the binlog come from, and summarizes the relationships between the fields.

The following analysis is based on MySQL 8.0, and the fields may be different in different versions.

Master node binlog log

1. GTID Event

timestamp

On the master node, unless otherwise noted, an event's timestamp is obtained as follows: the latest time is captured at the start of each command execution in dispatch_command() and stored in thd->start_time; when the Event object is created, thd->start_time is assigned to Log_event::common_header->when.

The main stack information is as follows:

|-handle_connection (./sql/conn_handler/connection_handler_per_thread.cc:302)
  |-do_command (./sql/sql_parse.cc:1343)
    |-dispatch_command (./sql/sql_parse.cc:1922)
      // set thd->start_time
      |-thd->set_time()
        |-my_micro_time_to_timeval(start_utime, &start_time)
      |-dispatch_sql_command (./sql/sql_parse.cc:5135)
        |-mysql_execute_command (./sql/sql_parse.cc:3518)
          |-Sql_cmd_dml::execute (./sql/sql_select.cc:579)
          ……
                        |-Table_map_log_event the_event(this, table, table->s->table_map_id,is_transactional)
                        ……
                          |-Rows_log_event *const ev = new RowsEventT(this, table, table->s->table_map_id, )
                          ……
                  |-Xid_log_event end_evt(thd, xid)
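The effect of this call chain can be illustrated with a toy model. This is a sketch only, not MySQL code: ToySession, its methods and the fake clock are hypothetical names that mimic the idea that thd->start_time is refreshed once per command and copied into every event created while that command runs.

```python
import itertools

class ToySession:
    """Toy stand-in for THD: start_time is refreshed once per command."""

    def __init__(self, clock):
        self.clock = clock
        self.start_time = None

    def dispatch_command(self):
        # Models thd->set_time() at the start of dispatch_command().
        self.start_time = self.clock()

    def new_event(self, event_type):
        # Models assigning thd->start_time to common_header->when.
        return {"type": event_type, "when": self.start_time}

ticks = itertools.count(start=100, step=10)  # fake clock: 100, 110, 120, ...
clock = lambda: next(ticks)
session = ToySession(clock)

session.dispatch_command()            # statement 1 starts at t=100
e1 = session.new_event("Table_map")
clock()                               # time advances to 110 while the statement runs
e2 = session.new_event("Write_rows")  # still stamped with the statement start

session.dispatch_command()            # statement 2 (COMMIT) starts at t=120
e3 = session.new_event("Xid")

print(e1["when"], e2["when"], e3["when"])
```

In this model, both events of the first statement carry its start time even though the clock has moved on, which matches the pairs of identical timestamps on Table_map and Write_rows events in the sample binlog.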

immediate_commit_timestamp/original_commit_timestamp

immediate_commit_timestamp is the timestamp taken at commit time; on the master node, original_commit_timestamp is equal to immediate_commit_timestamp.

|-error = trx_cache.flush(thd, &trx_bytes, wrote_xid)
  |-Transaction_ctx *trn_ctx = thd->get_transaction()
  |-trn_ctx->sequence_number = mysql_bin_log.m_dependency_tracker.step()
  |-if (trn_ctx->last_committed == SEQ_UNINIT): trn_ctx->last_committed = trn_ctx->sequence_number - 1
  |-if (!error): if ((error = mysql_bin_log.write_transaction(thd, this, &writer)))
    |-int64 sequence_number, last_committed
    |-m_dependency_tracker.get_dependency(thd, sequence_number, last_committed)
    |-thd->get_transaction()->last_committed = SEQ_UNINIT
    |-ulonglong immediate_commit_timestamp = my_micro_time()
    //|-ulonglong original_commit_timestamp = thd->variables.original_commit_timestamp
    |-ulonglong original_commit_timestamp = immediate_commit_timestamp
    |-uint32_t trx_immediate_server_version = do_server_version_int(::server_version)
    |-Gtid_log_event gtid_event(thd, cache_data->is_trx_cache(), last_committed, sequence_number,
        cache_data->may_have_sbr_stmts(), original_commit_timestamp,
        immediate_commit_timestamp, trx_original_server_version,
        trx_immediate_server_version)
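The fields passed to Gtid_log_event here are the ones that mysqlbinlog prints on the GTID header line. A minimal sketch of reading them back is shown below. The function names are illustrative assumptions, and the check in committed_first_here simply applies the statement above that the two commit timestamps are equal on the master; note that the header line in the sample spells the first key original_committed_timestamp.

```python
import re

# GTID header line copied from the sample binlog.
GTID_HEADER = (
    "#240430 18:38:49 server id 345  end_log_pos 449967 CRC32 0xb3e8a02a     GTID"
    "    last_committed=13       sequence_number=14      rbr_only=yes"
    "    original_committed_timestamp=1714473533138376"
    "   immediate_commit_timestamp=1714473539246294     transaction_length=446792"
)

def parse_numeric_fields(line):
    """Collect the integer key=value pairs of a header line into a dict."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)\b", line)}

def committed_first_here(fields):
    """True when both commit timestamps match, as expected on the master."""
    return fields["original_committed_timestamp"] == fields["immediate_commit_timestamp"]

fields = parse_numeric_fields(GTID_HEADER)
print(fields["last_committed"], fields["sequence_number"])
print(committed_first_here(fields))
```

For the sample line the check returns False, which suggests that the transaction was first committed on another node and then applied on the node that wrote this binlog.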

2. BEGIN Event

timestamp

Note: On the master node, the timestamp of the BEGIN event is not the time at which BEGIN was executed; it comes from the first modification operation. After the first row has been modified in the InnoDB layer, the Table_map event is generated and written. Before the Table_map event is generated, if the binlog cache of the whole transaction is still empty, thd->start_time of that operation is read and the actual BEGIN event is generated.

exec_time

Likewise, on the master node, exec_time is computed while the BEGIN event is being generated: the latest timestamp minus the timestamp of the BEGIN event.

exec_time = A - B

  • A: The time at which the BEGIN event is generated, that is, after the first modifying SQL statement has completed its first row modification (write/update/delete).
  • B: The time at which the first modifying SQL statement started executing (thd->start_time)
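The formula can be tried out with a small worked example. This is a sketch under stated assumptions: the function name and the timeline values are invented for illustration, and whole-second resolution is assumed because exec_time appears as an integer (exec_time=37) in the sample.

```python
def exec_time(begin_event_generated_at, first_dml_started_at):
    """Return exec_time = A - B in whole seconds (arguments are Unix times in seconds)."""
    return int(begin_event_generated_at) - int(first_dml_started_at)

# Hypothetical master-side timeline (values invented for illustration).
# An explicit BEGIN issued earlier does not enter the formula at all.
first_dml_started_at = 1_000_000_000.25      # B: thd->start_time of the first INSERT
begin_event_generated_at = 1_000_000_003.10  # A: first row modified, BEGIN event generated

print(exec_time(begin_event_generated_at, first_dml_started_at))
```

With these values the idle time between BEGIN and the first INSERT is ignored, and only the roughly three seconds the INSERT needed before its first row was modified are counted.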

The internal stack and execution sequence are as follows:

3. Table_map Event

4. Write Event

5. Xid Event

6. Summary of master node

  • For the BEGIN event, timestamp is the start time of the first operation that needs to be written to the binlog (such as write/update/delete);
  • For other events, timestamp is the start time of the SQL statement being executed;
  • immediate_commit_timestamp/original_commit_timestamp is the timestamp at commit;
  • exec_time = A - B
    • A: The time at which the BEGIN event is generated, that is, after the first modifying SQL statement has completed its first row modification (write/update/delete).
    • B: The time at which the first modifying SQL statement started executing (thd->start_time)

Slave node binlog log

1. GTID Event

timestamp

On the slave node, MySQL does not take over the timestamp of the master node's GTID/XID event when it parses the event, so the event "inherits" the timestamp of the previous operation in the transaction. The timestamps of all modification operations on the slave node come from the time at which the master node performed them. Therefore, the timestamp of the GTID/XID event on the slave node is the timestamp of the master node's last modification operation.

immediate_commit_timestamp/original_commit_timestamp

immediate_commit_timestamp is the timestamp of the commit on the slave node. original_commit_timestamp is taken from the original_commit_timestamp in the GTID event, that is, the timestamp of the commit on the master node.

The main stack information is as follows:

|-handle_slave_worker (./sql/rpl_replica.cc:5891)
  |-slave_worker_exec_job_group (./sql/rpl_rli_pdb.cc:2549)
    |-Slave_worker::slave_worker_exec_event (./sql/rpl_rli_pdb.cc:1760)
      |-Xid_apply_log_event::do_apply_event_worker (./sql/log_event.cc:6179)
        |-Xid_log_event::do_commit (./sql/log_event.cc:6084)
          |-trans_commit (./sql/transaction.cc:246)
            |-ha_commit_trans (./sql/handler.cc:1765)
              |-MYSQL_BIN_LOG::commit (./sql/binlog.cc:8170)
                |-MYSQL_BIN_LOG::ordered_commit (./sql/binlog.cc:8789)
                  |-MYSQL_BIN_LOG::process_flush_stage_queue (./sql/binlog.cc:8326)
                    |-MYSQL_BIN_LOG::flush_thread_caches (./sql/binlog.cc:8218)
                      |-binlog_cache_mngr::flush (./sql/binlog.cc:1099)
                        |-binlog_cache_data::flush (./sql/binlog.cc:2098)
                          |-MYSQL_BIN_LOG::write_transaction (./sql/binlog.cc:1586)
                            // generate and write the GTID event
                            |-ulonglong immediate_commit_timestamp = my_micro_time()
                            |-if (original_commit_timestamp == UNDEFINED_COMMIT_TIMESTAMP){...}
                            |-Gtid_log_event gtid_event(thd, cache_data->is_trx_cache(), last_committed, sequence_number,
                               cache_data->may_have_sbr_stmts(), original_commit_timestamp, immediate_commit_timestamp, trx_original_server_version,
                               trx_immediate_server_version)

Formula

immediate_commit_timestamp - original_commit_timestamp = A + B + C

  • A = The time it takes to transfer the binlog from the master node to the slave node
  • B = The time it takes the slave node to replay the binlog
  • C = Synchronization delay/interruption time
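Applied to the sample at the beginning of the article, the left-hand side of the formula works out as sketched below. The calculation only uses the two raw values from the GTID event; it assumes the clocks of the two hosts are in sync, since any clock offset would be included in the result.

```python
from datetime import timedelta

# Raw values (microseconds since the epoch) copied from the sample GTID event.
original_commit_timestamp = 1714473533138376   # commit on the master node
immediate_commit_timestamp = 1714473539246294  # commit on the node that wrote this binlog

lag = timedelta(microseconds=immediate_commit_timestamp - original_commit_timestamp)
print(lag.total_seconds())
```

The difference is a little over six seconds, which is the sum A + B + C for this transaction; the binlog alone does not show how it splits between the three terms.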

2. BEGIN Event

timestamp

Here timestamp comes from the timestamp of the master node's BEGIN event. When the event is actually executed, the timestamp of the BEGIN event is read and assigned to thd->start_time/thd->user_time. When the slave node generates its own Event object, it simply takes the timestamp from thd->start_time again.

exec_time

The slave node's exec_time is still computed while the BEGIN event is being generated, as the latest timestamp minus timestamp (note that timestamp here is the time at which the modifying SQL statement started executing on the master node).

The main stack information is as follows:

|-handle_slave_worker (./sql/rpl_replica.cc:5891)
  |-slave_worker_exec_job_group (./sql/rpl_rli_pdb.cc:2549)
    |-Slave_worker::slave_worker_exec_event (./sql/rpl_rli_pdb.cc:1760)
      |-Log_event::do_apply_event_worker (./sql/log_event.cc:1083)
        |-Query_log_event::do_apply_event (./sql/log_event.cc:4443)
          |-Query_log_event::do_apply_event (./sql/log_event.cc:4606)
            // set user_time=start_time=ev.common_header->when
            |-thd->set_time(&(common_header->when))
            // query_arg="BEGIN"
            |-thd->set_query(query_arg, q_len_arg)
            ...

Formula

exec_time = A + B + C + D

  • A = The duration of the entire transaction on the master node
  • B = The binlog transmission time
  • C = Synchronization delay/interruption time (likely the main component)
  • D = The time the slave node takes to complete the first row modification

original_commit_timestamp - timestamp of the BEGIN event = the actual time the entire transaction took on the master node (from [master: first modification] to [master: commit start]).
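Both quantities can be computed from the sample at the beginning of the article. The sketch below assumes that the sample was written by a slave node, which the article does not state explicitly; the variable names are illustrative.

```python
import re

# Lines copied from the sample binlog.
query_header = (
    "#240430 18:38:16 server id 345  end_log_pos 450039 CRC32 0x0c7cb74e"
    "     Query   thread_id=16    exec_time=37    error_code=0"
)
set_timestamp = "SET TIMESTAMP=1714473496/*!*/;"
original_commit_timestamp = 1714473533138376  # microseconds since the epoch

exec_time = int(re.search(r"exec_time=(\d+)", query_header).group(1))
begin_timestamp = int(re.search(r"SET TIMESTAMP=(\d+)", set_timestamp).group(1))

# Second (Unix time) in which the BEGIN event was generated on the writing node.
begin_generated_at = begin_timestamp + exec_time

# Duration of the transaction on the master node, in microseconds.
master_trx_micros = original_commit_timestamp - begin_timestamp * 1_000_000

print(begin_generated_at)
print(master_trx_micros / 1_000_000)
```

Under that assumption, the transaction took about 37.1 seconds on the master node, and exec_time=37 places the generation of the BEGIN event on the slave node in the same second as the master's commit, so B, C and D together stayed below the one-second resolution of exec_time.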

3. Table_map Event

4. Write Event

5. Xid Event

6. Summary of slave node

  • Except for the GTID/XID events, the timestamps of all events come from the corresponding events of the master node;
  • The timestamp of the GTID/XID event is the start time of the master node's last modification operation;
  • In the GTID event, original_commit_timestamp comes from the master node and immediate_commit_timestamp is the latest timestamp on the slave node;
  • exec_time = A - B
    • A = The latest timestamp at the moment the slave node generates the BEGIN event
    • B = The time at which the master node started executing the first DML operation

Conclusion

At this point, the timestamps and exec_time information in the binlog have been basically sorted out. Interested readers can go back to the beginning of the article and check whether Q1-Q3 have been answered.

Finally, it is recommended that readers simulate several cases in order to have a deeper understanding of the relevant fields, so that they can be more comfortable when using binlog to analyze master-slave synchronization problems.

The above information is for communication only. The author's level is limited. If there are any shortcomings, please feel free to communicate in the comment area.

For more technical articles, please visit: https://opensource.actionsky.com/

Origin my.oschina.net/actiontechoss/blog/11112538