The most commonly used distributed ID solutions are here!

"1. Distributed ID Concept"
The defining characteristic of an ID is uniqueness. For people, an ID card uniquely identifies each person. In complex distributed systems, it is often necessary to uniquely identify large volumes of data and messages. For example, a single database can use an auto-increment column as the ID field, but once the data is sharded across multiple databases and tables, each record needs an ID that is unique across all of them. Such an ID is a distributed ID. The way it is generated must also have the characteristics expected of a distributed system: high concurrency, high availability, high performance, and so on.

"2. Distributed ID implementation schemes"

The following table is a comparison of some common solutions:
[Table image: comparison of common distributed ID solutions]
There are two popular distributed ID solutions: the "number segment mode" and the "snowflake algorithm".

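The "number segment mode" depends on the database, but it differs from a plain auto-increment primary key. Instead of asking the database for one ID at a time, the application reserves a whole segment of IDs per request. With a segment size of 100, for example, the stored maximum advances to 100, 200, 300, and so on, and each fetch yields 100 IDs that can then be handed out from memory. The database is hit once per segment instead of once per ID, so performance improves significantly.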
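The segment idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `SegmentStore` is a hypothetical in-memory stand-in for the database row that holds the current maximum ID, and `SegmentIdGenerator` is a made-up name, not part of any library.

```java
/**
 * Hypothetical stand-in for the database row that stores the current maximum ID.
 * A real implementation would run an UPDATE and SELECT inside one transaction.
 */
class SegmentStore {
    private long maxId = 0;
    private int fetchCount = 0;

    // Advance the stored maximum by one segment and return the new maximum.
    synchronized long reserve(int step) {
        fetchCount++;
        maxId += step;
        return maxId;
    }

    // Number of round trips made to the (simulated) database so far.
    synchronized int fetchCount() {
        return fetchCount;
    }
}

/** Hands out IDs from memory and only goes back to the store when a segment runs out. */
class SegmentIdGenerator {
    private final SegmentStore store;
    private final int step;
    private long next = 1; // next ID to hand out
    private long max = 0;  // last ID of the current segment (inclusive)

    SegmentIdGenerator(SegmentStore store, int step) {
        this.store = store;
        this.step = step;
    }

    synchronized long nextId() {
        if (next > max) {
            // Current segment is exhausted (or not loaded yet): reserve a new one.
            max = store.reserve(step);
            next = max - step + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        SegmentStore store = new SegmentStore();
        SegmentIdGenerator generator = new SegmentIdGenerator(store, 100);
        for (int i = 0; i < 250; i++) {
            generator.nextId();
        }
        // 250 IDs with a step of 100 need three segments.
        System.out.println("store round trips: " + store.fetchCount());
    }
}
```

Two generators that share one store receive disjoint segments, which is what keeps the IDs unique across application instances. IDs that remain unused in a segment when an instance restarts are simply skipped.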

The "snowflake algorithm" builds a 64-bit ID from a sign bit + timestamp + worker machine ID + sequence number, as shown in the figure:
[Figure: snowflake ID bit layout]
The sign bit (1 bit) is always 0, so the ID is a positive number.

The timestamp field (41 bits) stores a timestamp in milliseconds, usually as an offset from a fixed start time.

The worker machine ID field (10 bits) stores the ID of the machine, and is usually split into 5 data center bits + 5 server bits.

The sequence number field (12 bits) increments for each ID generated within the same millisecond.

How many IDs can the snowflake algorithm produce?

Time range: 2^41 / (365 × 24 × 60 × 60 × 1000) ≈ 69 years.
Worker machine range: 2^10 = 1024.
Sequence number range: 2^12 = 4096, which means that each machine can generate up to 4096 IDs in 1 ms.
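The arithmetic above can be checked with a short program. This is a minimal sketch; the class name `SnowflakeCapacity` is made up for illustration, and the bit widths are the 41/10/12 split described above.

```java
/** Recomputes the capacity figures for a 1 + 41 + 10 + 12 bit layout. */
class SnowflakeCapacity {
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    // Number of distinct values that fit in the given number of bits.
    static long valueCount(int bits) {
        return 1L << bits;
    }

    // How many years of millisecond timestamps fit in the timestamp field.
    static double yearsCovered() {
        double millisPerYear = 365.0 * 24 * 60 * 60 * 1000;
        return valueCount(TIMESTAMP_BITS) / millisPerYear;
    }

    public static void main(String[] args) {
        System.out.printf("years covered: %.1f%n", yearsCovered());
        System.out.println("worker machines: " + valueCount(WORKER_BITS));
        System.out.println("IDs per ms per machine: " + valueCount(SEQUENCE_BITS));
    }
}
```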
Following this logic, the algorithm only needs to be implemented in Java and wrapped as a utility method. Each business application can then call the utility directly to obtain distributed IDs, provided that each application instance is given its own worker machine ID; there is no need to deploy a separate service just to issue IDs. The following is a Java implementation of the Twitter Snowflake algorithm:

public class SnowFlake {

    /**
     * Start timestamp in milliseconds; IDs store the time elapsed since this value.
     */
    private final static long START_STMP = 1480166465631L;

    /**
     * Number of bits occupied by each part.
     */
    private final static long SEQUENCE_BIT = 12;  // bits for the sequence number
    private final static long MACHINE_BIT = 5;    // bits for the machine ID
    private final static long DATACENTER_BIT = 5; // bits for the data center ID

    /**
     * Maximum value of each part.
     */
    private final static long MAX_DATACENTER_NUM = -1L ^ (-1L << DATACENTER_BIT);
    private final static long MAX_MACHINE_NUM = -1L ^ (-1L << MACHINE_BIT);
    private final static long MAX_SEQUENCE = -1L ^ (-1L << SEQUENCE_BIT);

    /**
     * Left shift of each part.
     */
    private final static long MACHINE_LEFT = SEQUENCE_BIT;
    private final static long DATACENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT;
    private final static long TIMESTMP_LEFT = DATACENTER_LEFT + DATACENTER_BIT;

    private long datacenterId;   // data center ID
    private long machineId;      // machine ID
    private long sequence = 0L;  // sequence number within the current millisecond
    private long lastStmp = -1L; // timestamp of the previously generated ID

    public SnowFlake(long datacenterId, long machineId) {
        if (datacenterId > MAX_DATACENTER_NUM || datacenterId < 0) {
            throw new IllegalArgumentException("datacenterId can't be greater than MAX_DATACENTER_NUM or less than 0");
        }
        if (machineId > MAX_MACHINE_NUM || machineId < 0) {
            throw new IllegalArgumentException("machineId can't be greater than MAX_MACHINE_NUM or less than 0");
        }
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    /**
     * Generate the next ID.
     *
     * @return a 64-bit ID that is unique for this data center and machine
     */
    public synchronized long nextId() {
        long currStmp = getNewstmp();
        if (currStmp < lastStmp) {
            throw new RuntimeException("Clock moved backwards.  Refusing to generate id");
        }

        if (currStmp == lastStmp) {
            // Same millisecond: increment the sequence number
            sequence = (sequence + 1) & MAX_SEQUENCE;
            // The sequence for this millisecond is exhausted: wait for the next millisecond
            if (sequence == 0L) {
                currStmp = getNextMill();
            }
        } else {
            // New millisecond: reset the sequence number to 0
            sequence = 0L;
        }

        lastStmp = currStmp;

        return (currStmp - START_STMP) << TIMESTMP_LEFT // timestamp part
                | datacenterId << DATACENTER_LEFT       // data center part
                | machineId << MACHINE_LEFT             // machine ID part
                | sequence;                             // sequence number part
    }

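    // Busy-wait until the clock moves past the last recorded millisecond.
    private long getNextMill() {
        long mill = getNewstmp();
        while (mill <= lastStmp) {
            mill = getNewstmp();
        }
        return mill;
    }

    // Current wall-clock time in milliseconds.
    private long getNewstmp() {
        return System.currentTimeMillis();
    }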

    public static void main(String[] args) {
        // Data center 2, machine 3
        SnowFlake snowFlake = new SnowFlake(2, 3);

        // Print 4096 IDs (1 << 12)
        for (int i = 0; i < (1 << 12); i++) {
            System.out.println(snowFlake.nextId());
        }
    }
}
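To see how the fields are packed, it can help to take an ID apart again. The following is a sketch only: `SnowflakeIdParts` is a hypothetical helper that is not part of the listing above, and it copies that listing's start timestamp and bit widths so that it can run on its own.

```java
/** Hypothetical helper that splits a snowflake ID back into its fields. */
class SnowflakeIdParts {
    // Same values as in the SnowFlake listing above.
    static final long START_STMP = 1480166465631L;
    static final int SEQUENCE_BIT = 12;
    static final int MACHINE_BIT = 5;
    static final int DATACENTER_BIT = 5;

    final long timestamp;    // absolute time in ms since the Unix epoch
    final long datacenterId;
    final long machineId;
    final long sequence;

    private SnowflakeIdParts(long timestamp, long datacenterId, long machineId, long sequence) {
        this.timestamp = timestamp;
        this.datacenterId = datacenterId;
        this.machineId = machineId;
        this.sequence = sequence;
    }

    // Bit mask with the lowest `bits` bits set.
    private static long mask(int bits) {
        return ~(-1L << bits);
    }

    static SnowflakeIdParts decode(long id) {
        long sequence = id & mask(SEQUENCE_BIT);
        long machineId = (id >>> SEQUENCE_BIT) & mask(MACHINE_BIT);
        long datacenterId = (id >>> (SEQUENCE_BIT + MACHINE_BIT)) & mask(DATACENTER_BIT);
        long elapsed = id >>> (SEQUENCE_BIT + MACHINE_BIT + DATACENTER_BIT);
        return new SnowflakeIdParts(elapsed + START_STMP, datacenterId, machineId, sequence);
    }

    public static void main(String[] args) {
        // Hand-built ID: 1000 ms after the start timestamp, data center 2, machine 3, sequence 7.
        long id = (1000L << 22) | (2L << 17) | (3L << 12) | 7L;
        SnowflakeIdParts parts = decode(id);
        System.out.println(parts.datacenterId + " " + parts.machineId + " " + parts.sequence);
    }
}
```

Because the timestamp occupies the highest bits, IDs from the same machine sort in the order they were generated.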

"3. Distributed ID open source components"

3.1 How to choose open source components

When choosing an open source component, first check whether its features meet the requirements, including compatibility and scalability.

Second, consider current technical ability: whether you or your team can use the component smoothly with the existing technology stack and skills.

Third, look at the community around the component: whether it is updated frequently, whether the project is actively maintained, whether you can get help when you run into a problem, and whether it is widely used in the industry.

3.2 Meituan Leaf

Leaf is a distributed ID generation service launched by Meituan's basic R&D platform. Its name comes from a saying of the German philosopher and mathematician Leibniz: "There are no two identical leaves in the world." Leaf offers high reliability, low latency, and global uniqueness. It is already widely used in many Meituan departments, such as Meituan Finance, Meituan Takeaway, and Meituan Hotel and Travel. For technical details, see the article "Leaf Meituan Distributed ID Generation Service" on the Meituan Technology Blog. The Leaf project is open source on GitHub: https://github.com/Meituan-Dianping/Leaf. The characteristics of Leaf are as follows:

Globally unique: there are never duplicate IDs, and the IDs follow an overall increasing trend.
High availability: the service is built entirely on a distributed architecture, and even if MySQL goes down, Leaf can tolerate the database being unavailable for a period of time.
High concurrency and low latency: on a CentOS virtual machine with 4 cores and 8 GB of memory, remote calls reach 50,000+ QPS with TP99 within 1 ms.
Simple access: it can be called directly through the company's RPC service or over HTTP.

3.3 Baidu UidGenerator

UidGenerator, open sourced by Baidu, is a high-performance distributed unique ID generator based on the Snowflake algorithm. To quote the project's own description: UidGenerator works inside application projects as a component, supports custom workerId bit widths and initialization strategies, and therefore suits scenarios such as automatic restart and drift of instances in virtualized environments such as Docker. In terms of implementation, UidGenerator borrows future time to get around the inherent concurrency limit of the sequence; it uses a RingBuffer to cache generated UIDs so that UID production and consumption run in parallel, and it pads cache lines to avoid the hardware-level "false sharing" problem that the RingBuffer would otherwise cause. The resulting single-machine QPS can reach 6 million. UidGenerator's GitHub address: https://github.com/baidu/uid-generator
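The idea of custom bit widths can be illustrated with a small layout class. This is a sketch under assumptions and is not UidGenerator's actual code: `IdBitLayout` is a hypothetical name, and the widths 28, 22, and 13 used in `main` are example values only.

```java
/** Hypothetical 64-bit ID layout with configurable field widths (1 sign bit is implied). */
class IdBitLayout {
    static final int TOTAL_BITS = 64;

    final long maxTimestamp;
    final long maxWorkerId;
    final long maxSequence;
    final int timestampShift;
    final int workerIdShift;

    IdBitLayout(int timestampBits, int workerIdBits, int sequenceBits) {
        if (timestampBits <= 0 || workerIdBits <= 0 || sequenceBits <= 0
                || 1 + timestampBits + workerIdBits + sequenceBits != TOTAL_BITS) {
            throw new IllegalArgumentException("field widths plus the sign bit must add up to 64");
        }
        this.maxTimestamp = ~(-1L << timestampBits);
        this.maxWorkerId = ~(-1L << workerIdBits);
        this.maxSequence = ~(-1L << sequenceBits);
        this.timestampShift = workerIdBits + sequenceBits;
        this.workerIdShift = sequenceBits;
    }

    // Pack the three fields into one long; each value must fit in its field.
    long compose(long timestamp, long workerId, long sequence) {
        if (timestamp < 0 || timestamp > maxTimestamp
                || workerId < 0 || workerId > maxWorkerId
                || sequence < 0 || sequence > maxSequence) {
            throw new IllegalArgumentException("field value out of range");
        }
        return (timestamp << timestampShift) | (workerId << workerIdShift) | sequence;
    }

    public static void main(String[] args) {
        // Example widths: fewer timestamp bits, more worker and sequence bits.
        IdBitLayout layout = new IdBitLayout(28, 22, 13);
        System.out.println("max workers: " + (layout.maxWorkerId + 1));
        System.out.println("IDs per time unit: " + (layout.maxSequence + 1));
    }
}
```

Shifting bits from the timestamp field to the workerId field trades lifetime for more worker IDs, which matters when instances restart often and each restart consumes a new worker ID.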

3.4 Comparison of open source components

Baidu UidGenerator is written in Java; its last commit was two years ago, so it is essentially unmaintained; and it supports only the snowflake algorithm.

Meituan Leaf is also written in Java; it was still being maintained as of 2020; and it supports both the number segment mode and the snowflake algorithm.

In summary, comparing the theory and the two open source components, Meituan Leaf comes out slightly ahead.

What other commonly used distributed ID solutions do you know of?

Origin blog.csdn.net/ncw8080/article/details/113854360