Redis underlying data structures (SDS and linked list)

Redis is an open source (BSD licensed), in-memory data structure store that can be used as a database, a cache, and messaging middleware. Almost every project uses Redis in some way, whether as a cache or as messaging middleware. It is very easy to use, but most people have probably never looked in depth at the details of its underlying implementation.

I recently ran into some Redis-related bugs during project development. Because I was not familiar with parts of the underlying implementation, they took a lot of effort to resolve, so I plan to start this series as study notes on the structures and strategies underlying Redis.

The first part of the series starts with how Redis implements its data structures and its five object types. It mainly covers the content below; you can also download the corresponding mind map from the GitHub repository given at the end of the article.

image

This article covers two data structures: the simple dynamic string (SDS) and the doubly linked list.

First, the SDS simple dynamic string

As everyone knows, Redis is implemented in C. A C string is just an array of characters terminated by a null character. That structure is too simple for Redis, so Redis implements its own structure, the simple dynamic string (SDS). Its implementation is actually very similar to ArrayList in Java.

The Redis source file sds.h defines five sdshdr structures:

struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

Of these, the comment on sdshdr5 in the source notes that sdshdr5 is never used. This structure is meant for strings shorter than 32 bytes, since the length has to fit in the five high bits of flags. It is generally no longer used, and even short strings are better stored in sdshdr8. Because sdshdr5 lacks the two key fields len and alloc, it cannot record spare capacity and so cannot grow in place: every append needs a full memory reallocation and a copy of the data, which is a significant performance cost in a production environment. That is why it was abandoned, although in practice some short keys are still stored in this structure.

Setting sdshdr5 aside, let's look at the fields of the other four structures. The len field holds the current length of the string, that is, the number of bytes in use. The alloc field holds the total number of bytes allocated for the string, not counting the header (len, alloc, and flags) or the null terminator. alloc can be larger than len because each allocation reserves some extra space to make later growth cheaper. The low three bits of flags encode the SDS type and the high five bits are unused. The low three bits take the following values:

#define SDS_TYPE_5  0
#define SDS_TYPE_8  1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4

In fact, Redis disables memory alignment for the sdshdr structures (that is what __attribute__ ((__packed__)) does), so the fields are laid out back to back in memory with no padding. This lets Redis pass strings around directly as char * pointers to buf.

Some may wonder how the type of a string can be determined from nothing but a char pointer. Because alignment padding is disabled for sdshdr, sds[-1] is exactly the flags field. From flags we get the SDS type, and from the type we can read the rest of the header to determine the properties of the string.
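
To make the sds[-1] trick concrete, here is a minimal sketch of my own. It reuses the sdshdr8 layout and the SDS_TYPE_8 value shown above; the demo_* helpers and SDS_TYPE_MASK as written here are assumptions for illustration, not the actual Redis functions, and the packed attribute requires GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SDS_TYPE_8    1
#define SDS_TYPE_MASK 7   /* low three bits of flags hold the type */

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;          /* used */
    uint8_t alloc;        /* excluding the header and null terminator */
    unsigned char flags;  /* 3 lsb of type, 5 unused bits */
    char buf[];
};

/* Hypothetical helper: allocate header + data and return a pointer to buf. */
static char *demo_sds8_new(const char *init) {
    size_t n = strlen(init);
    if (n > UINT8_MAX) return NULL;            /* would not fit in sdshdr8 */
    struct sdshdr8 *sh = malloc(sizeof(*sh) + n + 1);
    if (sh == NULL) return NULL;
    sh->len = (uint8_t)n;
    sh->alloc = (uint8_t)n;
    sh->flags = SDS_TYPE_8;
    memcpy(sh->buf, init, n + 1);              /* copy the terminator too */
    return sh->buf;
}

/* The byte just before buf is flags, because the struct has no padding. */
static int demo_sds_type(const char *s) {
    return ((const unsigned char *)s)[-1] & SDS_TYPE_MASK;
}

/* Step back over the whole header to reach len in O(1). */
static size_t demo_sds8_len(const char *s) {
    assert(demo_sds_type(s) == SDS_TYPE_8);
    const struct sdshdr8 *sh =
        (const struct sdshdr8 *)(s - sizeof(struct sdshdr8));
    return sh->len;
}

/* Free the block starting at the header, not at buf. */
static void demo_sds8_free(char *s) {
    if (s != NULL) free(s - sizeof(struct sdshdr8));
}
```

A caller only ever holds the char * returned by demo_sds8_new, can hand it to ordinary C string functions, and can still recover the type and length from the header that sits just in front of it.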

Next, let's look at how sdshdr improves on traditional C strings, both in performance and in convenience.

First, to get the length of a traditional C string you have to traverse the array, which costs at least O(n), whereas with SDS you only need to read the len field, which is O(1).

Second, and this is a very important design point: once a traditional string has been allocated, appending content to it runs into the fact that the length of an array cannot change after initialization. At a minimum we have to allocate a sufficiently large new array and then copy the original string into it.

Each time SDS has to allocate memory for a string, it also allocates some extra space that is not yet in use. The extra space is generally equal to the size of the string itself; once the string exceeds 1MB, the extra space is fixed at 1MB. Whenever a function such as sdscat is called, the program compares the remaining free space, alloc - len, with the length of the content to append. If the free space is not enough, a reallocation is triggered; if it is large enough, the content is appended in place without any reallocation.

With this preallocation strategy, the number of memory reallocations needed for N consecutive appends to an SDS drops from exactly N to at most N.
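
The growth rule described above can be sketched as follows. This is only a size-tracking model under my own assumptions: demo_str, demo_grow, demo_append, and DEMO_MAX_PREALLOC are hypothetical names, no real memory is allocated, and the 1MB threshold is taken from the description above rather than read out of the Redis source.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PREALLOC (1024 * 1024)   /* 1MB threshold from the text */

/* Hypothetical model of a string: sizes only, no actual buffer. */
struct demo_str {
    size_t len;       /* bytes in use */
    size_t alloc;     /* bytes reserved */
    size_t reallocs;  /* how many times we had to reallocate */
};

/* Capacity to reserve when `needed` bytes must fit:
 * double it below the threshold, otherwise add a flat 1MB. */
static size_t demo_grow(size_t needed) {
    if (needed < DEMO_MAX_PREALLOC) return needed * 2;
    return needed + DEMO_MAX_PREALLOC;
}

/* Append addlen bytes; reallocate only when alloc - len is too small. */
static void demo_append(struct demo_str *s, size_t addlen) {
    if (s->alloc - s->len < addlen) {
        s->alloc = demo_grow(s->len + addlen);
        s->reallocs++;
    }
    s->len += addlen;
}
```

Appending 10 bytes eight times to an empty demo_str triggers a reallocation only on the first, third, and seventh append, so eight appends cost three reallocations instead of eight.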

Finally, a traditional C string finds its end by checking whether the current character is the null character, so the string must not contain a null character anywhere; otherwise the characters after it are not read as part of the string. Some data with special format requirements needs the null character as a delimiter, so it cannot be stored in a traditional C string. SDS does not detect the end of a string by the null character but by the value of the len field. SDS is therefore binary safe: binary data with special format requirements can be stored in it safely.

That is our brief look at SDS. It is an improved version of the C string that stays compatible with the existing C string function APIs and uses several techniques to speed up certain operations. It is worth learning from.
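
A small sketch of what binary safety means in practice. The demo_blob type and demo_blob_from function are hypothetical and use a fixed 16-byte buffer only to keep the example short; the point is just that the stored length, not the first null character, decides where the data ends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed buffer, standing in for an SDS. */
struct demo_blob {
    size_t len;     /* number of valid bytes in buf */
    char buf[16];
};

/* Copy n raw bytes, embedded null characters included. */
static struct demo_blob demo_blob_from(const void *data, size_t n) {
    struct demo_blob b = { 0, {0} };
    assert(n < sizeof(b.buf));   /* leave room for a trailing terminator */
    memcpy(b.buf, data, n);
    b.buf[n] = '\0';             /* terminator kept for C API compatibility */
    b.len = n;
    return b;
}
```

For the five bytes 'a', 'b', '\0', 'c', 'd', strlen stops at the embedded null character and reports 2, while the len field still reports 5 and all five bytes remain readable.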

Second, the linked list

I believe the linked list is familiar to everyone. There are many variants, such as singly linked lists, doubly linked lists, and circular linked lists. Compared with an array, a linked list first of all does not need a contiguous block of memory, and second, inserting or deleting at a known node takes O(1) time, which is very efficient. It does not, however, support random access the way an array does.

As the saying goes, there is no best data structure, only the right one. For example, the dictionary, a higher-level data structure we will introduce later, relies on linked lists underneath to resolve hash collisions. We will cover that in detail later.

Redis implements a doubly linked list in C:

typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

The prev pointer points to the previous node, next points to the next node, and value points to the data object held by the current node. I have borrowed a diagram that shows the structure of the whole chain of nodes:

image

Although the whole list can be traversed starting from its first node, Redis wraps the nodes in one more layer, a structure dedicated to representing a list:

typedef struct list {
    listNode *head;
    listNode *tail;
    void *(*dup)(void *ptr);
    void (*free)(void *ptr);
    int (*match)(void *ptr, void *key);
    unsigned long len;
} list;

head points to the first node of the list and tail points to the last node. The dup function copies a node's value when the list is duplicated. Plain assignment is generally enough, but some special cases need a dedicated copy function; by default the field can be set to NULL, which means values are copied by assignment. The free function releases the memory held by a node's value. Its default is also NULL, in which case the value is left alone and only the node itself is released with Redis's own zfree function. Let's take a look at zfree:

void zfree(void *ptr) {
#ifndef HAVE_MALLOC_SIZE
    void *realptr;
    size_t oldsize;
#endif

    if (ptr == NULL) return;
#ifdef HAVE_MALLOC_SIZE
    update_zmalloc_stat_free(zmalloc_size(ptr));
    free(ptr);
#else
    realptr = (char*)ptr-PREFIX_SIZE;
    oldsize = *((size_t*)realptr);
    update_zmalloc_stat_free(oldsize+PREFIX_SIZE);
    free(realptr);
#endif
}

This involves the concept of memory alignment. On a 64-bit system, for example, one memory access fetches a fixed 8 bytes. If a variable straddles two 8-byte segments, the CPU needs two accesses to read it in full. Memory alignment was introduced to guarantee that allocated variables never straddle a boundary like this. In practice it works by padding memory with unused bytes, which inevitably wastes some space, but it is a space-for-time trade-off, and you can disable it.
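
The padding is easy to observe with sizeof and offsetof. This is a sketch with two hypothetical structs of my own; the packed attribute is the same GCC/Clang extension that sds.h uses, and the exact size of the unpacked struct depends on the platform (16 bytes is typical on x86-64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler pads after tag so value is aligned. */
struct demo_padded {
    char tag;
    uint64_t value;
};

/* Packed layout: fields sit back to back, as in the sdshdr structs. */
struct __attribute__ ((__packed__)) demo_packed {
    char tag;
    uint64_t value;
};

/* Bytes the compiler added to demo_padded purely for alignment. */
static size_t demo_padding_bytes(void) {
    return sizeof(struct demo_padded) - sizeof(char) - sizeof(uint64_t);
}
```

The packed struct always occupies exactly 9 bytes with value at offset 1, while in the default layout value starts on an aligned offset and the difference is filled with padding.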

The first half of the function makes a decision. If the allocator can report the total size of the block that a pointer refers to (HAVE_MALLOC_SIZE is defined), the function calls free directly to release the memory; otherwise it has to do a calculation. In that case, every time zmalloc allocates a block, Redis prepends a header of PREFIX_SIZE bytes to it. PREFIX_SIZE matches the word size of the system, so on a 64-bit CPU it takes up 8 bytes, and those 8 bytes store the size of the actual data that follows.

So here the pointer ptr is moved back by PREFIX_SIZE bytes to the start of the header, and the value stored there, the actual size of the data, is read out. That size plus the size of the header is passed to update_zmalloc_stat_free, which updates the used_memory counter, and finally free is called to release the memory, header included.

We have strayed a little, so let's get back to the data structure. If this part is not entirely clear, it does not matter; we will come back to it later.

The match function is likewise a form of polymorphism: only its definition is given and the implementation is left to you, and you may choose not to implement it. It is used to compare a list node's value with a given key. It returns 0 if they are not equal and 1 if they are equal.

Finally, the len field records the number of nodes in the whole list. That is the basic definition of a list in Redis. Putting the list structure together with its nodes, the overall structure in Redis looks roughly like this (again, a borrowed diagram):
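
The size-prefix bookkeeping can be sketched as a matching allocate/free pair. This is a simplified model of the #else branch above, under my own assumptions: the demo_* names are hypothetical, the counter is a plain single-threaded variable, and the prefix is taken to be sizeof(size_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PREFIX_SIZE (sizeof(size_t))

/* Hypothetical stand-in for the used_memory statistic. */
static size_t demo_used_memory = 0;

/* Allocate size bytes plus a header that remembers the size. */
static void *demo_zmalloc(size_t size) {
    void *realptr = malloc(size + DEMO_PREFIX_SIZE);
    if (realptr == NULL) return NULL;
    *((size_t *)realptr) = size;                 /* store size in the header */
    demo_used_memory += size + DEMO_PREFIX_SIZE;
    return (char *)realptr + DEMO_PREFIX_SIZE;   /* caller sees only the data */
}

/* Step back to the header, read the size, update the counter, free it all. */
static void demo_zfree(void *ptr) {
    if (ptr == NULL) return;
    void *realptr = (char *)ptr - DEMO_PREFIX_SIZE;
    size_t oldsize = *((size_t *)realptr);
    demo_used_memory -= oldsize + DEMO_PREFIX_SIZE;
    free(realptr);
}
```

After demo_zmalloc(100) the counter reads 100 plus the header size, and the matching demo_zfree brings it back to zero without the caller ever passing the size in.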

image

In summary, we have covered the basic implementation of the list in Redis. It is a doubly linked list, so finding the node before or after a given node takes O(1) time. It is acyclic and keeps pointers to both its head and its tail node. In addition, it has three polymorphic functions for copying node values, comparing them, and releasing their memory, which users have to implement themselves.
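
To tie the pieces together, here is a minimal sketch of a list with head, tail, len, and a match callback. The demoNode and demoList types and the demo_* functions are hypothetical simplifications of my own (dup and free are left out), not the Redis list API; when match is NULL the search falls back to comparing pointers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct demoNode {
    struct demoNode *prev;
    struct demoNode *next;
    void *value;
} demoNode;

typedef struct demoList {
    demoNode *head;
    demoNode *tail;
    int (*match)(void *ptr, void *key);  /* optional, may be NULL */
    unsigned long len;
} demoList;

/* O(1) append, thanks to the tail pointer. */
static demoNode *demo_list_push_tail(demoList *l, void *value) {
    demoNode *n = malloc(sizeof(*n));
    if (n == NULL) return NULL;
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail != NULL) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->len++;
    return n;
}

/* O(n) search: use match when set, otherwise compare pointers. */
static demoNode *demo_list_search(demoList *l, void *key) {
    for (demoNode *n = l->head; n != NULL; n = n->next) {
        if (l->match != NULL ? l->match(n->value, key) : n->value == key)
            return n;
    }
    return NULL;
}

/* O(1) removal of a known node; the value itself is not freed. */
static void demo_list_del(demoList *l, demoNode *n) {
    if (n->prev != NULL) n->prev->next = n->next; else l->head = n->next;
    if (n->next != NULL) n->next->prev = n->prev; else l->tail = n->prev;
    free(n);
    l->len--;
}

/* Example match callback: returns 1 when two C strings are equal. */
static int demo_match_str(void *ptr, void *key) {
    return strcmp((const char *)ptr, (const char *)key) == 0;
}
```

With demo_match_str installed, a search for "b" finds the node by content even when the key lives in a different buffer than the stored value, and removing that node only rewires its two neighbours.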


All the code and materials used in these articles are uploaded to my personal GitHub:
github.com/SingleYam/o…

Origin juejin.im/post/5d7dac02518825297023fb35