Redis 6.0 Source Code Reading Notes (4): String Data Type Source Analysis

1. Storage structure

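From the earlier introduction to the Redis string object (String), we know that Redis stores string values in one of three encodings. Their memory layouts are as follows: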

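  • OBJ_ENCODING_INT: used when the string is at most 20 characters long and parses as a long integer. The integer is stored directly in the ptr field of redisObject (the field holds the value itself rather than the address of a separate buffer)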

  • OBJ_ENCODING_EMBSTR: used for strings of at most 44 bytes (OBJ_ENCODING_EMBSTR_SIZE_LIMIT). The string is stored as a simple dynamic string (SDS), and the redisObject header and the SDS are placed contiguously in a single allocation

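  • OBJ_ENCODING_RAW: the string is stored as a simple dynamic string (SDS), and the redisObject header and the SDS are two separately allocated blocks that are generally not contiguous in memory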
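The choice between the three encodings can be summarized in a few lines of C. This is only a sketch of the decision described above, not the Redis code: the names sketch_encoding, sketch_parses_as_long() and sketch_pick_encoding() are made up for illustration, and the integer check is a simplified stand-in for the strict parsing that Redis does in string2l().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for this sketch; Redis uses the OBJ_ENCODING_* macros. */
typedef enum { ENC_INT, ENC_EMBSTR, ENC_RAW } sketch_encoding;

#define SKETCH_EMBSTR_LIMIT 44 /* mirrors OBJ_ENCODING_EMBSTR_SIZE_LIMIT */

/* Simplified stand-in for string2l(): accept only a canonical decimal long
 * (no spaces, no '+', no leading zeros, no "-0"). */
static int sketch_parses_as_long(const char *s, size_t len, long *out) {
    char buf[21];
    char *end;
    size_t first_digit;
    long v;

    if (len == 0 || len > 20) return 0;
    memcpy(buf, s, len);
    buf[len] = '\0';
    if (strlen(buf) != len) return 0; /* embedded '\0' byte */

    first_digit = (buf[0] == '-') ? 1 : 0;
    if (buf[first_digit] < '0' || buf[first_digit] > '9') return 0;
    if (buf[first_digit] == '0' && len != 1) return 0; /* only "0" itself */

    errno = 0;
    v = strtol(buf, &end, 10);
    if (errno == ERANGE || *end != '\0') return 0;
    *out = v;
    return 1;
}

/* Integer first, then embstr for short strings, raw for everything else. */
static sketch_encoding sketch_pick_encoding(const char *s, size_t len) {
    long value;
    if (sketch_parses_as_long(s, len, &value)) return ENC_INT;
    if (len <= SKETCH_EMBSTR_LIMIT) return ENC_EMBSTR;
    return ENC_RAW;
}
```

Under these assumptions "12345" would be classified as an integer, "007" as embstr (it is not a canonical integer), and any non-numeric string longer than 44 bytes as raw.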

2. Source analysis of data storage

2.1 The storage path

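  1. As covered in Redis 6.0 Source Code Reading Notes (1): Redis Server Startup and Command Execution, the SET command that a client uses to store a string ends up in t_string.c#setCommand(), whose source is shown below.

    setCommand() calls two functions that matter here; this section focuses on tryObjectEncoding():

    1. tryObjectEncoding() tries to re-encode the string object sent by the client in order to save memory
    2. setGenericCommand() stores the key-value pair in the database
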
    void setCommand(client *c) {
    
     ......
    
     c->argv[2] = tryObjectEncoding(c->argv[2]);
     setGenericCommand(c,flags,c->argv[1],c->argv[2],expire,unit,NULL,NULL);
    }
    
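  2. The logic of object.c#tryObjectEncoding() is straightforward. It performs the following steps:

    1. If the string is at most 20 characters long and parses as a long, it is stored as an integer. Values in the range [0, OBJ_SHARED_INTEGERS) reuse a shared integer object when the maxmemory settings permit; otherwise a raw object is converted in place with o->ptr = (void*) value, and an embstr object is replaced by a new object from createStringObjectFromLongLongForValue()
    2. Otherwise, if the length is at most OBJ_ENCODING_EMBSTR_SIZE_LIMIT and the object is still raw encoded, createEmbeddedStringObject() converts it to the embstr encoding
    3. Otherwise the object cannot be re-encoded, so trimStringObjectIfNeeded() is called to release unused space at the end of the SDS (per the source comment, when more than 10% of it is free)
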
    robj *tryObjectEncoding(robj *o) {
     long value;
     sds s = o->ptr;
     size_t len;
    
     /* Make sure this is a string object, the only type we encode
      * in this function. Other types use encoded memory efficient
      * representations but are handled by the commands implementing
      * the type. */
     serverAssertWithInfo(NULL,o,o->type == OBJ_STRING);
    
     /* We try some specialized encoding only for objects that are
      * RAW or EMBSTR encoded, in other words objects that are still
      * in represented by an actually array of chars. */
     if (!sdsEncodedObject(o)) return o;
    
     /* It's not safe to encode shared objects: shared objects can be shared
      * everywhere in the "object space" of Redis and may end in places where
      * they are not handled. We handle them only as values in the keyspace. */
     if (o->refcount > 1) return o;
    
     /* Check if we can represent this string as a long integer.
      * Note that we are sure that a string larger than 20 chars is not
      * representable as a 32 nor 64 bit integer. */
     len = sdslen(s);
     if (len <= 20 && string2l(s,len,&value)) {
         /* This object is encodable as a long. Try to use a shared object.
          * Note that we avoid using shared integers when maxmemory is used
          * because every object needs to have a private LRU field for the LRU
          * algorithm to work well. */
         if ((server.maxmemory == 0 ||
             !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&
             value >= 0 &&
             value < OBJ_SHARED_INTEGERS)
         {
             decrRefCount(o);
             incrRefCount(shared.integers[value]);
             return shared.integers[value];
         } else {
             if (o->encoding == OBJ_ENCODING_RAW) {
                 sdsfree(o->ptr);
                 o->encoding = OBJ_ENCODING_INT;
                 o->ptr = (void*) value;
                 return o;
             } else if (o->encoding == OBJ_ENCODING_EMBSTR) {
                 decrRefCount(o);
                 return createStringObjectFromLongLongForValue(value);
             }
         }
     }
    
     /* If the string is small and is still RAW encoded,
      * try the EMBSTR encoding which is more efficient.
      * In this representation the object and the SDS string are allocated
      * in the same chunk of memory to save space and cache misses. */
     if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
         robj *emb;
    
         if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
         emb = createEmbeddedStringObject(s,sdslen(s));
         decrRefCount(o);
         return emb;
     }
    
     /* We can't encode the object...
      *
      * Do the last try, and at least optimize the SDS string inside
      * the string object to require little space, in case there
      * is more than 10% of free space at the end of the SDS string.
      *
      * We do that only for relatively large strings as this branch
      * is only entered if the length of the string is greater than
      * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. */
     trimStringObjectIfNeeded(o);
    
     /* Return the original object. */
     return o;
    }
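The line o->ptr = (void*) value is worth a closer look: the pointer field is reused to carry the number itself, so no second buffer exists for an integer-encoded string. A minimal sketch of the idea follows. The struct and function names are hypothetical and the object is far smaller than the real redisObject; the sketch also assumes a platform where a long fits in a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption of this sketch: a long fits inside a pointer-sized field. */
_Static_assert(sizeof(long) <= sizeof(void *), "long must fit in a pointer");

enum { SKETCH_ENC_RAW = 0, SKETCH_ENC_INT = 1 };

/* Hypothetical miniature of redisObject, for illustration only. */
typedef struct sketch_obj {
    unsigned encoding; /* SKETCH_ENC_RAW or SKETCH_ENC_INT */
    void *ptr;         /* address of the bytes, or the integer itself */
} sketch_obj;

/* Store the integer in the pointer field: the value, not an address. */
static void sketch_set_long(sketch_obj *o, long value) {
    o->encoding = SKETCH_ENC_INT;
    o->ptr = (void *)(intptr_t)value;
}

/* Read the integer back; only valid for integer-encoded objects. */
static long sketch_get_long(const sketch_obj *o) {
    assert(o->encoding == SKETCH_ENC_INT);
    return (long)(intptr_t)o->ptr;
}
```

Because the value lives in the field itself, reading it back never dereferences ptr, which is why every reader has to check the encoding first.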
    
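  3. object.c#createEmbeddedStringObject(), which implements the embstr encoding, is also simple. The main steps are:

    1. Call zmalloc() once to allocate memory. The request covers not only the redisObject and the string bytes (plus the terminating '\0') but also sdshdr8, one of the SDS header structs. This is why the embstr encoding needs a single allocation and why the redisObject header and the SDS sit next to each other
    2. Point the ptr field of redisObject at the address immediately after the sdshdr8 header (sh+1), which is the start of the buf array, as is the convention for every sds pointer
    3. Fill in the sdshdr8 fields: len (string length), alloc (capacity of the character array), flags (SDS_TYPE_8), and buf, the character array that holds the actual bytes
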
    robj *createEmbeddedStringObject(const char *ptr, size_t len) {
     robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1);
     struct sdshdr8 *sh = (void*)(o+1);
    
     o->type = OBJ_STRING;
     o->encoding = OBJ_ENCODING_EMBSTR;
     o->ptr = sh+1;
     o->refcount = 1;
     if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
         o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
     } else {
         o->lru = LRU_CLOCK();
     }
    
     sh->len = len;
     sh->alloc = len;
     sh->flags = SDS_TYPE_8;
     if (ptr == SDS_NOINIT)
         sh->buf[len] = '\0';
     else if (ptr) {
         memcpy(sh->buf,ptr,len);
         sh->buf[len] = '\0';
     } else {
         memset(sh->buf,0,len+1);
     }
     return o;
    }
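The pointer arithmetic in this function (o+1, then sh+1) is easier to follow on a stripped-down model. The sketch below uses hypothetical types: sketch_robj is a 16-byte stand-in for redisObject on a typical 64-bit build, and sketch_hdr8 copies the field layout of sdshdr8. It assumes GCC or Clang for the packed attribute and uses malloc() in place of zmalloc(). It also shows where the limit of 44 is generally understood to come from: 16 + 3 + 44 + 1 = 64 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object header standing in for redisObject. */
typedef struct sketch_robj {
    uint32_t meta;    /* stands in for the type / encoding / lru bit-fields */
    int32_t refcount;
    void *ptr;
} sketch_robj;

/* Same field layout as sdshdr8 (3 bytes when packed). */
struct __attribute__((__packed__)) sketch_hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Bytes requested for an embedded string of `len` bytes. */
static size_t sketch_embstr_size(size_t len) {
    return sizeof(sketch_robj) + sizeof(struct sketch_hdr8) + len + 1;
}

/* One malloc() holds object header, SDS header and bytes, back to back. */
static sketch_robj *sketch_create_embstr(const char *src, size_t len) {
    sketch_robj *o;
    struct sketch_hdr8 *sh;

    if (len > 255) return NULL; /* must fit the uint8_t len field */
    o = malloc(sketch_embstr_size(len));
    if (o == NULL) return NULL;

    sh = (struct sketch_hdr8 *)(o + 1); /* SDS header right after the object */
    o->meta = 0;
    o->refcount = 1;
    o->ptr = sh + 1; /* first byte after the 3-byte header, i.e. sh->buf */

    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = 1; /* numeric value of SDS_TYPE_8 */
    memcpy(sh->buf, src, len);
    sh->buf[len] = '\0';
    return o;
}
```

Since everything lives in one block, freeing the object with a single free() also releases the string, which matches the single-allocation design described above.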
    
  4. For the raw encoding, see object.c#createRawStringObject(). It involves two allocations: sds.c#sdsnewlen() allocates the SDS, and object.c#createObject() allocates the redisObject

    robj *createRawStringObject(const char *ptr, size_t len) {
     return createObject(OBJ_STRING, sdsnewlen(ptr,len));
    }
    
  5. Judging from the length check in t_string.c#checkStringLength(), the maximum string length is 512 MB (512*1024*1024 bytes); a larger size is rejected with an error

    static int checkStringLength(client *c, long long size) {
     if (size > 512*1024*1024) {
         addReplyError(c,"string exceeds maximum allowed size (512MB)");
         return C_ERR;
     }
     return C_OK;
    }
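The check receives the size the string would have after the operation, so a command that grows a value has to add the lengths before calling it. A small sketch of that arithmetic, with hypothetical function names and no reply handling:

```c
#include <assert.h>

#define SKETCH_MAX_STRING_BYTES (512LL * 1024 * 1024) /* 536870912 bytes */

/* Returns 1 if a string of `size` bytes would be accepted, 0 otherwise. */
static int sketch_length_ok(long long size) {
    return size <= SKETCH_MAX_STRING_BYTES;
}

/* Append-style check: test the total length after adding `addlen` bytes. */
static int sketch_append_ok(long long curlen, long long addlen) {
    return sketch_length_ok(curlen + addlen);
}
```

Note that exactly 512*1024*1024 bytes passes, because the comparison in the original function is a strict greater-than.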
    

2.2 Simple Dynamic Strings (SDS)

2.2.1 SDS structs

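SDS (simple dynamic string) is the tool Redis uses to store strings. Underneath it is still a character array, and it still appends a '\0' so the buffer can be handed to C string functions, but unlike a C string it does not rely on that '\0' to find the end of the string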

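A traditional C string is null-terminated: reading stops at the first '\0' and everything after it is ignored, so a C string cannot hold binary data that contains zero bytes. Getting its length also means scanning for the terminator, which takes O(n) time and is comparatively inefficient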
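The difference shows up as soon as the data contains a zero byte. In the sketch below, sketch_lstr is a hypothetical fixed-size, length-prefixed string used only to illustrate the idea of keeping the length next to the bytes; it is not the SDS layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length is stored, not searched. */
typedef struct sketch_lstr {
    size_t len;
    char buf[16];
} sketch_lstr;

/* Copy `len` raw bytes, zero bytes included, and record the length. */
static sketch_lstr sketch_lstr_from(const void *bytes, size_t len) {
    sketch_lstr s;
    assert(len < sizeof(s.buf));
    memset(s.buf, 0, sizeof(s.buf));
    memcpy(s.buf, bytes, len);
    s.buf[len] = '\0'; /* still terminated, so C functions can read it */
    s.len = len;
    return s;
}
```

For the five bytes 'a' 'b' '\0' 'c' 'd', strlen() on the buffer stops after "ab", while the stored len still reports all five bytes and is available without scanning.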

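The SDS header structs are defined in sds.h and are listed further down. Because SDS takes the end of the string from the len field in its header, it can return the length in O(1) time and append data quickly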

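There are five SDS header definitions, so that strings of different lengths get headers of different sizes and memory is saved. Take sdshdr8: its len field is a uint8_t occupying 1 byte, so the longest string it can describe is 255 bytes (2^8 - 1). Apart from sdshdr5, which packs the length into the flags byte, a header contains the following fields:

  1. len: the actual length of the string, not counting the null terminator
  2. alloc: the capacity of buf, excluding the header and the null terminator
  3. flags: the header type, kept in the 3 least significant bits
  4. buf: the character array that holds the actual bytes
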
/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
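Which header a string gets depends only on its length. The sketch below is modeled on what sdsReqType() and sdsHdrSize() do in sds.h, with hypothetical names, and assumes a 64-bit build (a 32-bit build would stop at the 32-bit header). The tag values match SDS_TYPE_5 through SDS_TYPE_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as SDS_TYPE_5 .. SDS_TYPE_64. */
enum { SK_TYPE_5 = 0, SK_TYPE_8 = 1, SK_TYPE_16 = 2, SK_TYPE_32 = 3, SK_TYPE_64 = 4 };

/* Smallest header type whose length field can hold `string_size`. */
static int sketch_req_type(uint64_t string_size) {
    if (string_size < (1u << 5)) return SK_TYPE_5;
    if (string_size < (1u << 8)) return SK_TYPE_8;
    if (string_size < (1u << 16)) return SK_TYPE_16;
    if (string_size < (1ull << 32)) return SK_TYPE_32;
    return SK_TYPE_64;
}

/* Packed header size: len + alloc + one flags byte (type 5: flags only). */
static size_t sketch_hdr_size(int type) {
    switch (type) {
    case SK_TYPE_5:  return 1;
    case SK_TYPE_8:  return 2 * sizeof(uint8_t) + 1;
    case SK_TYPE_16: return 2 * sizeof(uint16_t) + 1;
    case SK_TYPE_32: return 2 * sizeof(uint32_t) + 1;
    case SK_TYPE_64: return 2 * sizeof(uint64_t) + 1;
    }
    return 0;
}
```

A 255-byte string therefore still fits the 3-byte header, while a 256-byte string moves to the 5-byte one. Because the structs are packed, the flags byte always sits directly before buf, which is why the source reads the type with s[-1].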

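2.2.2 SDS capacity adjustment

  1. SDS grows through sds.c#sdsMakeRoomFor(). If the existing free space cannot hold addlen more bytes, the required length len+addlen is computed first. While that length is below SDS_MAX_PREALLOC (1024*1024, i.e. 1 MB, defined in sds.h), the allocation is twice the required length, which leaves 100% spare capacity. Once it reaches SDS_MAX_PREALLOC, each growth adds only SDS_MAX_PREALLOC bytes of spare capacity, so that doubling does not waste a large amount of memory. The source is as follows: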

    sds sdsMakeRoomFor(sds s, size_t addlen) {
     void *sh, *newsh;
     size_t avail = sdsavail(s);
     size_t len, newlen;
     char type, oldtype = s[-1] & SDS_TYPE_MASK;
     int hdrlen;
    
     /* Return ASAP if there is enough space left. */
     if (avail >= addlen) return s;
    
     len = sdslen(s);
     sh = (char*)s-sdsHdrSize(oldtype);
     newlen = (len+addlen);
     if (newlen < SDS_MAX_PREALLOC)
         newlen *= 2;
     else
         newlen += SDS_MAX_PREALLOC;
    
     type = sdsReqType(newlen);
    
     /* Don't use type 5: the user is appending to the string and type 5 is
      * not able to remember empty space, so sdsMakeRoomFor() must be called
      * at every appending operation. */
     if (type == SDS_TYPE_5) type = SDS_TYPE_8;
    
     hdrlen = sdsHdrSize(type);
     if (oldtype==type) {
         newsh = s_realloc(sh, hdrlen+newlen+1);
         if (newsh == NULL) return NULL;
         s = (char*)newsh+hdrlen;
     } else {
         /* Since the header size changes, need to move the string forward,
          * and can't use realloc */
         newsh = s_malloc(hdrlen+newlen+1);
         if (newsh == NULL) return NULL;
         memcpy((char*)newsh+hdrlen, s, len+1);
         s_free(sh);
         s = (char*)newsh+hdrlen;
         s[-1] = type;
         sdssetlen(s, len);
     }
     sdssetalloc(s, newlen);
     return s;
    }
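The growth rule can be isolated from the memory management. The following sketch reproduces only the capacity arithmetic of the function above; sketch_grown_alloc() is a hypothetical name and performs no allocation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PREALLOC (1024 * 1024) /* same value as SDS_MAX_PREALLOC */

/* Capacity (alloc) after making room for `addlen` more bytes. */
static size_t sketch_grown_alloc(size_t len, size_t avail, size_t addlen) {
    size_t newlen;

    if (avail >= addlen) return len + avail; /* enough room: unchanged */

    newlen = len + addlen;
    if (newlen < SKETCH_MAX_PREALLOC)
        newlen *= 2;                   /* small strings: double */
    else
        newlen += SKETCH_MAX_PREALLOC; /* large strings: add 1 MB */
    return newlen;
}
```

For example, appending 3 bytes to a full 5-byte string asks for 16 bytes of capacity, while appending 1 byte to a full 1 MB string asks for 1048577 + 1048576 = 2097153 bytes.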
    
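  2. sds.c#sdsclear() is often described as the shrinking function, but strictly speaking it only empties the string. The source shows that it does not release any memory: the buffer is kept as free space for later appends, which is a lazy strategy. (Unused space is actually given back by sdsRemoveFreeSpace(), which trimStringObjectIfNeeded() calls.) It does the following:

    1. Reset the len field in the SDS header to 0
    2. Put the terminator at the start of buf, which lazily discards the contents of buf
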
    /* Modify an sds string in-place to make it empty (zero length).
    * However all the existing buffer is not discarded but set as free space
    * so that next append operations will not require allocations up to the
    * number of bytes previously available. */
    void sdsclear(sds s) {
     sdssetlen(s, 0);
     s[0] = '\0';
    }
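The payoff of this lazy clear is that a later append can reuse the old buffer. The sketch below models that with a hypothetical sketch_buf type that tracks len and alloc separately; it is a simplified illustration and grows to the exact size needed instead of applying the preallocation policy of sdsMakeRoomFor().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer with separate length and capacity. */
typedef struct sketch_buf {
    size_t len;   /* bytes in use */
    size_t alloc; /* capacity, excluding the terminator */
    char *data;   /* alloc + 1 bytes */
} sketch_buf;

/* Create a buffer holding a copy of `init`. Returns 0 on success. */
static int sketch_buf_init(sketch_buf *b, const char *init) {
    size_t n = strlen(init);
    b->data = malloc(n + 1);
    if (b->data == NULL) return -1;
    memcpy(b->data, init, n + 1);
    b->len = b->alloc = n;
    return 0;
}

/* Lazy clear, like sdsclear(): length goes to 0, capacity is kept. */
static void sketch_buf_clear(sketch_buf *b) {
    b->len = 0;
    b->data[0] = '\0';
}

/* Append `s`. Returns 0 if the existing capacity was enough,
 * 1 if the buffer had to be reallocated, -1 on allocation failure. */
static int sketch_buf_append(sketch_buf *b, const char *s) {
    size_t add = strlen(s);
    int grew = 0;

    if (b->alloc - b->len < add) {
        size_t newalloc = b->len + add;
        char *p = realloc(b->data, newalloc + 1);
        if (p == NULL) return -1;
        b->data = p;
        b->alloc = newalloc;
        grew = 1;
    }
    memcpy(b->data + b->len, s, add + 1); /* copies the terminator too */
    b->len += add;
    return grew;
}
```

After clearing an 11-byte buffer, appending "hi" fits in the retained capacity and needs no allocation; only an append that exceeds those 11 bytes triggers a reallocation.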
    

Reposted from blog.csdn.net/weixin_45505313/article/details/108292168