1. 存储的结构

在 redis 字符串对象 String 的介绍中，我们知道 redis 对于字符串的存储共有 3 种存储形式，其存储的内存结构如以下图片示例:

OBJ_ENCODING_INT：保存的字符串长度小于 20，并且是可以解析为 long 类型的整数值，那么存储方式就是直接将 redisObject 的 ptr 指针指向这个整数值
OBJ_ENCODING_EMBSTR: 长度小于 44 （OBJ_ENCODING_EMBSTR_SIZE_LIMIT）的字符串将以简单动态字符串（SDS）的形式存储在 redisObject 中，但是redisObject 对象头会和 SDS 对象连续存在一起
OBJ_ENCODING_RAW: 字符串以简单动态字符串（SDS）的形式存储，redisObject 对象头和 SDS 对象在内存地址上一般是不连续的两块内存

2. 数据存储源码分析

2.1 数据存储过程

在 Redis 6.0 源码阅读笔记(1)-Redis 服务端启动及命令执行中我们已经知道客户端保存字符串的 set 命令将会调用到 t_string.c#setCommand() 函数，其源码实现如下：
该方法中有以下两个重点函数被调用，本节主要关注 tryObjectEncoding() 函数
1. tryObjectEncoding() 将客户端传输过来的需保存的字符串对象尝试进行编码，以节约内存
2. setGenericCommand() 将 key-value 保存到数据库中
```
void setCommand(client *c) {
      
      

 ......

 c->argv[2] = tryObjectEncoding(c->argv[2]);
 setGenericCommand(c,flags,c->argv[1],c->argv[2],expire,unit,NULL,NULL);
}
```

object.c#tryObjectEncoding() 函数逻辑很清晰，可以看到主要进行了以下几个操作：

当字符串长度小于 20 并且可以被解析为 long 类型数据时，这个数据将以整数形式保存，并以 robj->ptr = (void*) value 这种直接赋值的形式存储

当字符串长度小于等于 OBJ_ENCODING_EMBSTR_SIZE_LIMIT 配置并且还是 raw 编码时，调用 createEmbeddedStringObject() 函数将其转化为 embstr 编码

这个字符串对象已经不能进行转码了，只好调用 trimStringObjectIfNeeded() 函数尝试从字符串对象中移除所有空余空间

robj *tryObjectEncoding(robj *o) {
      
      
 long value;
 sds s = o->ptr;
 size_t len;

 /* Make sure this is a string object, the only type we encode
  * in this function. Other types use encoded memory efficient
  * representations but are handled by the commands implementing
  * the type. */
 serverAssertWithInfo(NULL,o,o->type == OBJ_STRING);

 /* We try some specialized encoding only for objects that are
  * RAW or EMBSTR encoded, in other words objects that are still
  * in represented by an actually array of chars. */
 if (!sdsEncodedObject(o)) return o;

 /* It's not safe to encode shared objects: shared objects can be shared
  * everywhere in the "object space" of Redis and may end in places where
  * they are not handled. We handle them only as values in the keyspace. */
  if (o->refcount > 1) return o;

 /* Check if we can represent this string as a long integer.
  * Note that we are sure that a string larger than 20 chars is not
  * representable as a 32 nor 64 bit integer. */
 len = sdslen(s);
 if (len <= 20 && string2l(s,len,&value)) {
      
      
     /* This object is encodable as a long. Try to use a shared object.
      * Note that we avoid using shared integers when maxmemory is used
      * because every object needs to have a private LRU field for the LRU
      * algorithm to work well. */
     if ((server.maxmemory == 0 ||
         !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&
         value >= 0 &&
         value < OBJ_SHARED_INTEGERS)
     {
      
      
         decrRefCount(o);
         incrRefCount(shared.integers[value]);
         return shared.integers[value];
     } else {
      
      
         if (o->encoding == OBJ_ENCODING_RAW) {
      
      
             sdsfree(o->ptr);
             o->encoding = OBJ_ENCODING_INT;
             o->ptr = (void*) value;
             return o;
         } else if (o->encoding == OBJ_ENCODING_EMBSTR) {
      
      
             decrRefCount(o);
             return createStringObjectFromLongLongForValue(value);
         }
     }
 }

 /* If the string is small and is still RAW encoded,
  * try the EMBSTR encoding which is more efficient.
  * In this representation the object and the SDS string are allocated
  * in the same chunk of memory to save space and cache misses. */
 if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
      
      
     robj *emb;

     if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
     emb = createEmbeddedStringObject(s,sdslen(s));
     decrRefCount(o);
     return emb;
 }

 /* We can't encode the object...
  *
  * Do the last try, and at least optimize the SDS string inside
  * the string object to require little space, in case there
  * is more than 10% of free space at the end of the SDS string.
  *
  * We do that only for relatively large strings as this branch
  * is only entered if the length of the string is greater than
  * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. */
 trimStringObjectIfNeeded(o);

 /* Return the original object. */
 return o;
}

object.c#createEmbeddedStringObject() 函数实现 embstr 编码也很简单，主要步骤如下：

首先调用 zmalloc() 函数申请内存，可以看到此处不仅申请了需要存储的字符串的内存及 redisObject 的内存，还申请了 SDS 实现结构体之一 sdshdr8 的内存，这也就是上文所说embstr 编码只申请一次内存，并且redisObject 对象头会和 SDS 对象连续存在一起的由来

将 redisObject 对象的 ptr 指针指向 sdshdr8 开始的内存地址

填充 sdshdr8 对象各个属性，包括 len 字符串长度，alloc 字符数组容量，实际存储字符串的 buf 字符数组

robj *createEmbeddedStringObject(const char *ptr, size_t len) {
      
      
 robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1);
 struct sdshdr8 *sh = (void*)(o+1);

 o->type = OBJ_STRING;
 o->encoding = OBJ_ENCODING_EMBSTR;
 o->ptr = sh+1;
 o->refcount = 1;
 if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
      
      
     o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
 } else {
      
      
     o->lru = LRU_CLOCK();
 }

 sh->len = len;
 sh->alloc = len;
 sh->flags = SDS_TYPE_8;
 if (ptr == SDS_NOINIT)
     sh->buf[len] = '\0';
 else if (ptr) {
      
      
     memcpy(sh->buf,ptr,len);
     sh->buf[len] = '\0';
 } else {
      
      
     memset(sh->buf,0,len+1);
 }
 return o;
}

raw 编码字符串的创建可参考 object.c#createRawStringObject() 函数，其涉及到两次内存申请，sds.c#sdsnewlen() 申请内存创建 SDS 对象，object.c#createObject() 申请内存创建 redisObject 对象
```
robj *createRawStringObject(const char *ptr, size_t len) {
      
      
 return createObject(OBJ_STRING, sdsnewlen(ptr,len));
}
```

从检测容量大小的的函数t_string.c#checkStringLength()看，字符串最大长度为 512M，超出该数值将报错

static int checkStringLength(client *c, long long size) {
      
      
 if (size > 512*1024*1024) {
      
      
     addReplyError(c,"string exceeds maximum allowed size (512MB)");
     return C_ERR;
 }
 return C_OK;
}

2.2 简单动态字符串 SDS

2.2.1 SDS 结构体

SDS(简单动态字符串) 在 Redis 中是实现字符串存储的工具，本质上依然是字符数组，但它不像C语言字符串那样以‘\0’来标识字符串结束

传统C字符串符合ASCII编码，这种编码的操作的特点就是：遇零则止。即当读一个字符串时，只要遇到’\0’就认为到达末尾，忽略’\0’以后的所有字符。另外其获得字符串长度的做法是遍历字符串，遇零则止，时间复杂度为O(n)，比较低效

SDS 的实现结构定义在 sds.h 中，其定义如下。因为 SDS 判断是否到达字符串末尾的依据是表头的 len 属性，所以能高效计算字符串长度并快速追加数据

sds 结构一共有 5 种 Header 定义，目的是为不同长度的字符串提供不同大小的 Header，以节省内存。以 sdshdr8 为例，其 len 属性为 uint8_t 类型，占用内存大小为 1 字节，则存储的字符串最大长度为256。Header 主要包含以下几个属性：

len: 字符串真正的长度，不包含空终止字符

alloc: 除去表头和终止符的 buf 数组长度，也就是最大容量

flags: 标志 header 的类型

buf: 字符数组，实际存储字符

/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    
    
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    
    
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    
    
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    
    
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    
    
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

2.2.2 SDS 容量调整

SDS 扩容的函数是sds.c#sdsMakeRoomFor()，当字符串长度小于 1M 时，扩容都是加倍现有的空间，如果超过 1M，扩容时一次只会多扩 1M 的空间。以下为源码实现：

字符串在长度小于 SDS_MAX_PREALLOC(1024*1024，也就是1MB，定义在 sds.h) 之前，采用 2 倍扩容，也就是保留 100% 的冗余空间。当长度超过 SDS_MAX_PREALLOC 之后，每次扩容只会多分配 SDS_MAX_PREALLOC 大小的冗余空间，避免加倍扩容后的冗余空间过大导致浪费

sds sdsMakeRoomFor(sds s, size_t addlen) {
      
      
 void *sh, *newsh;
 size_t avail = sdsavail(s);
 size_t len, newlen;
 char type, oldtype = s[-1] & SDS_TYPE_MASK;
 int hdrlen;

 /* Return ASAP if there is enough space left. */
 if (avail >= addlen) return s;

 len = sdslen(s);
 sh = (char*)s-sdsHdrSize(oldtype);
 newlen = (len+addlen);
 if (newlen < SDS_MAX_PREALLOC)
     newlen *= 2;
 else
     newlen += SDS_MAX_PREALLOC;

 type = sdsReqType(newlen);

 /* Don't use type 5: the user is appending to the string and type 5 is
  * not able to remember empty space, so sdsMakeRoomFor() must be called
  * at every appending operation. */
 if (type == SDS_TYPE_5) type = SDS_TYPE_8;

 hdrlen = sdsHdrSize(type);
 if (oldtype==type) {
      
      
     newsh = s_realloc(sh, hdrlen+newlen+1);
     if (newsh == NULL) return NULL;
     s = (char*)newsh+hdrlen;
 } else {
      
      
     /* Since the header size changes, need to move the string forward,
      * and can't use realloc */
     newsh = s_malloc(hdrlen+newlen+1);
     if (newsh == NULL) return NULL;
     memcpy((char*)newsh+hdrlen, s, len+1);
     s_free(sh);
     s = (char*)newsh+hdrlen;
     s[-1] = type;
     sdssetlen(s, len);
 }
 sdssetalloc(s, newlen);
 return s;
}

SDS 缩容函数为sds.c#sdsclear()，从源码实现来看，其主要有以下操作，也就是它并不释放实际占用的内存，体现出一种惰性策略

重置 SDS 表头的 len 属性值为 0

将结束符放到 buf 数组最前面，相当于惰性地删除 buf 中的内容

/* Modify an sds string in-place to make it empty (zero length).
* However all the existing buffer is not discarded but set as free space
* so that next append operations will not require allocations up to the
* number of bytes previously available. */
void sdsclear(sds s) {
      
      
 sdssetlen(s, 0);
 s[0] = '\0';
}

Redis 6.0 源码阅读笔记(4)- String 数据类型源码分析

文章目录