Redis 6.0 Source Code Reading Notes (9) - How Data Eviction Works

1. Storing Expiration Times

As mentioned in the earlier section on the redisDb data structure, every Redis database keeps a dedicated expires dict that stores the keys whose expiration time was set explicitly (for example data written with the SETEX command). Using SETEX as an example, this section walks through how an expiration time is set, following the source code.

typedef struct redisDb {
    
    
    dict *dict;                 /* The keyspace for this DB */
    dict *expires;              /* Timeout of keys with a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;         /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;
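
For reference, this is roughly how an expiration time is read back out of the expires dict (for example by the TTL command). The snippet is adapted from db.c#getExpire() and quoted from memory, so treat it as a sketch rather than an exact copy:

/* Return the absolute expire time (in ms) of the specified key, or -1 if the
 * key has no associated expire (adapted from db.c#getExpire()). */
long long getExpire(redisDb *db, robj *key) {
    dictEntry *de;

    /* No expire? Return ASAP. */
    if (dictSize(db->expires) == 0 ||
       (de = dictFind(db->expires,key->ptr)) == NULL) return -1;

    /* The expire time is stored directly in the dictEntry value as a signed
     * 64-bit integer, set by dictSetSignedIntegerVal() as shown below. */
    return dictGetSignedIntegerVal(de);
}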

Source code analysis

  1. The handler for the SETEX command is t_string.c#setexCommand(). As the code shows, this function is only an entry point and contains almost no logic of its own

    void setexCommand(client *c) {
          
          
     c->argv[3] = tryObjectEncoding(c->argv[3]);
     setGenericCommand(c,OBJ_SET_NO_FLAGS,c->argv[1],c->argv[3],c->argv[2],UNIT_SECONDS,NULL,NULL);
    }
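
    For comparison, the sibling PSETEX command takes its TTL directly in milliseconds. Its handler (quoted from memory from t_string.c, so treat it as a sketch) differs only in the unit argument passed to setGenericCommand():

     void psetexCommand(client *c) {
         c->argv[3] = tryObjectEncoding(c->argv[3]);
         /* Same flow as SETEX, but the expire argument is interpreted as milliseconds. */
         setGenericCommand(c,OBJ_SET_NO_FLAGS,c->argv[1],c->argv[3],c->argv[2],
                           UNIT_MILLISECONDS,NULL,NULL);
     }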
    
  2. The t_string.c#setGenericCommand() function has also come up in earlier articles; this time we focus on the parts related to the expiration time. The key flow is as follows:

    The handling depends on the expire argument. If expire is non-NULL, the client's command explicitly set an expiration time, so the value is first parsed and validated (it must be a positive number of seconds or milliseconds). The function then calls db.c#setExpire() to record the key and its absolute expiration time in the database's expires dict

    void setGenericCommand(client *c, int flags, robj *key, robj *val, robj *expire, int unit, robj *ok_reply, robj *abort_reply) {
          
          
     long long milliseconds = 0; /* initialized to avoid any harmness warning */
    
     if (expire) {
          
          
         if (getLongLongFromObjectOrReply(c, expire, &milliseconds, NULL) != C_OK)
             return;
         if (milliseconds <= 0) {
          
          
             addReplyErrorFormat(c,"invalid expire time in %s",c->cmd->name);
             return;
         }
         if (unit == UNIT_SECONDS) milliseconds *= 1000;
     }
     
     ......
     
     if (expire) setExpire(c,c->db,key,mstime()+milliseconds);
     
     ......
    
    
  3. The logic of db.c#setExpire() is very concise and can be broken into the following steps:

    1. First call dictFind() on the database's main keyspace dict (db->dict) to locate the entry for the key, which verifies that the key actually exists
    2. If the key exists, call dictAddOrFind() to insert a new entry for that key into the database's expires dict (db->expires), or find the existing one, then call dictSetSignedIntegerVal() to store the expiration time as that entry's value
    3. Finally, if the current server is a writable replica, the key and its expiration have to be tracked separately, which is done by calling rememberSlaveKeyWithExpire()
    /* Set an expire to the specified key. If the expire is set in the context
    * of an user calling a command 'c' is the client, otherwise 'c' is set
    * to NULL. The 'when' parameter is the absolute unix time in milliseconds
    * after which the key will no longer be considered valid. */
    void setExpire(client *c, redisDb *db, robj *key, long long when) {
          
          
     dictEntry *kde, *de;
    
     /* Reuse the sds from the main dict in the expire dict */
     kde = dictFind(db->dict,key->ptr);
     serverAssertWithInfo(NULL,key,kde != NULL);
     de = dictAddOrFind(db->expires,dictGetKey(kde));
     dictSetSignedIntegerVal(de,when);
    
     int writable_slave = server.masterhost && server.repl_slave_ro == 0;
     if (c && writable_slave && !(c->flags & CLIENT_MASTER))
         rememberSlaveKeyWithExpire(db,key);
    }
    

2. Evicting Data

In the earlier section on Redis memory eviction policies we went over the eviction strategies Redis supports (selected with the maxmemory-policy directive); this section looks at how Redis actually carries them out. Before that, note that Redis removes expired or evictable data through two kinds of triggers:

  1. Deletion during command processing
    Happens while Redis handles read/write requests, e.g. when executing GET/SET commands
  2. Periodic deletion
    Happens inside Redis's time-driven background task (serverCron)

2.1 Deletion During Command Processing

Deletion during command processing is itself triggered at more than one point. First, after the server has parsed the command sent by the client and before executing it, it checks whether Redis's memory usage exceeds the configured maxmemory limit and therefore whether memory needs to be freed. In addition, while a command is being executed, Redis checks whether the keys it touches have expired and deletes the ones that have

2.1.1 Triggered Before Command Execution

  1. This part is handled in server.c#processCommand(); readers unfamiliar with the command-processing flow can refer to Redis 6.0 Source Code Reading Notes (1): Server Startup and Command Execution. The source below omits the parts unrelated to eviction; its main logic is as follows (the is_denyoom_command flag used in the snippet is explained right after the listing):

    1. If maxmemory is configured and no Lua script is currently in a timed-out state, move on to the next step
    2. Call freeMemoryIfNeededAndSafe() to evict data if needed
    int processCommand(client *c) {
          
          
     
     ......
    
     /* Handle the maxmemory directive.
      *
      * Note that we do not want to reclaim memory if we are here re-entering
      * the event loop since there is a busy Lua script running in timeout
      * condition, to avoid mixing the propagation of scripts with the
      * propagation of DELs due to eviction. */
     if (server.maxmemory && !server.lua_timedout) {
          
          
         int out_of_memory = freeMemoryIfNeededAndSafe() == C_ERR;
         /* freeMemoryIfNeeded may flush slave output buffers. This may result
          * into a slave, that may be the active client, to be freed. */
         if (server.current_client == NULL) return C_ERR;
    
         /* It was impossible to free enough memory, and the command the client
          * is trying to execute is denied during OOM conditions or the client
          * is in MULTI/EXEC context? Error. */
         if (out_of_memory &&
             (is_denyoom_command ||
              (c->flags & CLIENT_MULTI &&
               c->cmd->proc != discardCommand)))
         {
          
          
             rejectCommand(c, shared.oomerr);
             return C_OK;
         }
    
         /* Save out_of_memory result at script start, otherwise if we check OOM
          * untill first write within script, memory used by lua stack and
          * arguments might interfere. */
         if (c->cmd->proc == evalCommand || c->cmd->proc == evalShaCommand) {
          
          
             server.lua_oom = out_of_memory;
         }
     }
    
     ......
     
    }
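
    The is_denyoom_command flag referenced above is computed earlier in processCommand(), in a part elided by "......". From memory it is derived from the CMD_DENYOOM command flag, roughly as follows (treat this as a sketch, not an exact quote):

     /* Commands flagged CMD_DENYOOM (and MULTI/EXEC transactions that queued
      * such commands) are the ones rejected when memory cannot be freed. */
     int is_denyoom_command = (c->cmd->flags & CMD_DENYOOM) ||
                              (c->cmd->proc == execCommand &&
                               (c->mstate.cmd_flags & CMD_DENYOOM));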
    
  2. evict.c#freeMemoryIfNeededAndSafe() is just a wrapper; the real eviction work is done in evict.c#freeMemoryIfNeeded(). Its implementation is quite long, but it can be broken into the following steps:

    1. First run a number of checks, e.g. call getMaxmemoryState() to determine whether the server's current memory usage exceeds the maxmemory limit, and then check whether the configured eviction policy (server.maxmemory_policy) forbids evicting data at all
    2. Depending on the eviction policy, the handling differs:
      1. For the policies that constrain which keys may be picked, i.e. least recently used (LRU), least frequently used (LFU) and closest to its expiration time (TTL), iterate over all Redis databases, use the policy to determine the candidate set (all keys in db->dict, or only keys with an expiration in db->expires), call evict.c#evictionPoolPopulate() to sample candidate keys into an eviction pool (its layout is sketched after the listing below), and finally settle on the best key, bestkey
      2. For the random policies, which place no particular requirement on the key, likewise iterate over all databases, determine the candidate dict from the policy, and call dict.c#dictGetRandomKey() to pick bestkey
    3. Once bestkey is determined, delete it. If the amount of memory freed is still not enough, the while loop continues
    int freeMemoryIfNeededAndSafe(void) {
          
          
     if (server.lua_timedout || server.loading) return C_OK;
     return freeMemoryIfNeeded();
    }
    
    /* This function is periodically called to see if there is memory to free
     * according to the current "maxmemory" settings. In case we are over the
     * memory limit, the function will try to free some memory to return back
     * under the limit.
     *
     * The function returns C_OK if we are under the memory limit or if we
     * were over the limit, but the attempt to free memory was successful.
     * Otehrwise if we are over the memory limit, but not enough memory
     * was freed to return back under the limit, the function returns C_ERR. */
    int freeMemoryIfNeeded(void) {
          
          
     int keys_freed = 0;
     /* By default replicas should ignore maxmemory
      * and just be masters exact copies. */
     if (server.masterhost && server.repl_slave_ignore_maxmemory) return C_OK;
    
     size_t mem_reported, mem_tofree, mem_freed;
     mstime_t latency, eviction_latency, lazyfree_latency;
     long long delta;
     int slaves = listLength(server.slaves);
     int result = C_ERR;
    
     /* When clients are paused the dataset should be static not just from the
      * POV of clients not being able to write, but also from the POV of
      * expires and evictions of keys not being performed. */
     if (clientsArePaused()) return C_OK;
     if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK)
         return C_OK;
    
     mem_freed = 0;
    
     latencyStartMonitor(latency);
     if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
         goto cant_free; /* We need to free memory, but policy forbids. */
    
     while (mem_freed < mem_tofree) {
          
          
         int j, k, i;
         static unsigned int next_db = 0;
         sds bestkey = NULL;
         int bestdbid;
         redisDb *db;
         dict *dict;
         dictEntry *de;
    
         if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
             server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
         {
          
          
             struct evictionPoolEntry *pool = EvictionPoolLRU;
    
             while(bestkey == NULL) {
          
          
                 unsigned long total_keys = 0, keys;
    
                 /* We don't want to make local-db choices when expiring keys,
                  * so to start populate the eviction pool sampling keys from
                  * every DB. */
                 for (i = 0; i < server.dbnum; i++) {
          
          
                     db = server.db+i;
                     dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                             db->dict : db->expires;
                     if ((keys = dictSize(dict)) != 0) {
          
          
                         evictionPoolPopulate(i, dict, db->dict, pool);
                         total_keys += keys;
                     }
                 }
                 if (!total_keys) break; /* No keys to evict. */
    
                 /* Go backward from best to worst element to evict. */
                 for (k = EVPOOL_SIZE-1; k >= 0; k--) {
          
          
                     if (pool[k].key == NULL) continue;
                     bestdbid = pool[k].dbid;
    
                     if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
          
          
                         de = dictFind(server.db[pool[k].dbid].dict,
                             pool[k].key);
                     } else {
          
          
                         de = dictFind(server.db[pool[k].dbid].expires,
                             pool[k].key);
                     }
    
                     /* Remove the entry from the pool. */
                     if (pool[k].key != pool[k].cached)
                         sdsfree(pool[k].key);
                     pool[k].key = NULL;
                     pool[k].idle = 0;
    
                     /* If the key exists, is our pick. Otherwise it is
                      * a ghost and we need to try the next element. */
                     if (de) {
          
          
                         bestkey = dictGetKey(de);
                         break;
                     } else {
          
          
                         /* Ghost... Iterate again. */
                     }
                 }
             }
         }
    
         /* volatile-random and allkeys-random policy */
         else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
                  server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
         {
          
          
             /* When evicting a random key, we try to evict a key for
              * each DB, so we use the static 'next_db' variable to
              * incrementally visit all DBs. */
             for (i = 0; i < server.dbnum; i++) {
          
          
                 j = (++next_db) % server.dbnum;
                 db = server.db+j;
                 dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
                         db->dict : db->expires;
                 if (dictSize(dict) != 0) {
          
          
                     de = dictGetRandomKey(dict);
                     bestkey = dictGetKey(de);
                     bestdbid = j;
                     break;
                 }
             }
         }
    
         /* Finally remove the selected key. */
         if (bestkey) {
          
          
             db = server.db+bestdbid;
             robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
             propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
             /* We compute the amount of memory freed by db*Delete() alone.
              * It is possible that actually the memory needed to propagate
              * the DEL in AOF and replication link is greater than the one
              * we are freeing removing the key, but we can't account for
              * that otherwise we would never exit the loop.
              *
              * AOF and Output buffer memory will be freed eventually so
              * we only care about memory used by the key space. */
             delta = (long long) zmalloc_used_memory();
             latencyStartMonitor(eviction_latency);
             if (server.lazyfree_lazy_eviction)
                 dbAsyncDelete(db,keyobj);
             else
                 dbSyncDelete(db,keyobj);
             signalModifiedKey(NULL,db,keyobj);
             latencyEndMonitor(eviction_latency);
             latencyAddSampleIfNeeded("eviction-del",eviction_latency);
             delta -= (long long) zmalloc_used_memory();
             mem_freed += delta;
             server.stat_evictedkeys++;
             notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
                 keyobj, db->id);
             decrRefCount(keyobj);
             keys_freed++;
    
             /* When the memory to free starts to be big enough, we may
              * start spending so much time here that is impossible to
              * deliver data to the slaves fast enough, so we force the
              * transmission here inside the loop. */
             if (slaves) flushSlavesOutputBuffers();
    
             /* Normally our stop condition is the ability to release
              * a fixed, pre-computed amount of memory. However when we
              * are deleting objects in another thread, it's better to
              * check, from time to time, if we already reached our target
              * memory, since the "mem_freed" amount is computed only
              * across the dbAsyncDelete() call, while the thread can
              * release the memory all the time. */
             if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
          
          
                 if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
          
          
                     /* Let's satisfy our stop condition. */
                     mem_freed = mem_tofree;
                 }
             }
         } else {
          
          
             goto cant_free; /* nothing to free... */
         }
     }
     result = C_OK;
    
    cant_free:
     /* We are here if we are not able to reclaim memory. There is only one
      * last thing we can try: check if the lazyfree thread has jobs in queue
      * and wait... */
     if (result != C_OK) {
          
          
         latencyStartMonitor(lazyfree_latency);
         while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
          
          
             if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
          
          
                 result = C_OK;
                 break;
             }
             usleep(1000);
         }
         latencyEndMonitor(lazyfree_latency);
         latencyAddSampleIfNeeded("eviction-lazyfree",lazyfree_latency);
     }
     latencyEndMonitor(latency);
     latencyAddSampleIfNeeded("eviction-cycle",latency);
     return result;
    }
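
    For reference, the eviction pool indexed as pool[k] above is a small fixed-size array defined in evict.c. From memory its layout is roughly the following (treat it as a sketch):

     #define EVPOOL_SIZE 16
     #define EVPOOL_CACHED_SDS_SIZE 255
     struct evictionPoolEntry {
         unsigned long long idle;    /* Object idle time (inverse frequency for LFU). */
         sds key;                    /* Key name. */
         sds cached;                 /* Cached SDS object for key name. */
         int dbid;                   /* Key DB number. */
     };

     static struct evictionPoolEntry *EvictionPoolLRU;

    evictionPoolPopulate() samples a handful of keys from each database and inserts them into this array ordered by ascending idle time, which is why the loop above scans the pool from the tail to find the best eviction candidate first.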
    

2.1.2 Triggered During Command Execution

  1. Taking the SETEX command from section 1 as an example, the function inside t_string.c#setGenericCommand() that finally stores the data in the Redis database is genericSetKey()

    void setGenericCommand(client *c, int flags, robj *key, robj *val, robj *expire, int unit, robj *ok_reply, robj *abort_reply) {
          
          
     ......
     genericSetKey(c,c->db,key,val,flags & OBJ_SET_KEEPTTL,1);
     ......
    }
    
  2. The implementation of db.c#genericSetKey() is straightforward and needs no further commentary; here the function to pay attention to is lookupKeyWrite() (the removeExpire() helper called when KEEPTTL is not set is sketched after the listing below)

    void genericSetKey(client *c, redisDb *db, robj *key, robj *val, int keepttl, int signal) {
          
          
     if (lookupKeyWrite(db,key) == NULL) {
          
          
         dbAdd(db,key,val);
     } else {
          
          
         dbOverwrite(db,key,val);
     }
     incrRefCount(val);
     if (!keepttl) removeExpire(db,key);
     if (signal) signalModifiedKey(c,db,key);
    }
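
    The removeExpire() call above drops any existing TTL when the KEEPTTL flag is not given. From memory, the helper in db.c simply removes the key from the expires dict, roughly as follows (a sketch):

     int removeExpire(redisDb *db, robj *key) {
         /* An expire may only be removed if there is a corresponding entry in
          * the main dict, otherwise the key would never be freed. */
         serverAssertWithInfo(NULL,key,dictFind(db->dict,key->ptr) != NULL);
         return dictDelete(db->expires,key->ptr) == DICT_OK;
     }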
    
  3. db.c#lookupKeyWrite() is just an entry point; as the code shows, it ends up calling expireIfNeeded(), and this function is the key to deleting expired data during command execution. The read path (lookupKeyRead() via lookupKeyReadWithFlags()) performs the same expiration check

    robj *lookupKeyWriteWithFlags(redisDb *db, robj *key, int flags) {
          
          
     expireIfNeeded(db,key);
     return lookupKey(db,key,flags);
    }
    
    robj *lookupKeyWrite(redisDb *db, robj *key) {
          
          
     return lookupKeyWriteWithFlags(db, key, LOOKUP_NONE);
    }
    
  4. The logic of db.c#expireIfNeeded() is fairly simple; it mainly performs the following steps (a simplified sketch of the keyIsExpired() check follows the listing below):

    1. Call keyIsExpired() to determine whether the key has expired
    2. Propagate the expiration (as a DEL/UNLINK) to the replicas and the AOF, and send a keyspace event notification. Note that on a replica the function returns early without deleting anything: expired keys on replicas are removed by the DEL commands the master sends
    3. Delete the expired key
    int expireIfNeeded(redisDb *db, robj *key) {
          
          
     if (!keyIsExpired(db,key)) return 0;
    
     /* If we are running in the context of a slave, instead of
      * evicting the expired key from the database, we return ASAP:
      * the slave key expiration is controlled by the master that will
      * send us synthesized DEL operations for expired keys.
      *
      * Still we try to return the right information to the caller,
      * that is, 0 if we think the key should be still valid, 1 if
      * we think the key is expired at this time. */
     if (server.masterhost != NULL) return 1;
    
     /* Delete the key */
     server.stat_expiredkeys++;
     propagateExpire(db,key,server.lazyfree_lazy_expire);
     notifyKeyspaceEvent(NOTIFY_EXPIRED,
         "expired",key,db->id);
     int retval = server.lazyfree_lazy_expire ? dbAsyncDelete(db,key) :
                                                dbSyncDelete(db,key);
     if (retval) signalModifiedKey(NULL,db,key);
     return retval;
    }
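
    For completeness, here is a simplified sketch of the check performed by db.c#keyIsExpired(), omitting the special handling of Lua scripts and commands that pin a fixed expire time; it relies on the getExpire() helper shown in section 1:

     /* Simplified sketch of db.c#keyIsExpired(). */
     int keyIsExpired(redisDb *db, robj *key) {
         mstime_t when = getExpire(db,key);   /* -1 when no TTL is set. */
         if (when < 0) return 0;              /* No expire for this key. */
         if (server.loading) return 0;        /* Never expire keys while loading data. */
         return mstime() > when;              /* Compare against the current time in ms. */
     }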
    

2.2 Periodic Deletion

  1. Periodic deletion is implemented through Redis's time event, whose entry point is server.c#serverCron() (it runs server.hz times per second, 10 by default). The function is large; the code below omits the unrelated parts and keeps only what concerns periodic deletion, the key call being server.c#databasesCron()

    int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
          
          
     
     ......
     
    
     /* Handle background operations on Redis databases. */
     databasesCron();
    
     /* Start a scheduled AOF rewrite if this was requested by the user while
      * a BGSAVE was in progress. */
     if (!hasActiveChildProcess() &&
         server.aof_rewrite_scheduled)
     {
          
          
         rewriteAppendOnlyFileBackground();
     }
    
     /* Check if a background saving or AOF rewrite in progress terminated. */
     if (hasActiveChildProcess() || ldbPendingChildren())
     {
          
          
         checkChildrenDone();
     } else {
          
          
         /* If there is not a background saving/rewrite in progress check if
          * we have to save/rewrite now. */
         for (j = 0; j < server.saveparamslen; j++) {
          
          
             struct saveparam *sp = server.saveparams+j;
    
             /* Save if we reached the given amount of changes,
              * the given amount of seconds, and if the latest bgsave was
              * successful or if, in case of an error, at least
              * CONFIG_BGSAVE_RETRY_DELAY seconds already elapsed. */
             if (server.dirty >= sp->changes &&
                 server.unixtime-server.lastsave > sp->seconds &&
                 (server.unixtime-server.lastbgsave_try >
                  CONFIG_BGSAVE_RETRY_DELAY ||
                  server.lastbgsave_status == C_OK))
             {
          
          
                 serverLog(LL_NOTICE,"%d changes in %d seconds. Saving...",
                     sp->changes, (int)sp->seconds);
                 rdbSaveInfo rsi, *rsiptr;
                 rsiptr = rdbPopulateSaveInfo(&rsi);
                 rdbSaveBackground(server.rdb_filename,rsiptr);
                 break;
             }
         }
    
         /* Trigger an AOF rewrite if needed. */
         if (server.aof_state == AOF_ON &&
             !hasActiveChildProcess() &&
             server.aof_rewrite_perc &&
             server.aof_current_size > server.aof_rewrite_min_size)
         {
          
          
             long long base = server.aof_rewrite_base_size ?
                 server.aof_rewrite_base_size : 1;
             long long growth = (server.aof_current_size*100/base) - 100;
             if (growth >= server.aof_rewrite_perc) {
          
          
                 serverLog(LL_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
                 rewriteAppendOnlyFileBackground();
             }
         }
     }
    
    
     /* AOF postponed flush: Try at every cron cycle if the slow fsync
      * completed. */
     if (server.aof_flush_postponed_start) flushAppendOnlyFile(0);
    
     /* AOF write errors: in this case we have a buffer to flush as well and
      * clear the AOF error in case of success to make the DB writable again,
      * however to try every second is enough in case of 'hz' is set to
      * an higher frequency. */
     run_with_period(1000) {
          
          
         if (server.aof_last_write_status == C_ERR)
             flushAppendOnlyFile(0);
     }
     
     ......
    }
    
  2. server.c#databasesCron() performs background work on the databases, such as deleting expired keys, resizing the hash tables and incremental rehashing. Here we only care about expired-key deletion: on a master it is implemented by expire.c#activeExpireCycle(), while on a replica expireSlaveKeys() takes care of the keys recorded earlier by rememberSlaveKeyWithExpire()

    void databasesCron(void) {
          
          
     /* Expire keys by random sampling. Not required for slaves
      * as master will synthesize DELs for us. */
     if (server.active_expire_enabled) {
          
          
         if (iAmMaster()) {
          
          
             activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
         } else {
          
          
             expireSlaveKeys();
         }
     }
    
     /* Defrag keys gradually. */
     activeDefragCycle();
    
     /* Perform hash tables rehashing if needed, but only if there are no
      * other processes saving the DB on disk. Otherwise rehashing is bad
      * as will cause a lot of copy-on-write of memory pages. */
     if (!hasActiveChildProcess()) {
          
          
         /* We use global counters so if we stop the computation at a given
          * DB we'll be able to start from the successive in the next
          * cron loop iteration. */
         static unsigned int resize_db = 0;
         static unsigned int rehash_db = 0;
         int dbs_per_call = CRON_DBS_PER_CALL;
         int j;
    
         /* Don't test more DBs than we have. */
         if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;
    
         /* Resize */
         for (j = 0; j < dbs_per_call; j++) {
          
          
             tryResizeHashTables(resize_db % server.dbnum);
             resize_db++;
         }
    
         /* Rehash */
         if (server.activerehashing) {
          
          
             for (j = 0; j < dbs_per_call; j++) {
          
          
                 int work_done = incrementallyRehash(rehash_db);
                 if (work_done) {
          
          
                     /* If the function did some work, stop here, we'll do
                      * more at the next cron loop. */
                     break;
                 } else {
          
          
                     /* If this db didn't need rehash, we'll try the next one. */
                     rehash_db++;
                     rehash_db %= server.dbnum;
                 }
             }
         }
     }
    }
    
  3. The implementation of expire.c#activeExpireCycle() is long; the steps worth noting are the following (the time budget in step 3 is worked out after the listing below):

    1. Iterate over a fixed number of databases (e.g. 16, CRON_DBS_PER_CALL) and delete expired data in each of them
    2. For each database, every round samples at most a fixed number of entries (e.g. 20) that have an expiration time set, calling activeExpireCycleTryExpire() on each one to try to delete it
    3. Every 16 sampling rounds, check whether the elapsed time has exceeded 25% of the cron interval (timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100 microseconds); if so, stop the deletion process
    4. Finally, the loop over the current database stops once the fraction of sampled keys that turned out to be expired drops to or below the configured acceptable-stale percentage (config_cycle_acceptable_stale); as long as the fraction stays above it, the database keeps being scanned
    #define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25 /* Max % of CPU to use. */
    void activeExpireCycle(int type) {
          
          
       
        ......
        
         for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
          
          
         /* Expired and checked in a single loop. */
         unsigned long expired, sampled;
    
         redisDb *db = server.db+(current_db % server.dbnum);
    
         /* Increment the DB now so we are sure if we run out of time
          * in the current DB we'll restart from the next. This allows to
          * distribute the time evenly across DBs. */
         current_db++;
    
         /* Continue to expire if at the end of the cycle there are still
          * a big percentage of keys to expire, compared to the number of keys
          * we scanned. The percentage, stored in config_cycle_acceptable_stale
          * is not fixed, but depends on the Redis configured "expire effort". */
         do {
          
          
             unsigned long num, slots;
             long long now, ttl_sum;
             int ttl_samples;
             iteration++;
    
             /* If there is nothing to expire try next DB ASAP. */
             if ((num = dictSize(db->expires)) == 0) {
          
          
                 db->avg_ttl = 0;
                 break;
             }
             slots = dictSlots(db->expires);
             now = mstime();
    
             /* When there are less than 1% filled slots, sampling the key
              * space is expensive, so stop here waiting for better times...
              * The dictionary will be resized asap. */
             if (num && slots > DICT_HT_INITIAL_SIZE &&
                 (num*100/slots < 1)) break;
    
             /* The main collection cycle. Sample random keys among keys
              * with an expire set, checking for expired ones. */
             expired = 0;
             sampled = 0;
             ttl_sum = 0;
             ttl_samples = 0;
    
             if (num > config_keys_per_loop)
                 num = config_keys_per_loop;
    
             /* Here we access the low level representation of the hash table
              * for speed concerns: this makes this code coupled with dict.c,
              * but it hardly changed in ten years.
              *
              * Note that certain places of the hash table may be empty,
              * so we want also a stop condition about the number of
              * buckets that we scanned. However scanning for free buckets
              * is very fast: we are in the cache line scanning a sequential
              * array of NULL pointers, so we can scan a lot more buckets
              * than keys in the same time. */
             long max_buckets = num*20;
             long checked_buckets = 0;
    
             while (sampled < num && checked_buckets < max_buckets) {
          
          
                 for (int table = 0; table < 2; table++) {
          
          
                     if (table == 1 && !dictIsRehashing(db->expires)) break;
    
                     unsigned long idx = db->expires_cursor;
                     idx &= db->expires->ht[table].sizemask;
                     dictEntry *de = db->expires->ht[table].table[idx];
                     long long ttl;
    
                     /* Scan the current bucket of the current table. */
                     checked_buckets++;
                     while(de) {
          
          
                         /* Get the next entry now since this entry may get
                          * deleted. */
                         dictEntry *e = de;
                         de = de->next;
    
                         ttl = dictGetSignedIntegerVal(e)-now;
                         if (activeExpireCycleTryExpire(db,e,now)) expired++;
                         if (ttl > 0) {
          
          
                             /* We want the average TTL of keys yet
                              * not expired. */
                             ttl_sum += ttl;
                             ttl_samples++;
                         }
                         sampled++;
                     }
                 }
                 db->expires_cursor++;
             }
             total_expired += expired;
             total_sampled += sampled;
    
             /* Update the average TTL stats for this database. */
             if (ttl_samples) {
          
          
                 long long avg_ttl = ttl_sum/ttl_samples;
    
                 /* Do a simple running average with a few samples.
                  * We just use the current estimate with a weight of 2%
                  * and the previous estimate with a weight of 98%. */
                 if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
                 db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
             }
    
             /* We can't block forever here even if there are many keys to
              * expire. So after a given amount of milliseconds return to the
              * caller waiting for the other active expire cycle. */
             if ((iteration & 0xf) == 0) {
          
           /* check once every 16 iterations. */
                 elapsed = ustime()-start;
                 if (elapsed > timelimit) {
          
          
                     timelimit_exit = 1;
                     server.stat_expired_time_cap_reached_count++;
                     break;
                 }
             }
             /* We don't repeat the cycle for the current database if there are
              * an acceptable amount of stale keys (logically expired but yet
              * not reclaimed). */
         } while (sampled == 0 ||
                  (expired*100/sampled) > config_cycle_acceptable_stale);
     }
     ......
    }
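
    To make the time budget in step 3 concrete: with the default server.hz = 10, serverCron fires every 100 ms, and a slow expire cycle may use at most 25% of that interval (a rough calculation assuming the default configuration, ignoring the extra "expire effort" adjustment):

     /* With ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC = 25 and server.hz = 10:
      *   timelimit = 1000000 * 25 / 10 / 100
      *             = 25000 microseconds
      * i.e. each slow cycle may spend roughly 25 ms, a quarter of the 100 ms
      * period between two serverCron runs. */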
    
  4. expire.c#activeExpireCycleTryExpire() is the function that actually attempts to delete an expired entry. The code below is simple and self-explanatory; the propagateExpire() call it makes is sketched after the listing

    int activeExpireCycleTryExpire(redisDb *db, dictEntry *de, long long now) {
          
          
     long long t = dictGetSignedIntegerVal(de);
     if (now > t) {
          
          
         sds key = dictGetKey(de);
         robj *keyobj = createStringObject(key,sdslen(key));
    
         propagateExpire(db,keyobj,server.lazyfree_lazy_expire);
         if (server.lazyfree_lazy_expire)
             dbAsyncDelete(db,keyobj);
         else
             dbSyncDelete(db,keyobj);
         notifyKeyspaceEvent(NOTIFY_EXPIRED,
             "expired",keyobj,db->id);
         trackingInvalidateKey(NULL,keyobj);
         decrRefCount(keyobj);
         server.stat_expiredkeys++;
         return 1;
     } else {
          
          
         return 0;
     }
    }
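
    Both expiration paths above, as well as eviction in freeMemoryIfNeeded(), call propagateExpire() before deleting the key. From memory, this helper in db.c turns the expiration into an explicit DEL (or UNLINK when lazy freeing is enabled) for the AOF and the replicas, roughly as follows (a sketch, not an exact copy):

     void propagateExpire(redisDb *db, robj *key, int lazy) {
         robj *argv[2];

         argv[0] = lazy ? shared.unlink : shared.del;
         argv[1] = key;
         incrRefCount(argv[0]);
         incrRefCount(argv[1]);

         /* Feed the synthesized DEL/UNLINK to the AOF and the replicas so the
          * expiration is applied consistently everywhere. */
         if (server.aof_state != AOF_OFF)
             feedAppendOnlyFile(server.delCommand,db->id,argv,2);
         replicationFeedSlaves(server.slaves,db->id,argv,2);

         decrRefCount(argv[0]);
         decrRefCount(argv[1]);
     }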
    


Reprinted from blog.csdn.net/weixin_45505313/article/details/109259175