1. Lab Description
https://pdos.csail.mit.edu/6.824/labs/lab-kvraft.html
2. Overview
This lab builds a fault-tolerant key/value service on top of the Raft protocol. As long as a majority of the servers are alive and can communicate with each other, the service keeps handling client requests even if other servers fail or the network partitions.
Clerks send Put(), Append(), and Get() RPCs to the kvserver attached to the Raft leader. That kvserver submits the operation (an Op) to Raft, so the Raft log holds the sequence of Put/Append/Get operations. Every kvserver executes the operations from the Raft log in order, applying them to its key/value database; the goal is for all servers to maintain identical replicas of the database.
Part A: Key/value service without log compaction
The service supports three operations: Put(key, value), Append(key, arg), and Get(key). It maintains a simple key/value database. Put() replaces the value for a particular key in the database, Append(key, arg) appends arg to the key's value, and Get() fetches the key's current value.
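For reference, the RPC argument and reply types used throughout look roughly like this. Key, Value, Op, WrongLeader, and Err follow the lab skeleton's common.go; the ClientId and RequestId fields are the duplicate-detection additions this write-up describes:
type Err string

const (
    OK       = "OK"
    ErrNoKey = "ErrNoKey"
)

type PutAppendArgs struct {
    Key       string
    Value     string
    Op        string // "Put" or "Append"
    ClientId  int64  // added: which Clerk sent this request
    RequestId int    // added: the Clerk's sequence number for this request
}

type PutAppendReply struct {
    WrongLeader bool
    Err         Err
}

type GetArgs struct {
    Key       string
    ClientId  int64 // added, as above
    RequestId int   // added, as above
}

type GetReply struct {
    WrongLeader bool
    Err         Err
    Value       string
}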
Client
- The Clerk sometimes does not know which server is the leader. If it sends to the wrong one (WrongLeader), it retries against the next server, ck.leaderId = (ck.leaderId + 1) % len(ck.servers), until it finds the real leader.
- After a request succeeds, update lastRequestId.
- The Clerk remembers which server answered its last RPC as the leader, which avoids searching for the leader on every RPC; the state it keeps is sketched below.
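A minimal sketch of that Clerk state, assuming the lab skeleton's labrpc package and its nrand() helper for random client ids; the leaderId and lastRequestId fields are this design's additions:
type Clerk struct {
    servers       []*labrpc.ClientEnd
    clientId      int64 // fixed random id identifying this client
    leaderId      int   // index of the server we last believed to be leader
    lastRequestId int   // id of this client's last completed request
}

func MakeClerk(servers []*labrpc.ClientEnd) *Clerk {
    ck := new(Clerk)
    ck.servers = servers
    ck.clientId = nrand() // assume 64-bit random ids never collide
    return ck
}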
func (ck *Clerk) Get(key string) string {
    // Reuse the same request id across retries so the server can
    // recognize and deduplicate a resent request.
    requestId := ck.lastRequestId + 1
    for {
        args := GetArgs{
            Key:       key,
            ClientId:  ck.clientId,
            RequestId: requestId,
        }
        var reply GetReply
        ok := ck.servers[ck.leaderId].Call("KVServer.Get", &args, &reply)
        if !ok || reply.WrongLeader {
            // RPC failed or we guessed the wrong leader: try the next server.
            ck.leaderId = (ck.leaderId + 1) % len(ck.servers)
            continue
        }
        ck.lastRequestId = requestId
        return reply.Value
    }
}
func (ck *Clerk) PutAppend(key string, value string, op string) {
    // Same retry-with-fixed-request-id pattern as Get.
    requestId := ck.lastRequestId + 1
    for {
        args := PutAppendArgs{
            Key:       key,
            Value:     value,
            Op:        op,
            ClientId:  ck.clientId,
            RequestId: requestId,
        }
        var reply PutAppendReply
        ok := ck.servers[ck.leaderId].Call("KVServer.PutAppend", &args, &reply)
        if !ok || reply.WrongLeader {
            ck.leaderId = (ck.leaderId + 1) % len(ck.servers)
            continue
        }
        ck.lastRequestId = requestId
        return
    }
}
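Put and Append are thin wrappers over PutAppend, as in the lab skeleton:
func (ck *Clerk) Put(key string, value string) {
    ck.PutAppend(key, value, "Put")
}

func (ck *Clerk) Append(key string, value string) {
    ck.PutAppend(key, value, "Append")
}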
Server
KVServer
- db: the key/value data itself
- chMap: one channel per Raft log index, used to notify the RPC handler waiting on that index; the handler compares ClientId and RequestId to check that the applied Op really is the one this Clerk submitted in this call (several Clerks talk to the leader concurrently)
- lastAppliedRequestId: the highest request id already applied for each Clerk (ClientId), used to keep a duplicate request from executing twice
type KVServer struct {
    mu      sync.Mutex
    me      int
    rf      *raft.Raft
    applyCh chan raft.ApplyMsg

    maxraftstate int // snapshot if log grows this big

    db                   map[string]string // the key/value store itself
    chMap                map[int]chan Op   // log index -> channel to the waiting RPC handler
    lastAppliedRequestId map[int64]int     // ClientId -> last applied RequestId (duplicate detection)
}
Op is the command submitted to Raft via Start():
index, _, isLeader := kv.rf.Start(op)
type Op struct {
    Key       string
    Value     string
    Optype    string // "Put", "Append", or "Get"
    ClientId  int64  // which Clerk issued the operation
    RequestId int    // that Clerk's sequence number for the operation
}
In StartKVServer
Start a goroutine that watches applyCh: as soon as Raft applies a log entry (the log has reached agreement and commitIndex has advanced), execute its operation on the local database immediately.
Notes
- The same Clerk (ClientId) may send a request to the kvserver leader in one term, time out waiting for the reply, and then resend the request to a new leader in another term. The request must still execute exactly once; this is what lastAppliedRequestId guarantees.
go func() {
    for msg := range kv.applyCh {
        if !msg.CommandValid {
            continue
        }
        op := msg.Command.(Op)
        kv.mu.Lock()
        // Only execute the op if we have not already applied a request
        // with this id (or a newer one) from the same Clerk.
        lastAppliedRequestId, ok := kv.lastAppliedRequestId[op.ClientId]
        if !ok || lastAppliedRequestId < op.RequestId {
            switch op.Optype {
            case "Put":
                kv.db[op.Key] = op.Value
            case "Append":
                kv.db[op.Key] += op.Value
            }
            kv.lastAppliedRequestId[op.ClientId] = op.RequestId
        }
        // Wake up the handler (if any) waiting on this log index.
        ch, ok := kv.chMap[msg.CommandIndex]
        kv.mu.Unlock()
        if ok {
            ch <- op
        }
    }
}()
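For completeness, a sketch of how StartKVServer might wire this up; the signature and the labgob.Register call follow the lab skeleton, and the apply loop is the goroutine shown above:
func StartKVServer(servers []*labrpc.ClientEnd, me int, persister *raft.Persister, maxraftstate int) *KVServer {
    labgob.Register(Op{}) // Raft must be able to encode/decode Op in log entries

    kv := new(KVServer)
    kv.me = me
    kv.maxraftstate = maxraftstate
    kv.applyCh = make(chan raft.ApplyMsg)
    kv.rf = raft.Make(servers, me, persister, kv.applyCh)
    kv.db = make(map[string]string)
    kv.chMap = make(map[int]chan Op)
    kv.lastAppliedRequestId = make(map[int64]int)

    go func() {
        // ... the applyCh loop shown above ...
    }()
    return kv
}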
The PutAppend() and Get() handlers submit the command to the Raft log with Start(). After calling Start(), the kvserver must wait for the apply goroutine to execute the Op, taking care to handle the timeout case (e.g. the leader loses leadership before the entry commits).
func (kv *KVServer) waitRaftApply(op Op, timeout time.Duration) bool {
    index, _, isLeader := kv.rf.Start(op)
    if !isLeader {
        // Not the leader: tell the Clerk to try another server.
        return true
    }

    kv.mu.Lock()
    if _, ok := kv.chMap[index]; !ok {
        kv.chMap[index] = make(chan Op, 1)
    }
    ch := kv.chMap[index]
    kv.mu.Unlock()

    var wrongLeader bool
    select {
    case notify := <-ch:
        kv.mu.Lock()
        delete(kv.chMap, index)
        kv.mu.Unlock()
        // If a different Clerk's op was committed at this index, we lost
        // leadership and our op was discarded: report WrongLeader.
        wrongLeader = notify.ClientId != op.ClientId || notify.RequestId != op.RequestId
    case <-time.After(timeout):
        // No apply within the timeout (e.g. partitioned leader): have the
        // Clerk retry elsewhere.
        wrongLeader = true
    }
    return wrongLeader
}
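With waitRaftApply in place, the two RPC handlers are short. A sketch, assuming the types above; the 500ms timeout is an arbitrary choice, not prescribed by the lab:
func (kv *KVServer) PutAppend(args *PutAppendArgs, reply *PutAppendReply) {
    op := Op{
        Key:       args.Key,
        Value:     args.Value,
        Optype:    args.Op,
        ClientId:  args.ClientId,
        RequestId: args.RequestId,
    }
    reply.WrongLeader = kv.waitRaftApply(op, 500*time.Millisecond)
}

func (kv *KVServer) Get(args *GetArgs, reply *GetReply) {
    op := Op{
        Key:       args.Key,
        Optype:    "Get",
        ClientId:  args.ClientId,
        RequestId: args.RequestId,
    }
    reply.WrongLeader = kv.waitRaftApply(op, 500*time.Millisecond)
    if reply.WrongLeader {
        return
    }
    // The op has been applied; read the value from the local database.
    kv.mu.Lock()
    defer kv.mu.Unlock()
    if value, ok := kv.db[args.Key]; ok {
        reply.Value = value
        reply.Err = OK
    } else {
        reply.Err = ErrNoKey
    }
}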