带ttl的etcd互斥锁

etcd mutex lock with ttl

我正在尝试创建一个简单的演示 golang etcd 客户端程序,它使用 etcd 互斥体创建一个共享锁,有超时。目标是让互斥锁在一段时间后过期。

package main

import (
    "context"
    "log"
    "time"

    "go.etcd.io/etcd/clientv3"
    "go.etcd.io/etcd/clientv3/concurrency"
)

var c chan int

func init() {
    c = make(chan int)
}

func main() {
    client, err := clientv3.New(clientv3.Config{
        Endpoints: []string{"http://localhost:2379"},
    })
    if err != nil {
        panic(err)
    }

    watcher := clientv3.NewWatcher(client)
    channel := watcher.Watch(context.Background(), "/foobar", clientv3.WithPrefix())
    go func() {
        for {
            select {
            case change := <-channel:
                for _, ev := range change.Events {
                    log.Printf("etcd change on key; %s, type = %v", string(ev.Kv.Key), ev.Type)
                }
            }
        }
    }()

    go lockFoobar(client, 1)
    go lockFoobar(client, 2)
    <-c
    <-c
}

func lockFoobar(client *clientv3.Client, id int) {
    res, err := client.Grant(context.Background(), 1)
    if err != nil {
        panic(err)
    }

    session, err := concurrency.NewSession(client, concurrency.WithLease(res.ID))
    if err != nil {
        panic(err)
    }

    mux := concurrency.NewMutex(session, "/foobar")

    log.Printf("trying to lock by #%d\n", id)
    ctx, _ := context.WithTimeout(context.Background(), 15*time.Second)
    if err := mux.Lock(ctx); err != nil {
        log.Printf("failed to lock #%d: %v\n", id, err)
        c <- id
        return
    }

    log.Printf("post-lock #%d (lease ID = %x) bullshit\n", id, res.ID)
    time.Sleep(10 * time.Second)
    ttl, _ := client.TimeToLive(context.TODO(), res.ID)
    log.Printf("post-post-lock-#%d-sleep. lease ttl = %v", id, ttl.TTL)
    // mux.Unlock(ctx)
    // log.Printf("post-unlock #%d bullshit\n", id)

    time.Sleep(200 * time.Millisecond)
    c <- id
}

租约的 ttl 为 1 秒,而上下文的超时时间为 5 秒,因此,锁应该在上下文过期时被删除。但是,无论上下文超时如何,"locked" 锁总是仅在失败的锁之后被删除。

这是当前输出:

2018-10-04 18:39:59.413274 I | trying to lock by #2
2018-10-04 18:39:59.414530 I | trying to lock by #1
2018-10-04 18:39:59.414656 I | etcd change on key; /foobar/2a0966398d0677a2, type = PUT
2018-10-04 18:39:59.414684 I | post-lock #2 (lease ID = 2a0966398d0677a2) bullshit
2018-10-04 18:39:59.415617 I | etcd change on key; /foobar/2a0966398d0677a4, type = PUT
2018-10-04 18:40:10.239045 I | post-post-lock-#2-sleep. lease ttl = 1                       <-- lock for #2 has ttl = 1 even after 10s
2018-10-04 18:40:15.238871 I | failed to lock #1: context deadline exceeded                 <-- lock for #1 fails after 15s

如您所见,#2 的锁在 15 秒后仍然有效。

运行 ETCDCTL_API=3 etcdctl watch --prefix=true /foobar 在另一个终端中观察键的变化显示以下输出

PUT
/foobar/2a0966398d0677a2

PUT
/foobar/2a0966398d0677a4

DELETE
/foobar/2a0966398d0677a4

DELETE
/foobar/2a0966398d0677a2

这是预期的行为吗?有什么方法可以完成我想要的吗?

P.S.: 真实世界的用例是创建一个程序,它在多个实例中运行并且不会在崩溃时在 etcd 中留下锁 and/or kill (SIGKILL).

经过一番搜索,我找到了这种行为的原因。会话保持租约有效,直到出现错误或取消。

来自session.go

...
// keep the lease alive until client error or cancelled context
go func() {
    defer close(donec)
    for range keepAlive {
        // eat messages until keep alive channel closes
    }
}()
...

Callint session.Orphan() 在创建互斥体后将阻止会话保持活动状态并达到我的目的。