How to open (create if not exists) a file while acquiring an exclusive lock, avoiding races

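In Python 2.7, is it possible (and if so, how) to open a file, creating it if it does not exist, and acquire an exclusive lock on it, all in a single atomic (race-free) operation?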


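Context: I have a Python program that fetches files given a list of URL/md5 pairs. If a file in the list already exists and its md5 matches, it is skipped; otherwise it is downloaded. Several instances of this program may run at once, processing different lists that may overlap.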

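This question is almost what I need, but in my case I need to lock the file either way in order to check its md5, while preventing anyone else from doing the same. Also, I don't need to know beforehand whether the file existed: if it was just created, it will be empty and its md5 won't match, so it will be downloaded anyway.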

I use this program exclusively on Linux, but cross-platform solutions are welcome.


Edit: in the end I solved my problem as follows:

As it stands, the desired operation is not supported as a single atomic step, but it doesn't need to be.

It's not possible, at least according to this comprehensive report:

  • mv -T <oldsymlink> <newsymlink> atomically changes the target of <newsymlink> to the directory pointed to by <oldsymlink> and is indispensable when deploying new code. Updated 2010-01-06: both operands are symlinks. (So this isn’t a system call, it’s still useful.) A reader pointed out that ln -Tfs <directory> <symlink> accomplishes the same thing without the second symlink. Added 2010-01-06. Deleted 2010-01-06: strace(1) shows that ln -Tfs <directory> <symlink> actually calls symlink(2), unlink(2), and symlink(2) once more, disqualifying it from this page. mv -T <oldsymlink> <newsymlink> ends up calling rename(2) which can atomically replace <newsymlink>. Caveat 2013-01-07: this does not apply to Mac OS X, whose mv(1) doesn’t call rename(2). mv(1).
  • link(oldpath, newpath) creates a new hard link called newpath pointing to the same inode as oldpath and increases the link count by one. This will fail with the error code EEXIST if newpath already exists, making this a useful mechanism for locking a file amongst threads or processes that can all agree upon the name newpath. I prefer this technique for whole-file locking because the lock is visible to ls(1). link(2).
  • symlink(oldpath, newpath) operates very much like link(2) but creates a symbolic link at a new inode rather than a hard link to the same inode. Symbolic links can point to directories, which hard links cannot, making them a perfect analogy to link(2) when locking entire directories. This will fail with the error code EEXIST if newpath already exists, making this a perfect analogy to link(2) that works for directories, too. Be careful of symbolic links whose target inode has been removed ("dangling" symbolic links) — open(2) will fail with the error code ENOENT. It should be mentioned that inodes are a finite resource (this particular machine has 1,245,184 inodes). symlink(2). Added 2010-01-07
  • rename(oldpath, newpath) can change a pathname atomically, provided oldpath and newpath are on the same filesystem. This will fail with the error code ENOENT if oldpath does not exist, enabling interprocess locking much like link(oldpath, newpath) above. I find this technique more natural when the files in question will be unlinked later. rename(2).
  • open(pathname, O_CREAT | O_EXCL, 0644) creates and opens a new file. (Don’t forget to set the mode in the third argument!) O_EXCL instructs this to fail with the error code EEXIST if pathname exists. This is a useful way to decide which process should handle a task: whoever successfully creates the file. open(2).
  • mkdir(dirname, 0755) creates a new directory but fails with the error code EEXIST if dirname exists. This provides for directories the same mechanism open(2) with O_EXCL provides for files. mkdir(2). Added 2010-01-06; edited 2013-01-07.

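As you can see, open() is only atomic when creating a file, not when opening an existing file for reading. If you want to take this approach, you will probably want Python's os.open(), which is a thin wrapper around this system call (not to be confused with the built-in open()).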
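A minimal sketch of the O_CREAT | O_EXCL approach through os.open(), assuming a POSIX-like system. The helper name try_create_exclusive and the demo path are made up for illustration; only the os.open() flags come from the list above.

```python
import errno
import os
import tempfile


def try_create_exclusive(path):
    """Return a file descriptor if this call created `path`, else None.

    Hypothetical helper: O_CREAT | O_EXCL makes the kernel fail with
    EEXIST when the file already exists, so only one caller can win.
    """
    try:
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            return None  # somebody else created it first
        raise


# Demo in a throwaway directory (an assumption, not part of the question).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "claimed")

first = try_create_exclusive(demo_path)   # creates the file
second = try_create_exclusive(demo_path)  # file exists now, so this fails

if first is not None:
    os.close(first)
```

Note that this only decides who created the file; it does not give you a way to open an already existing file exclusively.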

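You might also consider using a database for this task, since it should give you better reliability (for example, if your files are hosted on NFS, which doesn't implement any locking at all and where, IIRC, the only atomic operation is mkdir()?).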

No, this is not possible as a basic operation supported by Linux/UNIX.

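The O_CREAT|O_EXCL trick from the answer you referenced can work here. Instead of exclusively creating the target file, you exclusively create a lock file whose name is predictably derived from the target file's name. For example, os.path.join("/tmp", hashlib.md5(target_filename).hexdigest() + ".lock").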
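One way this lock-file idea could look, as a rough sketch rather than a finished implementation. The function names (lock_path_for, acquire, release) and the lock_dir parameter are hypothetical. The .encode("utf-8") call is an addition so that the snippet also runs on Python 3, and it assumes the file name is text.

```python
import errno
import hashlib
import os
import tempfile


def lock_path_for(target_filename, lock_dir):
    """Derive a predictable lock-file path from the target file name."""
    # Assumption: target_filename is text; encode() keeps this Python 3 friendly.
    digest = hashlib.md5(target_filename.encode("utf-8")).hexdigest()
    return os.path.join(lock_dir, digest + ".lock")


def acquire(target_filename, lock_dir):
    """Return the lock path if this call created the lock file, else None."""
    path = lock_path_for(target_filename, lock_dir)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            return None  # another process holds the lock
        raise
    os.close(fd)
    return path


def release(lock_path):
    """Drop the lock by removing the lock file."""
    os.unlink(lock_path)
```

A caller would check the md5 and download only between a successful acquire() and the matching release(). One known weakness of this design: if the process dies while holding the lock, the lock file stays behind and has to be cleaned up by other means.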

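However, as others have suggested, it's not clear that you need to protect the target file's creation, its checksumming, and its possible replacement all in one step. fcntl advisory locks will serve your need.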
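A sketch of how the advisory-lock route might look on Linux, using fcntl.lockf() from the standard library. The names locked_file, md5_of_fd and ensure_file, and the fetch callable that returns the downloaded bytes, are all hypothetical. Opening (or creating) the file and locking it are two separate steps here, which is fine because every cooperating process takes the lock before reading or writing.

```python
import contextlib
import fcntl
import hashlib
import os
import tempfile


@contextlib.contextmanager
def locked_file(path):
    """Open `path` (creating it if missing) and hold an exclusive advisory lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until other processes release
        yield fd
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def md5_of_fd(fd):
    """Hash the whole file behind `fd`, starting from offset 0."""
    os.lseek(fd, 0, os.SEEK_SET)
    digest = hashlib.md5()
    while True:
        chunk = os.read(fd, 65536)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def ensure_file(path, expected_md5, fetch):
    """Return True if the file was (re)written, False if it was skipped.

    `fetch` is a hypothetical callable standing in for the download step.
    """
    with locked_file(path) as fd:
        if md5_of_fd(fd) == expected_md5:
            return False  # already present and correct
        data = fetch()
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        while data:  # os.write may write fewer bytes than requested
            written = os.write(fd, data)
            data = data[written:]
        return True
```

A freshly created file is empty, so its md5 will not match and the download happens, as described in the question. These locks are advisory: they only protect against other processes that also take the lock.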