win32file.ReadDirectoryChangesW doesn't find all moved files

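Good morning,

I've run into a peculiar problem with a program I'm writing in Python. It seems that when I drag and drop files from one location to another, not all of the files are registered as events by the modules.

I've been working with win32file and win32con to try to capture all events related to moving files from one location to another, so that I can process them.

Here is a snippet of my detection code: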

import win32file
import win32con
def main():
    path_to_watch = 'D:\\'
    _file_list_dir = 1
    # Create a watcher handle
    _h_dir = win32file.CreateFile(
        path_to_watch,
        _file_list_dir,
        win32con.FILE_SHARE_READ |
        win32con.FILE_SHARE_WRITE |
        win32con.FILE_SHARE_DELETE,
        None,
        win32con.OPEN_EXISTING,
        win32con.FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    while 1:
        results = win32file.ReadDirectoryChangesW(
            _h_dir,
            1024,
            True,
            win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
            win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
            win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
            win32con.FILE_NOTIFY_CHANGE_SIZE |
            win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
            win32con.FILE_NOTIFY_CHANGE_SECURITY,
            None,
            None
        )
        for _action, _file in results:
            if _action == 1:  # FILE_ACTION_ADDED
                print('found!')
            if _action == 2:  # FILE_ACTION_REMOVED
                print('deleted!')

I dragged and dropped 7 files, and only 4 of them were found.

# found!
# found!
# found!
# found!

How can I detect all of the files, including the ones that are currently missed?

[ActiveState.Docs]: win32file.ReadDirectoryChangesW (this is the best documentation that I could find for [GitHub]: mhammond/pywin32 - Python for Windows (pywin32) Extensions) is a wrapper over [MS.Docs]: ReadDirectoryChangesW function. Here's what it states (about the buffers):

1. General

When you first call ReadDirectoryChangesW, the system allocates a buffer to store change information. This buffer is associated with the directory handle until it is closed and its size does not change during its lifetime. Directory changes that occur between calls to this function are added to the buffer and then returned with the next call. If the buffer overflows, the entire contents of the buffer are discarded, the lpBytesReturned parameter contains zero, and the ReadDirectoryChangesW function fails with the error code ERROR_NOTIFY_ENUM_DIR.

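  • My understanding is that this is a different buffer from the one passed as an argument (lpBuffer):

    • lpBuffer is passed to every call of ReadDirectoryChangesW (possibly a different buffer, with a different size, on each call), and it is clearly allocated by the user before the function call

    • The other buffer is allocated by the system. It is the one that stores data (probably in some raw format) between function calls; when the function is called, its contents are copied (and formatted) into lpBuffer (unless it overflowed, and was discarded, in the meantime)

2. Synchronous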

Upon successful synchronous completion, the lpBuffer parameter is a formatted buffer and the number of bytes written to the buffer is available in lpBytesReturned. If the number of bytes transferred is zero, the buffer was either too large for the system to allocate or too small to provide detailed information on all the changes that occurred in the directory or subtree. In this case, you should compute the changes by enumerating the directory or subtree.

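  • This somewhat confirms my previous assumption

    • "The buffer was either too large for the system to allocate" - maybe nBufferLength is taken into account when the buffer from the previous point is allocated?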
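
The quoted advice to "compute the changes by enumerating the directory or subtree" can be sketched in plain Python. This is only an illustration and not something PyWin32 provides: snapshot and diff_snapshots are hypothetical helper names, and I'm assuming the script takes one snapshot before monitoring starts and another one whenever it suspects that events were lost.

```python
import os


def snapshot(root):
    """Return the set of file paths (relative to root) currently in the tree."""
    found = set()
    for dir_path, _dir_names, file_names in os.walk(root):
        for file_name in file_names:
            full_path = os.path.join(dir_path, file_name)
            found.add(os.path.relpath(full_path, root))
    return found


def diff_snapshots(before, after):
    """Compare two snapshots and return (added, removed) as sorted lists."""
    added = sorted(after - before)
    removed = sorted(before - after)
    return added, removed
```

Passing an older and a newer snapshot to diff_snapshots gives the files that appeared or vanished in between, at the cost of a full walk of the tree, so it is a fallback rather than a replacement for the notifications.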

Anyway, I took your code and changed it "a bit".

code00.py:

import sys
import msvcrt
import pywintypes
import win32file
import win32con
import win32api
import win32event


FILE_LIST_DIRECTORY = 0x0001
FILE_ACTION_ADDED = 0x00000001
FILE_ACTION_REMOVED = 0x00000002

ASYNC_TIMEOUT = 5000

BUF_SIZE = 65536


def get_dir_handle(dir_name, asynch):
    flags_and_attributes = win32con.FILE_FLAG_BACKUP_SEMANTICS
    if asynch:
        flags_and_attributes |= win32con.FILE_FLAG_OVERLAPPED
    dir_handle = win32file.CreateFile(
        dir_name,
        FILE_LIST_DIRECTORY,
        (win32con.FILE_SHARE_READ |
         win32con.FILE_SHARE_WRITE |
         win32con.FILE_SHARE_DELETE),
        None,
        win32con.OPEN_EXISTING,
        flags_and_attributes,
        None
    )
    return dir_handle


def read_dir_changes(dir_handle, size_or_buf, overlapped):
    return win32file.ReadDirectoryChangesW(
        dir_handle,
        size_or_buf,
        True,
        (win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
         win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
         win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
         win32con.FILE_NOTIFY_CHANGE_SIZE |
         win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
         win32con.FILE_NOTIFY_CHANGE_SECURITY),
        overlapped,
        None
    )


def handle_results(results):
    for item in results:
        print("    {} {:d}".format(item, len(item[1])))
        _action, _ = item
        if _action == FILE_ACTION_ADDED:
            print("    found!")
        if _action == FILE_ACTION_REMOVED:
            print("    deleted!")


def esc_pressed():
    return msvcrt.kbhit() and ord(msvcrt.getch()) == 27


def monitor_dir_sync(dir_handle):
    idx = 0
    while True:
        print("Index: {:d}".format(idx))
        idx += 1
        results = read_dir_changes(dir_handle, BUF_SIZE, None)
        handle_results(results)
        if esc_pressed():
            break


def monitor_dir_async(dir_handle):
    idx = 0
    buffer = win32file.AllocateReadBuffer(BUF_SIZE)
    overlapped = pywintypes.OVERLAPPED()
    overlapped.hEvent = win32event.CreateEvent(None, False, 0, None)
    while True:
        print("Index: {:d}".format(idx))
        idx += 1
        read_dir_changes(dir_handle, buffer, overlapped)
        rc = win32event.WaitForSingleObject(overlapped.hEvent, ASYNC_TIMEOUT)
        if rc == win32event.WAIT_OBJECT_0:
            # Number of bytes that were actually written into the buffer.
            buffer_size = win32file.GetOverlappedResult(dir_handle, overlapped, True)
            results = win32file.FILE_NOTIFY_INFORMATION(buffer, buffer_size)
            handle_results(results)
        elif rc == win32event.WAIT_TIMEOUT:
            #print("    timeout...")
            pass
        else:
            print("Received {:d}. Exiting".format(rc))
            break
        if esc_pressed():
            break
    win32api.CloseHandle(overlapped.hEvent)


def monitor_dir(dir_name, asynch=False):
    dir_handle = get_dir_handle(dir_name, asynch)
    if asynch:
        monitor_dir_async(dir_handle)
    else:
        monitor_dir_sync(dir_handle)
    win32api.CloseHandle(dir_handle)


def main():
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    asynch = True
    print("Attempting {}ynchronous mode using a buffer {:d} bytes long...".format("As" if asynch else "S", BUF_SIZE))
    monitor_dir(".\\test", asynch=asynch)


if __name__ == "__main__":
    main()

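Notes:

  • Used constants wherever possible
  • Split your code into functions so that it's modular (and to avoid duplicating it)
  • Added print statements to increase the output
  • Added the asynchronous functionality (so that the script doesn't hang forever if there is no activity in the dir)
  • Added a way to exit when the user presses ESC (of course, in synchronous mode an event in the dir must also occur)
  • Played with different values to get different results

Output: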

e:\Work\Dev\StackOverflow\q049799109>dir /b test
0123456789.txt
01234567890123456789.txt
012345678901234567890123456789.txt
0123456789012345678901234567890123456789.txt
01234567890123456789012345678901234567890123456789.txt
012345678901234567890123456789012345678901234567890123456789.txt
0123456789012345678901234567890123456789012345678901234567890123456789.txt
01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt

e:\Work\Dev\StackOverflow\q049799109>
e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython.7.10__00\python.exe" code00.py
Python 2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32

Attempting Synchronous mode using a buffer 512 bytes long...
Index: 0
    (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    deleted!
Index: 1
    (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
    deleted!
Index: 2
    (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
    deleted!
Index: 3
    (2, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
    deleted!
    (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
    deleted!
Index: 4
    (2, u'01234567890123456789012345678901234567890123456789.txt') 54
    deleted!
Index: 5
    (2, u'0123456789012345678901234567890123456789.txt') 44
    deleted!
    (2, u'012345678901234567890123456789.txt') 34
    deleted!
Index: 6
    (2, u'01234567890123456789.txt') 24
    deleted!
    (2, u'0123456789.txt') 14
    deleted!
Index: 7
    (1, u'0123456789.txt') 14
    found!
Index: 8
    (3, u'0123456789.txt') 14
Index: 9
    (1, u'01234567890123456789.txt') 24
    found!
Index: 10
    (3, u'01234567890123456789.txt') 24
    (1, u'012345678901234567890123456789.txt') 34
    found!
    (3, u'012345678901234567890123456789.txt') 34
    (1, u'0123456789012345678901234567890123456789.txt') 44
    found!
Index: 11
    (3, u'0123456789012345678901234567890123456789.txt') 44
    (1, u'01234567890123456789012345678901234567890123456789.txt') 54
    found!
    (3, u'01234567890123456789012345678901234567890123456789.txt') 54
Index: 12
Index: 13
    (1, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
    found!
Index: 14
Index: 15
    (1, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    found!
Index: 16
    (3, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
Index: 17
    (1, u'a') 1
    found!
Index: 18
    (3, u'a') 1

e:\Work\Dev\StackOverflow\q049799109>
e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython.7.10__00\python.exe" code00.py
Python 2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32

Attempting Synchronous mode using a buffer 65536 bytes long...
Index: 0
    (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    deleted!
Index: 1
    (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
    deleted!
Index: 2
    (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
    deleted!
Index: 3
    (2, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
    deleted!
Index: 4
    (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
    deleted!
Index: 5
    (2, u'01234567890123456789012345678901234567890123456789.txt') 54
    deleted!
Index: 6
    (2, u'0123456789012345678901234567890123456789.txt') 44
    deleted!
Index: 7
    (2, u'012345678901234567890123456789.txt') 34
    deleted!
    (2, u'01234567890123456789.txt') 24
    deleted!
    (2, u'0123456789.txt') 14
    deleted!
Index: 8
    (1, u'0123456789.txt') 14
    found!
Index: 9
    (3, u'0123456789.txt') 14
Index: 10
    (1, u'01234567890123456789.txt') 24
    found!
Index: 11
    (3, u'01234567890123456789.txt') 24
Index: 12
    (1, u'012345678901234567890123456789.txt') 34
    found!
Index: 13
    (3, u'012345678901234567890123456789.txt') 34
Index: 14
    (1, u'0123456789012345678901234567890123456789.txt') 44
    found!
Index: 15
    (3, u'0123456789012345678901234567890123456789.txt') 44
Index: 16
    (1, u'01234567890123456789012345678901234567890123456789.txt') 54
    found!
    (3, u'01234567890123456789012345678901234567890123456789.txt') 54
Index: 17
    (1, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
    found!
    (3, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
    (1, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
    found!
Index: 18
    (3, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
    (1, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
    found!
    (3, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
    (1, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
    found!
    (3, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
    (1, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    found!
    (3, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
Index: 20
    (2, u'a') 1
    deleted!

e:\Work\Dev\StackOverflow\q049799109>
e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython.7.10__00\python.exe" code00.py
Python 2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32

Attempting Asynchronous mode using a buffer 512 bytes long...
Index: 0
Index: 1
    (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    deleted!
Index: 2
    (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
    deleted!
Index: 3
    (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
    deleted!
Index: 4
    (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
    deleted!
Index: 5
    (2, u'01234567890123456789012345678901234567890123456789.txt') 54
    deleted!
Index: 6
    (2, u'0123456789012345678901234567890123456789.txt') 44
    deleted!
Index: 7
    (2, u'012345678901234567890123456789.txt') 34
    deleted!
Index: 8
    (2, u'01234567890123456789.txt') 24
    deleted!
Index: 9
    (2, u'0123456789.txt') 14
    deleted!
Index: 10
Index: 11
Index: 12
    (1, u'0123456789.txt') 14
    found!
Index: 13
    (1, u'01234567890123456789.txt') 24
    found!
Index: 14
    (1, u'012345678901234567890123456789.txt') 34
    found!
Index: 15
    (3, u'012345678901234567890123456789.txt') 34
Index: 16
    (1, u'0123456789012345678901234567890123456789.txt') 44
    found!
    (3, u'0123456789012345678901234567890123456789.txt') 44
Index: 17
Index: 18
    (1, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
    found!
Index: 19
Index: 20
    (1, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
    found!
Index: 21
Index: 22
Index: 23
Index: 24

e:\Work\Dev\StackOverflow\q049799109>
e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython.7.10__00\python.exe" code00.py
Python 2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32

Attempting Asynchronous mode using a buffer 65536 bytes long...
Index: 0
Index: 1
    (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    deleted!
Index: 2
    (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
    deleted!
Index: 3
    (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
    deleted!
Index: 4
    (2, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
    deleted!
Index: 5
    (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
    deleted!
Index: 6
    (2, u'01234567890123456789012345678901234567890123456789.txt') 54
    deleted!
Index: 7
    (2, u'0123456789012345678901234567890123456789.txt') 44
    deleted!
Index: 8
    (2, u'012345678901234567890123456789.txt') 34
    deleted!
    (2, u'01234567890123456789.txt') 24
    deleted!
Index: 9
    (2, u'0123456789.txt') 14
    deleted!
Index: 10
Index: 11
Index: 12
    (1, u'0123456789.txt') 14
    found!
Index: 13
    (1, u'01234567890123456789.txt') 24
    found!
Index: 14
    (1, u'012345678901234567890123456789.txt') 34
    found!
Index: 15
    (3, u'012345678901234567890123456789.txt') 34
    (1, u'0123456789012345678901234567890123456789.txt') 44
    found!
    (3, u'0123456789012345678901234567890123456789.txt') 44
Index: 16
    (1, u'01234567890123456789012345678901234567890123456789.txt') 54
    found!
    (3, u'01234567890123456789012345678901234567890123456789.txt') 54
    (1, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
    found!
    (3, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
    (1, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
    found!
Index: 17
    (3, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
    (1, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
    found!
    (3, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
    (1, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
    found!
    (3, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
    (1, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    found!
    (3, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
Index: 18
Index: 19

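Notes:

  • Used a dir test containing 10 files with different names (repetitions of 0123456789)
  • There are 4 runs:
    1. Synchronous
      • 512B buffer
      • 64K buffer
    2. Asynchronous
      • 512B buffer
      • 64K buffer
  • For each (above) run, the files are (using Windows Commander to operate):
    • Moved from the dir (involving delete)
    • Moved (back) to the dir (involving add)
  • There's just one run for each combination, which by far can't be relied on as a benchmark, but I ran the script several times and the pattern tends to be consistent
  • Deleting files doesn't vary much across runs, which means that the events are evenly distributed over (the tiny amounts of) time
  • Adding files, on the other hand, depends on the buffer size. Another thing worth noting is that there are 2 events for each addition
  • From a performance perspective, asynchronous mode doesn't bring any improvement (as I was expecting); on the contrary, it tends to slow things down. But its biggest advantage is the possibility of exiting gracefully on timeout (an abnormal interrupt might keep the resources locked until the program exits (and sometimes even beyond!))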
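
To make the output above easier to read: the first item of each tuple is the action code. 1 and 2 are the FILE_ACTION_ADDED and FILE_ACTION_REMOVED constants from code00.py; 3, 4 and 5 are the values the Win32 headers define for FILE_ACTION_MODIFIED, FILE_ACTION_RENAMED_OLD_NAME and FILE_ACTION_RENAMED_NEW_NAME. Here is a small sketch that labels the results (ACTION_NAMES and describe are names made up for this illustration, they are not part of PyWin32) and shows where the 2 events per addition come from:

```python
# Action codes reported in the results; values as defined in the Win32
# headers (only 1 and 2 appear as constants in code00.py).
ACTION_NAMES = {
    1: "FILE_ACTION_ADDED",
    2: "FILE_ACTION_REMOVED",
    3: "FILE_ACTION_MODIFIED",
    4: "FILE_ACTION_RENAMED_OLD_NAME",
    5: "FILE_ACTION_RENAMED_NEW_NAME",
}


def describe(results):
    """Turn (action, file_name) tuples into readable 'NAME: file' strings."""
    lines = []
    for action, file_name in results:
        name = ACTION_NAMES.get(action, "UNKNOWN({:d})".format(action))
        lines.append("{}: {}".format(name, file_name))
    return lines


# Two tuples taken from the output above (one file moved back into the dir).
sample = [(1, u"0123456789.txt"), (3, u"0123456789.txt")]
for line in describe(sample):
    print(line)
```

So a file moved back into the dir shows up once as added and once as modified, which is why the added files produce twice as many events as the deleted ones.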

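The bottom line is that there's no recipe for avoiding lost events. Every measure taken can be "beaten" by increasing the number of generated events.

Minimizing the losses: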

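  • The buffer size. This was the (main) problem in your case. Unfortunately, the documentation is not that clear, and there are no guidelines on how large it should be. Browsing C forums, I noticed that 64K is a common value. However:

    • It isn't possible to start with a huge buffer and, in case of failure, decrease its size until it succeeds, because that would mean losing all the events generated while figuring out the buffer size

    • Even though 64K is enough to hold (several times over) all the events that I generated in my tests, some were still lost. Maybe that's because of the "magical" buffer that I talked about at the beginning

  • Reduce the number of events as much as possible. In your case, I noticed that you're only interested in add and delete events (FILE_ACTION_ADDED and FILE_ACTION_REMOVED). Only pass the appropriate FILE_NOTIFY_CHANGE_* flags to ReadDirectoryChangesW (for example, you don't care about FILE_ACTION_MODIFIED, but you are receiving it when files are added)

  • Try splitting the dir contents into several subdirs and monitoring them concurrently. For example, if you only care about changes occurring in one dir and a bunch of its subdirs, there's no point in recursively monitoring the whole tree, because it will most likely produce lots of useless events. Anyway, if doing things in parallel, don't use threads, because of the GIL!!! ([Python.Wiki]: GlobalInterpreterLock). Use [Python.Docs]: multiprocessing - Process-based "threading" interface instead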

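  • Increase the speed of the code that runs in the loop, so that it spends as little time as possible outside ReadDirectoryChangesW (which is when generated events could overflow the buffer). Of course, some of the items below might have an insignificant influence (and also some bad side effects), but I'm listing them anyway:

    • Do as little processing as possible, and delay it as much as possible. Maybe do it in another process (because of the GIL)

    • Get rid of all print-like statements

    • Instead of e.g. win32con.FILE_NOTIFY_CHANGE_FILE_NAME, use from win32con import FILE_NOTIFY_CHANGE_FILE_NAME at the beginning of the script, and only use FILE_NOTIFY_CHANGE_FILE_NAME in the loop (to avoid the variable lookup in the module)

    • Don't use functions (because of call / ret like instructions) - not sure about this one

    • Try using the win32file.GetQueuedCompletionStatus method to get the results (async only)

    • Since things tend to get better over time (with exceptions, of course), try switching to a newer Python version. Maybe it will run faster

    • Use C - this is probably undesirable, but it could have some benefits:

      • There won't be the back and forth conversions between Python and C that PyWin32 performs - but I didn't use a profiler to check how much time is spent in them

      • lpCompletionRoutine (which PyWin32 doesn't offer) would be available too, and maybe it's faster

      • As an alternative, C could be invoked using CTypes, but that would require some work and I feel that it isn't worth the effort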
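
The two cheapest measures from the list above (asking for fewer notifications, and keeping the loop body tiny) could look roughly like this. It's a sketch under assumptions, not a drop-in replacement: the FILE_NOTIFY_CHANGE_* values are copied from the Win32 headers so that it runs without PyWin32 (in the real script they would come from win32con), and read_changes is a hypothetical stand-in for a call like read_dir_changes(dir_handle, BUF_SIZE, None) from code00.py that would also accept the filter.

```python
from collections import deque

# Values as defined in the Win32 headers; in the real script they would be
# imported once at the top: from win32con import FILE_NOTIFY_CHANGE_FILE_NAME
FILE_NOTIFY_CHANGE_FILE_NAME = 0x00000001
FILE_NOTIFY_CHANGE_DIR_NAME = 0x00000002

# Only name changes (create, delete, rename) are requested, so the extra
# "modified" events seen when files are added should no longer be reported.
NAME_CHANGES_ONLY = FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_DIR_NAME


def collect_events(read_changes, pending, max_reads):
    """Call read_changes(notify_filter) max_reads times, storing raw results.

    read_changes is a hypothetical callable that wraps ReadDirectoryChangesW
    and returns a list of (action, file_name) tuples. Nothing is printed or
    interpreted here, so control returns to the next read quickly.
    """
    notify_filter = NAME_CHANGES_ONLY  # local name: no global lookup per loop
    append = pending.append            # bound once, outside the loop
    for _ in range(max_reads):
        append(read_changes(notify_filter))


def process_pending(pending):
    """Deferred processing: flatten the stored batches, oldest first."""
    events = []
    while pending:
        events.extend(pending.popleft())
    return events
```

In the real script, pending could be handed over to another process (for example through a multiprocessing queue) instead of being a deque that is drained later; that part is left out here to keep the sketch short.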