通过 Powershell 查找给定文件的完整路径的最快方法?
Fastest way to find a full path of a given file via Powershell?
我需要编写一个 Powershell 片段,以尽快找到整个分区上给定文件名的完整路径。
为了更好的比较,我在代码示例中使用了这个全局变量:
$searchDir = "c:\"
$searchName = "hosts"
我从一个使用 Get-ChildItem 的小片段开始,以获得第一个基线:
"get-ChildItem"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$result = Get-ChildItem -LiteralPath $searchDir -Filter $searchName -File -Recurse -ea 0
write-host $timer.Elapsed.TotalSeconds "sec."
我的 SSD 上的运行时间是 14,8581609 秒。
接下来,我尝试了 运行 经典的 DIR 命令以查看改进:
"dir"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$result = &cmd /c dir "$searchDir$searchName" /b /s /a-d
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
这在 13,4713342 秒内完成。 - 还不错,但我们可以更快地完成吗?
在第三次迭代中,我使用 ROBOCOPY 测试了相同的任务。这里的代码示例:
"robocopy"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$roboDir = [System.IO.Path]::GetDirectoryName($searchDir)
if (!$roboDir) {$roboDir = $searchDir.Substring(0,2)}
$info = [System.Diagnostics.ProcessStartInfo]::new()
$info.FileName = "$env:windir\system32\robocopy.exe"
$info.RedirectStandardOutput = $true
$info.Arguments = " /l ""$roboDir"" null ""$searchName"" /bytes /njh /njs /np /nc /ndl /xjd /mt /s"
$info.UseShellExecute = $false
$info.CreateNoWindow = $true
$info.WorkingDirectory = $searchDir
$process = [System.Diagnostics.Process]::new()
$process.StartInfo = $info
[void]$process.Start()
$process.WaitForExit()
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
或更短的版本(基于好的评论):
"robocopy v2"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = (&cmd /c pushd $searchDir `& robocopy /l "$searchDir" null "$searchName" /ns /njh /njs /np /nc /ndl /xjd /mt /s).trim() -ne ''
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
它比 DIR 快吗?是的,一点没错!运行时间现在下降到 3,2685551 秒。
这一巨大改进的主要原因是,ROBOCOPY 在多个并行实例中以多任务模式与 /mt-swich 一起运行。但即使没有这个涡轮开关也比 DIR 快。
任务完成?不是真的 - 因为我的任务是创建一个 powershell 脚本来尽可能快地搜索文件,但是调用 ROBOCOPY 有点作弊。
接下来,我想看看,使用[System.IO.Directory]我们能多快。第一次尝试是使用 getFiles 和 getDirectory 调用。这是我的代码:
"GetFiles"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = [System.Collections.Generic.List[string]]::new()
$dirList = [System.Collections.Generic.Queue[string]]::new()
$dirList.Enqueue($searchDir)
while ($dirList.Count -ne 0) {
$dir = $dirList.Dequeue()
try {
$files = [System.IO.Directory]::GetFiles($dir, $searchName)
if ($files) {$fileList.addRange($file)}
foreach($subdir in [System.IO.Directory]::GetDirectories($dir)) {
$dirList.Enqueue($subDir)
}
} catch {}
}
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
这次的运行时间是 19,3393872 秒。迄今为止最慢的代码。我们能做得更好吗?现在这里有一个带有枚举调用的代码片段用于比较:
"EnumerateFiles"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = [System.Collections.Generic.List[string]]::new()
$dirList = [System.Collections.Generic.Queue[string]]::new()
$dirList.Enqueue($searchDir)
while ($dirList.Count -ne 0) {
$dir = $dirList.Dequeue()
try {
foreach($file in [System.IO.Directory]::EnumerateFiles($dir, $searchName)) {
$fileList.add($file)
}
foreach ($subdir in [System.IO.Directory]::EnumerateDirectories($dir)) {
$dirList.Enqueue($subDir)
}
} catch {}
}
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
运行时间为 19,2068545 秒,速度稍快
现在让我们看看是否可以通过从 Kernel32 直接调用 WinAPI 来加快速度。
这是代码。让我们看看,这次有多快:
"WinAPI"
add-type -Name FileSearch -Namespace Win32 -MemberDefinition @"
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindClose(IntPtr hFindFile);
"@
$rootDir = 'c:'
$searchFile = "hosts"
$fileList = [System.Collections.Generic.List[string]]::new()
$dirList = [System.Collections.Generic.Queue[string]]::new()
$dirList.Enqueue($rootDir)
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileData = new-object Win32.FileSearch+WIN32_FIND_DATA
while ($dirList.Count -ne 0) {
$dir = $dirList.Dequeue()
$handle = [Win32.FileSearch]::FindFirstFile("$dir\*", [ref]$fileData)
[void][Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)
while ([Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)) {
if ($fileData.dwFileAttributes -band 0x10) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$dirList.Enqueue($fullName)
} elseif ($fileData.cFileName -eq $searchFile) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$fileList.Add($fullName)
}
}
[void][Win32.FileSearch]::FindClose($handle)
}
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
对我来说,这种方法的结果是一个非常负面的惊喜。运行时间为 17,499286 秒。
这比 System.IO 调用快,但仍然比简单的 Get-ChildItem 慢。
但是-仍然有希望接近ROBOCOPY的超快结果!
对于 Get-ChildItem 我们不能让调用在多任务模式下执行,但是对于例如Kernel32 调用我们可以选择将其设为递归函数,即通过嵌入式 C# 代码在 PARALLEL foreach 循环中对所有子文件夹的每次迭代进行调用。但是要怎么做呢?
有人知道如何更改最后的代码片段以使用 parallel.foreach 吗?
即使结果可能不像 ROBOCOPY 那样快,我也想 post 这里的这种方法为这个经典的“文件搜索”主题提供完整的故事书。
请让我知道并行代码部分是如何做的。
更新:
为了完整起见,我在 Powershell 7 上添加了 GetFiles 代码 运行 的代码和运行时,具有更智能的访问处理:
"GetFiles PS7"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = [system.IO.Directory]::GetFiles(
$searchDir,
$searchFile,
[IO.EnumerationOptions] @{AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true}
)
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
我系统上的运行时间是 9,150673 秒。 - 比 DIR 快,但仍然比在 8 核上进行多任务处理的 robocopy 慢。
更新#2:
在试用了新的 PS7 功能之后,我想出了这个代码片段,它使用了我的第一个(但很丑?)并行代码方法:
"WinAPI PS7 parallel"
$searchDir = "c:\"
$searchFile = "hosts"
add-type -Name FileSearch -Namespace Win32 -MemberDefinition @"
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindClose(IntPtr hFindFile);
"@
$rootDir = $searchDir -replace "\$"
$maxRunSpaces = [int]$env:NUMBER_OF_PROCESSORS
$fileList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList.Add($rootDir)
$timer = [System.Diagnostics.Stopwatch]::StartNew()
(1..$maxRunSpaces) | ForEach-Object -ThrottleLimit $maxRunSpaces -Parallel {
$dirList = $using:dirList
$fileList = $using:fileList
$fileData = new-object Win32.FileSearch+WIN32_FIND_DATA
$dir = $null
if ($_ -eq 1) {$delay = 0} else {$delay = 50}
if ($dirList.TryTake([ref]$dir, $delay)) {
do {
$handle = [Win32.FileSearch]::FindFirstFile("$dir\*", [ref]$fileData)
[void][Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)
while ([Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)) {
if ($fileData.dwFileAttributes -band 0x10) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$dirList.Add($fullName)
} elseif ($fileData.cFileName -eq $using:searchFile) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$fileList.Add($fullName)
}
}
[void][Win32.FileSearch]::FindClose($handle)
} until (!$dirList.TryTake([ref]$dir))
}
}
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
运行时间现在非常接近 robocopy 时间。实际上是 4,0809719 秒
还不错,但我仍在寻找一种通过嵌入式 C# 代码使用 parallel.foreach 方法的解决方案,以使其也适用于 Powershell v5。
更新#3:
现在这是我在并行运行空间中的 Powershell 5 运行 的最终代码:
$searchDir = "c:\"
$searchFile = "hosts"
"WinAPI parallel"
add-type -Name FileSearch -Namespace Win32 -MemberDefinition @"
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindClose(IntPtr hFindFile);
"@
$rootDir = $searchDir -replace "\$"
$maxRunSpaces = [int]$env:NUMBER_OF_PROCESSORS
$fileList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList.Add($rootDir)
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$runSpaceList = [System.Collections.Generic.List[PSObject]]::new()
$pool = [RunSpaceFactory]::CreateRunspacePool(1, $maxRunSpaces)
$pool.Open()
foreach ($id in 1..$maxRunSpaces) {
$runSpace = [Powershell]::Create()
$runSpace.RunspacePool = $pool
[void]$runSpace.AddScript({
Param (
[string]$searchFile,
[System.Collections.Concurrent.BlockingCollection[string]]$dirList,
[System.Collections.Concurrent.BlockingCollection[string]]$fileList
)
$fileData = new-object Win32.FileSearch+WIN32_FIND_DATA
$dir = $null
if ($id -eq 1) {$delay = 0} else {$delay = 50}
if ($dirList.TryTake([ref]$dir, $delay)) {
do {
$handle = [Win32.FileSearch]::FindFirstFile("$dir\*", [ref]$fileData)
[void][Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)
while ([Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)) {
if ($fileData.dwFileAttributes -band 0x10) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$dirList.Add($fullName)
} elseif ($fileData.cFileName -like $searchFile) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$fileList.Add($fullName)
}
}
[void][Win32.FileSearch]::FindClose($handle)
} until (!$dirList.TryTake([ref]$dir))
}
})
[void]$runSpace.addArgument($searchFile)
[void]$runSpace.addArgument($dirList)
[void]$runSpace.addArgument($fileList)
$status = $runSpace.BeginInvoke()
$runSpaceList.Add([PSCustomObject]@{Name = $id; RunSpace = $runSpace; Status = $status})
}
while ($runSpaceList.Status.IsCompleted -notcontains $true) {sleep -Milliseconds 10}
$pool.Close()
$pool.Dispose()
$timer.Stop()
$fileList
write-host $timer.Elapsed.TotalSeconds "sec."
总运行时间为 4,8586134 秒。比 PS7 版本慢一点,但仍然比任何 DIR 或 Get-ChildItem 变体快得多。 ;-)
最终解决方案:
最后我能够回答我自己的问题。这是最终代码:
"WinAPI parallel.foreach"
add-type -TypeDefinition @"
using System;
using System.IO;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
public class FileSearch {
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindClose(IntPtr hFindFile);
static IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
public static class Globals {
public static BlockingCollection<string> resultFileList {get;set;}
}
public static BlockingCollection<string> GetTreeFiles(string path, string searchFile) {
Globals.resultFileList = new BlockingCollection<string>();
List<string> dirList = new List<string>();
searchFile = @"^" + searchFile.Replace(@".",@"\.").Replace(@"*",@".*").Replace(@"?",@".") + @"$";
GetFiles(path, searchFile);
return Globals.resultFileList;
}
static void GetFiles(string path, string searchFile) {
path = path.EndsWith(@"\") ? path : path + @"\";
List<string> dirList = new List<string>();
WIN32_FIND_DATA fileData;
IntPtr handle = INVALID_HANDLE_VALUE;
handle = FindFirstFile(path + @"*", out fileData);
if (handle != INVALID_HANDLE_VALUE) {
FindNextFile(handle, out fileData);
while (FindNextFile(handle, out fileData)) {
if ((fileData.dwFileAttributes & 0x10) > 0) {
string fullPath = path + fileData.cFileName;
dirList.Add(fullPath);
} else {
if (Regex.IsMatch(fileData.cFileName, searchFile, RegexOptions.IgnoreCase)) {
string fullPath = path + fileData.cFileName;
Globals.resultFileList.TryAdd(fullPath);
}
}
}
FindClose(handle);
Parallel.ForEach(dirList, (dir) => {
GetFiles(dir, searchFile);
});
}
}
}
"@
[fileSearch]::GetTreeFiles($searchDir, 'hosts')
并且最终运行时间现在比 robocopy 快 3,2536388 秒。
我还在解决方案中添加了该代码的优化版本。
tl;dr:
这个答案没有尝试解决所问的并行问题,但是:
- 单个递归
[IO.Directory]::GetFiles()
调用可能足够快,但请注意,如果涉及不可访问的目录,这只是 PowerShell [Core] v6.2+ 中的一个选项:
# PowerShell [Core] v6.2+
[IO.Directory]::GetFiles(
$searchDir,
$searchFile,
[IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
)
- 从实用的角度来说(除了编码练习之外),调用
robocopy
是一种完全合法的方法 - 假设您只需要 运行 Windows - 就像(注意 con
是未使用的 target-directory 参数的虚拟参数)一样简单:
(robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
前面几点:
-
but calling ROBOCOPY is a bit of cheating.
- 可以说,使用 .NET APIs / WinAPI 调用与调用 RoboCopy 等外部实用程序(例如
robocopy.exe /l ...
)一样具有欺骗性。毕竟,调用外部程序是任何 shell 的核心任务,包括 PowerShell(而且 System.Diagnostics.Process
nor its PowerShell wrapper, Start-Process
都不需要这样做)。
也就是说,虽然在这种情况下不是问题,但当您调用外部程序时,您确实失去了传递和接收 对象 的能力,并且 in-process 操作通常更快。
为了定时执行命令(测量性能),PowerShell 提供了一个 high-level 包装 System.Diagnostics.Stopwatch
: the Measure-Command
cmdlet。
这样的性能测量值会波动,因为 PowerShell 作为一种动态解析的语言,会使用大量缓存,这些缓存在首次填充时会产生开销,而且您通常不知道什么时候会发生这种情况 - 请参阅this GitHub issue 了解背景信息。
此外,一个遍历文件系统的long-running命令同时受到其他进程运行ning的干扰,是否file-system 之前的信息已经被缓存了 运行 差别很大.
以下比较使用 higher-level 包装 Measure-Object
,Time-Command
function,这使得比较多个命令的相对 运行 时间性能简单。
加速 PowerShell 代码的关键是最小化实际的 PowerShell 代码,并尽可能多地将工作卸载给 .NET 方法调用/(编译的)外部程序。
以下对比表现:
Get-ChildItem
(只是为了对比,我们知道它太慢了)
robocopy.exe
对 System.IO.Directory.GetFiles()
的单个递归调用,可能 对您的目的足够快,尽管 单-线程.
- 注意:下面的调用使用仅在 .NET Core 2.1+ 中可用的功能,因此在 PowerShell 中有效[核心] 仅限 v6.2+。
此 API 的 .NET Framework 版本不允许忽略 无法访问的 目录(由于缺少权限),如果遇到此类目录,则会导致枚举失败。
$searchDir = 'C:\' #'# dummy comment to fix syntax highlighting
$searchFile = 'hosts'
# Define the commands to compare as an array of script blocks.
$cmds =
{
[IO.Directory]::GetFiles(
$searchDir,
$searchFile,
[IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
)
},
{
(Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
},
{
(robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
}
Write-Verbose -vb "Warming up the cache..."
# Run one of the commands up front to level the playing field
# with respect to cached filesystem information.
$null = & $cmds[-1]
# Run the commands and compare their timings.
Time-Command $cmds -Count 1 -OutputToHost -vb
在我的 2 核 Windows 10 VM 运行ning PowerShell Core 7.1.0-preview.7 上,我得到以下结果;这些数字因许多因素(不仅仅是文件数量)而异,但应该提供相对性能的一般意义(第 Factor
列)。
请注意,由于 file-system 缓存是有意预先预热的,因此与没有缓存信息的 运行 相比,给定机器的数字将过于乐观。
如您所见,在这种情况下,PowerShell [Core] [System.IO.Directory]::GetFiles()
调用实际上优于 multi-threaded robocopy
调用。
VERBOSE: Warming up the cache...
VERBOSE: Starting 1 run(s) of:
[IO.Directory]::GetFiles(
$searchDir,
$searchFile,
[IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
)
...
C:\Program Files\Git\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
VERBOSE: Starting 1 run(s) of:
(Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
...
C:\Program Files\Git\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
VERBOSE: Starting 1 run(s) of:
(robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
...
C:\Program Files\Git\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
VERBOSE: Overall time elapsed: 00:01:48.7731236
Factor Secs (1-run avg.) Command
------ ----------------- -------
1.00 22.500 [IO.Directory]::GetFiles(…
1.14 25.602 (robocopy /l $searchDir NUL $searchFile /s /mt /njh /njs /ns /nc /np).Trim() -ne ''
2.69 60.623 (Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
这是我创建的最终代码。运行时间现在是 2,8627695 秒。
将并行性限制为逻辑核心的数量比对所有子目录执行 Parallel.ForEach 提供更好的性能。
您可以 return 将每次命中的完整 FileInfo-Object 放入生成的 BlockingCollection 中,而不是仅 return 文件名。
# powershell-sample to find all "hosts"-files on Partition "c:\"
cls
Remove-Variable * -ea 0
[System.GC]::Collect()
$ErrorActionPreference = "stop"
$searchDir = "c:\"
$searchFile = "hosts"
add-type -TypeDefinition @"
using System;
using System.IO;
using System.Linq;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
public class FileSearch {
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
static extern bool FindClose(IntPtr hFindFile);
static IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
static BlockingCollection<string> dirList {get;set;}
static BlockingCollection<string> fileList {get;set;}
public static BlockingCollection<string> GetFiles(string searchDir, string searchFile) {
bool isPattern = false;
if (searchFile.Contains(@"?") | searchFile.Contains(@"*")) {
searchFile = @"^" + searchFile.Replace(@".",@"\.").Replace(@"*",@".*").Replace(@"?",@".") + @"$";
isPattern = true;
}
fileList = new BlockingCollection<string>();
dirList = new BlockingCollection<string>();
dirList.Add(searchDir);
int[] threads = Enumerable.Range(1,Environment.ProcessorCount).ToArray();
Parallel.ForEach(threads, (id) => {
string path;
IntPtr handle = INVALID_HANDLE_VALUE;
WIN32_FIND_DATA fileData;
if (dirList.TryTake(out path, 100)) {
do {
path = path.EndsWith(@"\") ? path : path + @"\";
handle = FindFirstFile(path + @"*", out fileData);
if (handle != INVALID_HANDLE_VALUE) {
FindNextFile(handle, out fileData);
while (FindNextFile(handle, out fileData)) {
if ((fileData.dwFileAttributes & 0x10) > 0) {
string fullPath = path + fileData.cFileName;
dirList.TryAdd(fullPath);
} else {
if (isPattern) {
if (Regex.IsMatch(fileData.cFileName, searchFile, RegexOptions.IgnoreCase)) {
string fullPath = path + fileData.cFileName;
fileList.TryAdd(fullPath);
}
} else {
if (fileData.cFileName == searchFile) {
string fullPath = path + fileData.cFileName;
fileList.TryAdd(fullPath);
}
}
}
}
FindClose(handle);
}
} while (dirList.TryTake(out path));
}
});
return fileList;
}
}
"@
$fileList = [fileSearch]::GetFiles($searchDir, $searchFile)
$fileList
我需要编写一个 Powershell 片段,以尽快找到整个分区上给定文件名的完整路径。
为了更好的比较,我在代码示例中使用了这个全局变量:
$searchDir = "c:\"
$searchName = "hosts"
我从一个使用 Get-ChildItem 的小片段开始,以获得第一个基线:
"get-ChildItem"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$result = Get-ChildItem -LiteralPath $searchDir -Filter $searchName -File -Recurse -ea 0
write-host $timer.Elapsed.TotalSeconds "sec."
我的 SSD 上的运行时间是 14,8581609 秒。
接下来,我尝试了 运行 经典的 DIR 命令以查看改进:
"dir"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$result = &cmd /c dir "$searchDir$searchName" /b /s /a-d
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
这在 13,4713342 秒内完成。 - 还不错,但我们可以更快地完成吗?
在第三次迭代中,我使用 ROBOCOPY 测试了相同的任务。这里的代码示例:
"robocopy"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$roboDir = [System.IO.Path]::GetDirectoryName($searchDir)
if (!$roboDir) {$roboDir = $searchDir.Substring(0,2)}
$info = [System.Diagnostics.ProcessStartInfo]::new()
$info.FileName = "$env:windir\system32\robocopy.exe"
$info.RedirectStandardOutput = $true
$info.Arguments = " /l ""$roboDir"" null ""$searchName"" /bytes /njh /njs /np /nc /ndl /xjd /mt /s"
$info.UseShellExecute = $false
$info.CreateNoWindow = $true
$info.WorkingDirectory = $searchDir
$process = [System.Diagnostics.Process]::new()
$process.StartInfo = $info
[void]$process.Start()
$process.WaitForExit()
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
或更短的版本(基于好的评论):
"robocopy v2"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = (&cmd /c pushd $searchDir `& robocopy /l "$searchDir" null "$searchName" /ns /njh /njs /np /nc /ndl /xjd /mt /s).trim() -ne ''
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
它比 DIR 快吗?是的,一点没错!运行时间现在下降到 3,2685551 秒。 这一巨大改进的主要原因是,ROBOCOPY 在多个并行实例中以多任务模式与 /mt-swich 一起运行。但即使没有这个涡轮开关也比 DIR 快。
任务完成?不是真的 - 因为我的任务是创建一个 powershell 脚本来尽可能快地搜索文件,但是调用 ROBOCOPY 有点作弊。
接下来,我想看看,使用[System.IO.Directory]我们能多快。第一次尝试是使用 getFiles 和 getDirectory 调用。这是我的代码:
"GetFiles"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = [System.Collections.Generic.List[string]]::new()
$dirList = [System.Collections.Generic.Queue[string]]::new()
$dirList.Enqueue($searchDir)
while ($dirList.Count -ne 0) {
$dir = $dirList.Dequeue()
try {
$files = [System.IO.Directory]::GetFiles($dir, $searchName)
if ($files) {$fileList.addRange($file)}
foreach($subdir in [System.IO.Directory]::GetDirectories($dir)) {
$dirList.Enqueue($subDir)
}
} catch {}
}
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
这次的运行时间是 19,3393872 秒。迄今为止最慢的代码。我们能做得更好吗?现在这里有一个带有枚举调用的代码片段用于比较:
"EnumerateFiles"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = [System.Collections.Generic.List[string]]::new()
$dirList = [System.Collections.Generic.Queue[string]]::new()
$dirList.Enqueue($searchDir)
while ($dirList.Count -ne 0) {
$dir = $dirList.Dequeue()
try {
foreach($file in [System.IO.Directory]::EnumerateFiles($dir, $searchName)) {
$fileList.add($file)
}
foreach ($subdir in [System.IO.Directory]::EnumerateDirectories($dir)) {
$dirList.Enqueue($subDir)
}
} catch {}
}
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
运行时间为 19,2068545 秒,速度稍快
现在让我们看看是否可以通过从 Kernel32 直接调用 WinAPI 来加快速度。 这是代码。让我们看看,这次有多快:
"WinAPI"
add-type -Name FileSearch -Namespace Win32 -MemberDefinition @"
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindClose(IntPtr hFindFile);
"@
$rootDir = 'c:'
$searchFile = "hosts"
$fileList = [System.Collections.Generic.List[string]]::new()
$dirList = [System.Collections.Generic.Queue[string]]::new()
$dirList.Enqueue($rootDir)
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileData = new-object Win32.FileSearch+WIN32_FIND_DATA
while ($dirList.Count -ne 0) {
$dir = $dirList.Dequeue()
$handle = [Win32.FileSearch]::FindFirstFile("$dir\*", [ref]$fileData)
[void][Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)
while ([Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)) {
if ($fileData.dwFileAttributes -band 0x10) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$dirList.Enqueue($fullName)
} elseif ($fileData.cFileName -eq $searchFile) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$fileList.Add($fullName)
}
}
[void][Win32.FileSearch]::FindClose($handle)
}
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
对我来说,这种方法的结果是一个非常负面的惊喜。运行时间为 17,499286 秒。 这比 System.IO 调用快,但仍然比简单的 Get-ChildItem 慢。
但是-仍然有希望接近ROBOCOPY的超快结果! 对于 Get-ChildItem 我们不能让调用在多任务模式下执行,但是对于例如Kernel32 调用我们可以选择将其设为递归函数,即通过嵌入式 C# 代码在 PARALLEL foreach 循环中对所有子文件夹的每次迭代进行调用。但是要怎么做呢?
有人知道如何更改最后的代码片段以使用 parallel.foreach 吗? 即使结果可能不像 ROBOCOPY 那样快,我也想 post 这里的这种方法为这个经典的“文件搜索”主题提供完整的故事书。
请让我知道并行代码部分是如何做的。
更新: 为了完整起见,我在 Powershell 7 上添加了 GetFiles 代码 运行 的代码和运行时,具有更智能的访问处理:
"GetFiles PS7"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = [system.IO.Directory]::GetFiles(
$searchDir,
$searchFile,
[IO.EnumerationOptions] @{AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true}
)
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
我系统上的运行时间是 9,150673 秒。 - 比 DIR 快,但仍然比在 8 核上进行多任务处理的 robocopy 慢。
更新#2: 在试用了新的 PS7 功能之后,我想出了这个代码片段,它使用了我的第一个(但很丑?)并行代码方法:
"WinAPI PS7 parallel"
$searchDir = "c:\"
$searchFile = "hosts"
add-type -Name FileSearch -Namespace Win32 -MemberDefinition @"
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindClose(IntPtr hFindFile);
"@
$rootDir = $searchDir -replace "\$"
$maxRunSpaces = [int]$env:NUMBER_OF_PROCESSORS
$fileList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList.Add($rootDir)
$timer = [System.Diagnostics.Stopwatch]::StartNew()
(1..$maxRunSpaces) | ForEach-Object -ThrottleLimit $maxRunSpaces -Parallel {
$dirList = $using:dirList
$fileList = $using:fileList
$fileData = new-object Win32.FileSearch+WIN32_FIND_DATA
$dir = $null
if ($_ -eq 1) {$delay = 0} else {$delay = 50}
if ($dirList.TryTake([ref]$dir, $delay)) {
do {
$handle = [Win32.FileSearch]::FindFirstFile("$dir\*", [ref]$fileData)
[void][Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)
while ([Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)) {
if ($fileData.dwFileAttributes -band 0x10) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$dirList.Add($fullName)
} elseif ($fileData.cFileName -eq $using:searchFile) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$fileList.Add($fullName)
}
}
[void][Win32.FileSearch]::FindClose($handle)
} until (!$dirList.TryTake([ref]$dir))
}
}
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."
运行时间现在非常接近 robocopy 时间。实际上是 4,0809719 秒
还不错,但我仍在寻找一种通过嵌入式 C# 代码使用 parallel.foreach 方法的解决方案,以使其也适用于 Powershell v5。
更新#3: 现在这是我在并行运行空间中的 Powershell 5 运行 的最终代码:
$searchDir = "c:\"
$searchFile = "hosts"
"WinAPI parallel"
add-type -Name FileSearch -Namespace Win32 -MemberDefinition @"
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindClose(IntPtr hFindFile);
"@
$rootDir = $searchDir -replace "\$"
$maxRunSpaces = [int]$env:NUMBER_OF_PROCESSORS
$fileList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList.Add($rootDir)
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$runSpaceList = [System.Collections.Generic.List[PSObject]]::new()
$pool = [RunSpaceFactory]::CreateRunspacePool(1, $maxRunSpaces)
$pool.Open()
foreach ($id in 1..$maxRunSpaces) {
$runSpace = [Powershell]::Create()
$runSpace.RunspacePool = $pool
[void]$runSpace.AddScript({
Param (
[string]$searchFile,
[System.Collections.Concurrent.BlockingCollection[string]]$dirList,
[System.Collections.Concurrent.BlockingCollection[string]]$fileList
)
$fileData = new-object Win32.FileSearch+WIN32_FIND_DATA
$dir = $null
if ($id -eq 1) {$delay = 0} else {$delay = 50}
if ($dirList.TryTake([ref]$dir, $delay)) {
do {
$handle = [Win32.FileSearch]::FindFirstFile("$dir\*", [ref]$fileData)
[void][Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)
while ([Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)) {
if ($fileData.dwFileAttributes -band 0x10) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$dirList.Add($fullName)
} elseif ($fileData.cFileName -like $searchFile) {
$fullName = [string]::Join('\', $dir, $fileData.cFileName)
$fileList.Add($fullName)
}
}
[void][Win32.FileSearch]::FindClose($handle)
} until (!$dirList.TryTake([ref]$dir))
}
})
[void]$runSpace.addArgument($searchFile)
[void]$runSpace.addArgument($dirList)
[void]$runSpace.addArgument($fileList)
$status = $runSpace.BeginInvoke()
$runSpaceList.Add([PSCustomObject]@{Name = $id; RunSpace = $runSpace; Status = $status})
}
while ($runSpaceList.Status.IsCompleted -notcontains $true) {sleep -Milliseconds 10}
$pool.Close()
$pool.Dispose()
$timer.Stop()
$fileList
write-host $timer.Elapsed.TotalSeconds "sec."
总运行时间为 4,8586134 秒。比 PS7 版本慢一点,但仍然比任何 DIR 或 Get-ChildItem 变体快得多。 ;-)
最终解决方案: 最后我能够回答我自己的问题。这是最终代码:
"WinAPI parallel.foreach"
add-type -TypeDefinition @"
using System;
using System.IO;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
public class FileSearch {
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
public static extern bool FindClose(IntPtr hFindFile);
static IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
public static class Globals {
public static BlockingCollection<string> resultFileList {get;set;}
}
public static BlockingCollection<string> GetTreeFiles(string path, string searchFile) {
Globals.resultFileList = new BlockingCollection<string>();
List<string> dirList = new List<string>();
searchFile = @"^" + searchFile.Replace(@".",@"\.").Replace(@"*",@".*").Replace(@"?",@".") + @"$";
GetFiles(path, searchFile);
return Globals.resultFileList;
}
static void GetFiles(string path, string searchFile) {
path = path.EndsWith(@"\") ? path : path + @"\";
List<string> dirList = new List<string>();
WIN32_FIND_DATA fileData;
IntPtr handle = INVALID_HANDLE_VALUE;
handle = FindFirstFile(path + @"*", out fileData);
if (handle != INVALID_HANDLE_VALUE) {
FindNextFile(handle, out fileData);
while (FindNextFile(handle, out fileData)) {
if ((fileData.dwFileAttributes & 0x10) > 0) {
string fullPath = path + fileData.cFileName;
dirList.Add(fullPath);
} else {
if (Regex.IsMatch(fileData.cFileName, searchFile, RegexOptions.IgnoreCase)) {
string fullPath = path + fileData.cFileName;
Globals.resultFileList.TryAdd(fullPath);
}
}
}
FindClose(handle);
Parallel.ForEach(dirList, (dir) => {
GetFiles(dir, searchFile);
});
}
}
}
"@
[fileSearch]::GetTreeFiles($searchDir, 'hosts')
并且最终运行时间现在比 robocopy 快 3,2536388 秒。 我还在解决方案中添加了该代码的优化版本。
tl;dr:
这个答案没有尝试解决所问的并行问题,但是:
- 单个递归
[IO.Directory]::GetFiles()
调用可能足够快,但请注意,如果涉及不可访问的目录,这只是 PowerShell [Core] v6.2+ 中的一个选项:
# PowerShell [Core] v6.2+
[IO.Directory]::GetFiles(
$searchDir,
$searchFile,
[IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
)
- 从实用的角度来说(除了编码练习之外),调用
robocopy
是一种完全合法的方法 - 假设您只需要 运行 Windows - 就像(注意con
是未使用的 target-directory 参数的虚拟参数)一样简单:
(robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
前面几点:
-
but calling ROBOCOPY is a bit of cheating.
- 可以说,使用 .NET APIs / WinAPI 调用与调用 RoboCopy 等外部实用程序(例如
robocopy.exe /l ...
)一样具有欺骗性。毕竟,调用外部程序是任何 shell 的核心任务,包括 PowerShell(而且System.Diagnostics.Process
nor its PowerShell wrapper,Start-Process
都不需要这样做)。 也就是说,虽然在这种情况下不是问题,但当您调用外部程序时,您确实失去了传递和接收 对象 的能力,并且 in-process 操作通常更快。
- 可以说,使用 .NET APIs / WinAPI 调用与调用 RoboCopy 等外部实用程序(例如
为了定时执行命令(测量性能),PowerShell 提供了一个 high-level 包装
System.Diagnostics.Stopwatch
: theMeasure-Command
cmdlet。这样的性能测量值会波动,因为 PowerShell 作为一种动态解析的语言,会使用大量缓存,这些缓存在首次填充时会产生开销,而且您通常不知道什么时候会发生这种情况 - 请参阅this GitHub issue 了解背景信息。
此外,一个遍历文件系统的long-running命令同时受到其他进程运行ning的干扰,是否file-system 之前的信息已经被缓存了 运行 差别很大.
以下比较使用 higher-level 包装
Measure-Object
,Time-Command
function,这使得比较多个命令的相对 运行 时间性能简单。
加速 PowerShell 代码的关键是最小化实际的 PowerShell 代码,并尽可能多地将工作卸载给 .NET 方法调用/(编译的)外部程序。
以下对比表现:
Get-ChildItem
(只是为了对比,我们知道它太慢了)robocopy.exe
对
System.IO.Directory.GetFiles()
的单个递归调用,可能 对您的目的足够快,尽管 单-线程.- 注意:下面的调用使用仅在 .NET Core 2.1+ 中可用的功能,因此在 PowerShell 中有效[核心] 仅限 v6.2+。 此 API 的 .NET Framework 版本不允许忽略 无法访问的 目录(由于缺少权限),如果遇到此类目录,则会导致枚举失败。
$searchDir = 'C:\' #'# dummy comment to fix syntax highlighting
$searchFile = 'hosts'
# Define the commands to compare as an array of script blocks.
$cmds =
{
[IO.Directory]::GetFiles(
$searchDir,
$searchFile,
[IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
)
},
{
(Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
},
{
(robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
}
Write-Verbose -vb "Warming up the cache..."
# Run one of the commands up front to level the playing field
# with respect to cached filesystem information.
$null = & $cmds[-1]
# Run the commands and compare their timings.
Time-Command $cmds -Count 1 -OutputToHost -vb
在我的 2 核 Windows 10 VM 运行ning PowerShell Core 7.1.0-preview.7 上,我得到以下结果;这些数字因许多因素(不仅仅是文件数量)而异,但应该提供相对性能的一般意义(第 Factor
列)。
请注意,由于 file-system 缓存是有意预先预热的,因此与没有缓存信息的 运行 相比,给定机器的数字将过于乐观。
如您所见,在这种情况下,PowerShell [Core] [System.IO.Directory]::GetFiles()
调用实际上优于 multi-threaded robocopy
调用。
VERBOSE: Warming up the cache...
VERBOSE: Starting 1 run(s) of:
[IO.Directory]::GetFiles(
$searchDir,
$searchFile,
[IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
)
...
C:\Program Files\Git\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
VERBOSE: Starting 1 run(s) of:
(Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
...
C:\Program Files\Git\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
VERBOSE: Starting 1 run(s) of:
(robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
...
C:\Program Files\Git\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
VERBOSE: Overall time elapsed: 00:01:48.7731236
Factor Secs (1-run avg.) Command
------ ----------------- -------
1.00 22.500 [IO.Directory]::GetFiles(…
1.14 25.602 (robocopy /l $searchDir NUL $searchFile /s /mt /njh /njs /ns /nc /np).Trim() -ne ''
2.69 60.623 (Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
这是我创建的最终代码。运行时间现在是 2,8627695 秒。 将并行性限制为逻辑核心的数量比对所有子目录执行 Parallel.ForEach 提供更好的性能。
您可以 return 将每次命中的完整 FileInfo-Object 放入生成的 BlockingCollection 中,而不是仅 return 文件名。
# powershell-sample to find all "hosts"-files on Partition "c:\"
cls
Remove-Variable * -ea 0
[System.GC]::Collect()
$ErrorActionPreference = "stop"
$searchDir = "c:\"
$searchFile = "hosts"
add-type -TypeDefinition @"
using System;
using System.IO;
using System.Linq;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
public class FileSearch {
public struct WIN32_FIND_DATA {
public uint dwFileAttributes;
public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
public uint nFileSizeHigh;
public uint nFileSizeLow;
public uint dwReserved0;
public uint dwReserved1;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
}
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
static extern IntPtr FindFirstFile
(string lpFileName, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
static extern bool FindNextFile
(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
static extern bool FindClose(IntPtr hFindFile);
static IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
static BlockingCollection<string> dirList {get;set;}
static BlockingCollection<string> fileList {get;set;}
public static BlockingCollection<string> GetFiles(string searchDir, string searchFile) {
bool isPattern = false;
if (searchFile.Contains(@"?") | searchFile.Contains(@"*")) {
searchFile = @"^" + searchFile.Replace(@".",@"\.").Replace(@"*",@".*").Replace(@"?",@".") + @"$";
isPattern = true;
}
fileList = new BlockingCollection<string>();
dirList = new BlockingCollection<string>();
dirList.Add(searchDir);
int[] threads = Enumerable.Range(1,Environment.ProcessorCount).ToArray();
Parallel.ForEach(threads, (id) => {
string path;
IntPtr handle = INVALID_HANDLE_VALUE;
WIN32_FIND_DATA fileData;
if (dirList.TryTake(out path, 100)) {
do {
path = path.EndsWith(@"\") ? path : path + @"\";
handle = FindFirstFile(path + @"*", out fileData);
if (handle != INVALID_HANDLE_VALUE) {
FindNextFile(handle, out fileData);
while (FindNextFile(handle, out fileData)) {
if ((fileData.dwFileAttributes & 0x10) > 0) {
string fullPath = path + fileData.cFileName;
dirList.TryAdd(fullPath);
} else {
if (isPattern) {
if (Regex.IsMatch(fileData.cFileName, searchFile, RegexOptions.IgnoreCase)) {
string fullPath = path + fileData.cFileName;
fileList.TryAdd(fullPath);
}
} else {
if (fileData.cFileName == searchFile) {
string fullPath = path + fileData.cFileName;
fileList.TryAdd(fullPath);
}
}
}
}
FindClose(handle);
}
} while (dirList.TryTake(out path));
}
});
return fileList;
}
}
"@
$fileList = [fileSearch]::GetFiles($searchDir, $searchFile)
$fileList