PowerShell 7.0: how to compute the hash of a big file read in chunks
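The script should copy files and compute their hashes.
My goal is to write a function that reads the file once instead of three times (read_for_copy + read_for_hash + read_for_another_copy), to minimize network load.
So I tried to read a chunk of the file, compute the MD5 hash, and write the chunk out to several places.
File sizes range from 100 MB to 2 TB, possibly more. There is no need to check that the copies are identical at this point; I only need the hash of the original file.
I am stuck on computing the hash: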
$ifile = "C:\Users\User\Desktop\inputfile"
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"
$md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$bufferSize = 10mb
$stream = [System.IO.File]::OpenRead($ifile)
$makenew = [System.IO.File]::OpenWrite($ofile)
$makenew2 = [System.IO.File]::OpenWrite($ofile2)
$buffer = new-object Byte[] $bufferSize
while ( $stream.Position -lt $stream.Length ) {
    $bytesRead = $stream.Read($buffer, 0, $bufferSize)
    $makenew.Write($buffer, 0, $bytesread)
    $makenew2.Write($buffer, 0, $bytesread)
    # I am stuck here
    $hash = [System.BitConverter]::ToString($md5.ComputeHash($buffer)) -replace "-",""
}
$stream.Close()
$makenew.Close()
$makenew2.Close()
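How do I accumulate the chunks so that I get the hash of the whole file?
And a bonus question: is it possible to compute the hash and write the data out in parallel? Especially considering that workflow {parallel{}} is not supported in PS version 6.
Many thanks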
If you want to handle the input buffering manually, you need to use the TransformBlock/TransformFinalBlock methods exposed by $md5:
while($bytesRead = $stream.Read($buffer, 0, $bufferSize))
{
    # Write to file copies
    $makenew.Write($buffer, 0, $bytesRead)
    $makenew2.Write($buffer, 0, $bytesRead)
    # Feed next chunk to MD5 CSP
    $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
}
# Complete the hashing routine (the returned empty array is discarded)
$null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
# Grab hash value from CSP
$hash = [BitConverter]::ToString($md5.Hash).Replace('-','')
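To see that feeding chunks this way gives the same result as hashing everything at once, here is a small self-contained check. It is only a sketch: the 10-byte sample array and the 4-byte chunk size are arbitrary values picked for illustration, and it uses [System.Security.Cryptography.MD5]::Create() instead of the MD5CryptoServiceProvider type from the question.

```powershell
# Sample data: 10 bytes, hashed in 4-byte chunks (both values are arbitrary)
$data = [byte[]](1..10)
$chunkSize = 4

# Reference: hash the whole array in one call
$md5Whole = [System.Security.Cryptography.MD5]::Create()
$expected = [BitConverter]::ToString($md5Whole.ComputeHash($data)).Replace('-', '')
$md5Whole.Dispose()

# Incremental: feed the same bytes chunk by chunk
$md5 = [System.Security.Cryptography.MD5]::Create()
for ($offset = 0; $offset -lt $data.Length; $offset += $chunkSize) {
    # The last chunk may be shorter than $chunkSize
    $count = [Math]::Min($chunkSize, $data.Length - $offset)
    $null = $md5.TransformBlock($data, $offset, $count, $null, 0)
}
# Finalize with an empty block, then read the digest before disposing
$null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
$actual = [BitConverter]::ToString($md5.Hash).Replace('-', '')
$md5.Dispose()

# Both approaches should agree
$actual -eq $expected
```

The point of the comparison is that the hash object keeps its state between TransformBlock calls, so the chunk boundaries do not affect the final digest.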
"My goal is to make a function that reads the file once instead of 3 times (read_for_copy + read_for_hash + read_for_another_copy), to minimize network load"
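I'm not entirely sure what you mean by network load here. If the source file is on a remote file share but the new copies go to the local file system, you can minimize network load by copying the source file once and then using that copy as the source for both the second copy and the hash calculation: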
# UNC paths start with two backslashes; single quotes keep the $ in c$ literal
$ifile = '\\remoteMachine\c$\Users\User\Desktop\inputfile'
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"
# Copy remote -> local
Copy-Item -Path $ifile -Destination $ofile
# Copy local -> local
Copy-Item -Path $ofile -Destination $ofile2
# Hash local file stream
$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$stream = [System.IO.File]::OpenRead($ofile)
$hash = [BitConverter]::ToString($md5.ComputeHash($stream)).Replace('-','')
# Release the file handle once the hash is computed
$stream.Close()
FWIW, passing the file stream object directly to $md5.ComputeHash($stream) might be faster than buffering the input manually.
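For a file that is already local, the sketch below shows two ways to hash it without any manual buffering: passing the stream to ComputeHash, and the built-in Get-FileHash cmdlet with -Algorithm MD5. The temporary file and its 100 bytes of content are made up purely so the example runs on its own.

```powershell
# Create a small throwaway file (hypothetical sample content)
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($path, [byte[]](1..100))

# Option 1: let ComputeHash read the stream itself
$md5 = [System.Security.Cryptography.MD5]::Create()
$stream = [System.IO.File]::OpenRead($path)
try {
    $hashFromStream = [BitConverter]::ToString($md5.ComputeHash($stream)).Replace('-', '')
}
finally {
    # Always release the file handle and the hash object
    $stream.Dispose()
    $md5.Dispose()
}

# Option 2: the built-in cmdlet, which returns an object with a Hash property
$hashFromCmdlet = (Get-FileHash -LiteralPath $path -Algorithm MD5).Hash

# Both should yield the same hex string
$hashFromStream -eq $hashFromCmdlet

Remove-Item -LiteralPath $path
```

Neither option helps with the "read once, write twice" goal on its own, since both read the file separately from any copy operation; they are mainly useful for verifying a copy afterwards.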
Final listing:
$ifile = "C:\Users\User\Desktop\inputfile"
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"

$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$bufferSize = 1mb
$stream = [System.IO.File]::OpenRead($ifile)
# Create truncates an existing output file; OpenWrite would leave stale
# trailing bytes if the old file was longer than the new one
$makenew = [System.IO.File]::Create($ofile)
$makenew2 = [System.IO.File]::Create($ofile2)
$buffer = New-Object Byte[] $bufferSize

while ( $stream.Position -lt $stream.Length )
{
    $bytesRead = $stream.Read($buffer, 0, $bufferSize)
    # Write the chunk to both copies
    $makenew.Write($buffer, 0, $bytesRead)
    $makenew2.Write($buffer, 0, $bytesRead)
    # Feed the chunk to the hash; the return value is a byte count, not the hash
    $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
}

# Finish hashing and read the result
$null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
$hash = [BitConverter]::ToString($md5.Hash).Replace('-','')
$hash

# Close() flushes pending writes, so separate Flush() calls are not needed
$stream.Close()
$makenew.Close()
$makenew2.Close()
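On the bonus question about doing the hashing and the writes in parallel: one possible approach that does not need workflow is to start the writes with Stream.WriteAsync, hash the same chunk while they are in flight, and wait for the writes before reusing the buffer. The sketch below wraps that in a function. The name Copy-FileWithHash and its parameters are hypothetical, and whether this is actually faster than the sequential loop depends on the storage involved, so treat it as a starting point to measure rather than a guaranteed improvement.

```powershell
function Copy-FileWithHash {
    # Hypothetical helper: reads $Path once, writes every chunk to all
    # $Destination files, and returns the MD5 of the source as a hex string.
    param(
        [Parameter(Mandatory)] [string]   $Path,
        [Parameter(Mandatory)] [string[]] $Destination,
        [int] $BufferSize = 1mb
    )

    $md5     = [System.Security.Cryptography.MD5]::Create()
    $source  = $null
    $targets = [System.Collections.Generic.List[System.IO.Stream]]::new()
    try {
        $source = [System.IO.File]::OpenRead($Path)
        foreach ($d in $Destination) {
            # File.Create truncates an existing file; OpenWrite would not
            $targets.Add([System.IO.File]::Create($d))
        }

        $buffer = [byte[]]::new($BufferSize)
        while (($bytesRead = $source.Read($buffer, 0, $buffer.Length)) -gt 0) {
            # Start all writes, then hash the same chunk while they are in flight
            $writes = [System.Threading.Tasks.Task[]]@(
                foreach ($t in $targets) { $t.WriteAsync($buffer, 0, $bytesRead) }
            )
            $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
            # The buffer must not be overwritten until every write has finished
            [System.Threading.Tasks.Task]::WaitAll($writes)
        }

        $null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
        [BitConverter]::ToString($md5.Hash).Replace('-', '')
    }
    finally {
        # Close everything even if a read or write failed
        foreach ($t in $targets) { $t.Dispose() }
        if ($source) { $source.Dispose() }
        $md5.Dispose()
    }
}
```

With the paths from the listing above it would be called as: $hash = Copy-FileWithHash -Path $ifile -Destination $ofile, $ofile2. The source is still read only once, and the try/finally block makes sure the file handles are released if something fails halfway through a multi-terabyte copy.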