Reading from the Pipeline Stream in PowerShell
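Background
I want to write code that parses some CSV data using Microsoft.VisualBasic.FileIO.TextFieldParser.
The system I'm generating this data for doesn't understand quotes, so I can't escape the delimiter; I have to replace it instead.
I've found a solution that uses the text parser above, but I've only seen people use it with file input. Rather than write my data to a file only to import it again, I'd prefer to keep the data in memory and use the constructor of this class that accepts a stream as input.
Ideally it would take its feed directly from whatever memory stream is used for the pipeline, but I couldn't work out how to access that.
In my current code I create my own memory stream and feed it data from the pipeline, then try to read from it. Unfortunately I'm missing something.
Questions
- How do you read from / write to a memory stream in PowerShell?
- Is it possible to read directly from the stream that feeds a function's pipeline?
Code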
clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
#[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null
function Clean-CsvStream {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline=$true)]
        [string]$Line
        ,
        [Parameter(Mandatory = $false)]
        [char]$Delimiter = ','
    )
    begin {
        [System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
        [System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
        [System.IO.StreamReader]$readStream = New-Object System.IO.StreamReader($memStream)
        #[Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
        #$Parser.SetDelimiters($Delimiter)
        #$Parser.HasFieldsEnclosedInQuotes = $true
        #$writeStream.AutoFlush = $true
    }
    process {
        $writeStream.WriteLine($_)
        #$writeStream.Flush() #maybe we need to flush it before the reader will see it?
        write-output $readStream.ReadLine()
        #("Line: {0:000}" -f $Parser.LineNumber)
        #write-output $Parser.ReadFields()
    }
    end {
        #close streams and dispose (dodgy catch all's in case object's disposed before we call Dispose)
        #try {$Parser.Close(); $Parser.Dispose()} catch{}
        try {$readStream.Close(); $readStream.Dispose()} catch{}
        try {$writeStream.Close(); $writeStream.Dispose()} catch{}
        try {$memStream.Close(); $memStream.Dispose()} catch{}
    }
}
1,2,3,4 | Clean-CsvStream -Delimiter ';' #nothing like the real data, but I'm not interested in actual CSV cleansing at this point
Workaround
In the meantime, my workaround is simply to perform this replacement on the objects' properties rather than on the CSV rows.
$cols = $objectArray | Get-Member | ?{$_.MemberType -eq 'NoteProperty'} | select -ExpandProperty name
$objectArray | %{$csvRow =$_; ($cols | %{($csvRow.$_ -replace "[`n,]",':')}) -join ',' }
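Update
I realised the missing code was $memStream.Seek(0, [System.IO.SeekOrigin]::Begin) | out-null;
However, this doesn't quite behave as expected: the first row of my CSV is displayed twice and the rest of the output is in the wrong order, so presumably I've misunderstood how to use Seek.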
clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null
function Clean-CsvStream {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline=$true)]
        [string]$CsvRow
        ,
        [Parameter(Mandatory = $false)]
        [char]$Delimiter = ','
        ,
        [Parameter(Mandatory = $false)]
        [regex]$InvalidCharRegex
        ,
        [Parameter(Mandatory = $false)]
        [string]$ReplacementString
    )
    begin {
        [System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
        [System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
        [Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
        $Parser.SetDelimiters($Delimiter)
        $Parser.HasFieldsEnclosedInQuotes = $true
        $writeStream.AutoFlush = $true
    }
    process {
        if ($InvalidCharRegex) {
            $writeStream.WriteLine($CsvRow)
            #flush here if not auto
            $memStream.Seek(0, [System.IO.SeekOrigin]::Begin) | out-null;
            write-output (($Parser.ReadFields() | %{$_ -replace $InvalidCharRegex,$ReplacementString }) -join $Delimiter)
        } else { #if we're not replacing anything, keep it simple
            $CsvRow
        }
    }
    end {
        "end {"
        try {$Parser.Close(); $Parser.Dispose()} catch{}
        try {$writeStream.Close(); $writeStream.Dispose()} catch{}
        try {$memStream.Close(); $memStream.Dispose()} catch{}
        "} #end"
    }
}
$csv = @(
(new-object -TypeName PSCustomObject -Property @{A="this is regular text";B="nothing to see here";C="all should be good"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text2";B="what the`nLine break!";C="all should be good2"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text3";B="ooh`r`nwindows line break!";C="all should be good3"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text4";B="I've got;a semi";C="all should be good4"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text5";B="""You're Joking!"" said the Developer`r`n""No honestly; it's all about the secret VB library"" responded the Google search result";C="all should be good5"})
) | convertto-csv -Delimiter ';' -NoTypeInformation
$csv | Clean-CsvStream -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':'
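One possible reading of the symptom described above (an interpretation, not something confirmed by the post): Seek(0, [System.IO.SeekOrigin]::Begin) rewinds to the very first byte of the stream, so everything written so far becomes readable again, whereas seeking to the position recorded just before a write exposes only the new data. The sketch below shows that difference on a bare MemoryStream. Read-FromPosition is a hypothetical helper, and the TextFieldParser's own buffering is deliberately left out.

```powershell
# Hypothetical helper: read everything from $Position to the end of the stream as UTF-8 text.
function Read-FromPosition {
    param (
        [System.IO.MemoryStream]$Stream,
        [long]$Position
    )
    $Stream.Seek($Position, [System.IO.SeekOrigin]::Begin) | Out-Null
    $buffer = New-Object byte[] ([int]($Stream.Length - $Position))
    $Stream.Read($buffer, 0, $buffer.Length) | Out-Null
    [System.Text.Encoding]::UTF8.GetString($buffer)
}

$memStream = New-Object System.IO.MemoryStream
$writeStream = New-Object System.IO.StreamWriter($memStream)
$writeStream.AutoFlush = $true        # every Write reaches the MemoryStream immediately

$writeStream.Write('first;')
$seekStart = $memStream.Position      # position just before the second write
$writeStream.Write('second')

# Rewinding to 0 re-reads the old data as well as the new data.
$fromBeginning = Read-FromPosition -Stream $memStream -Position 0
# Rewinding to the recorded position yields only what was written last.
$fromSeekStart = Read-FromPosition -Stream $memStream -Position $seekStart

$writeStream.Dispose()                # also closes the underlying MemoryStream
```

Here $fromBeginning holds both writes while $fromSeekStart holds only the second one, which is consistent with the first CSV row turning up more than once when the stream is always rewound to 0.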
After a lot of trial and error, this seems to work:
clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null
function Clean-CsvStream {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline=$true)]
        [string]$CsvRow
        ,
        [Parameter(Mandatory = $false)]
        [char]$Delimiter = ','
        ,
        [Parameter(Mandatory = $false)]
        [regex]$InvalidCharRegex
        ,
        [Parameter(Mandatory = $false)]
        [string]$ReplacementString
    )
    begin {
        [bool]$IsSimple = [string]::IsNullOrEmpty($InvalidCharRegex)
        if(-not $IsSimple) {
            [System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
            [System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
            [Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
            $Parser.SetDelimiters($Delimiter)
            $Parser.HasFieldsEnclosedInQuotes = $true
        }
    }
    process {
        if ($IsSimple) { #if we're not replacing anything, keep it simple
            $CsvRow
        } else {
            [long]$seekStart = $memStream.Seek(0, [System.IO.SeekOrigin]::Current) #remember where this row starts
            $writeStream.WriteLine($CsvRow)
            $writeStream.Flush()
            $memStream.Seek($seekStart, [System.IO.SeekOrigin]::Begin) | out-null #rewind to the start of the row just written
            write-output (($Parser.ReadFields() | %{$_ -replace $InvalidCharRegex,$ReplacementString }) -join $Delimiter)
        }
    }
    end {
        if(-not $IsSimple) {
            try {$Parser.Close(); $Parser.Dispose()} catch{}
            try {$writeStream.Close(); $writeStream.Dispose()} catch{}
            try {$memStream.Close(); $memStream.Dispose()} catch{}
        }
    }
}
$csv = @(
(new-object -TypeName PSCustomObject -Property @{A="this is regular text";B="nothing to see here";C="all should be good"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text2";B="what the`nLine break!";C="all should be good2"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text3";B="ooh`r`nwindows line break!";C="all should be good3"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text4";B="I've got;a semi";C="all should be good4"})
,(new-object -TypeName PSCustomObject -Property @{A="this is regular text5";B="""You're Joking!"" said the Developer`r`n""No honestly; it's all about the secret VB library"" responded the Google search result";C="all should be good5"})
) | convertto-csv -Delimiter ';' -NoTypeInformation
$csv | Clean-CsvStream -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':'
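That is:
- get the current position before writing
- then write
- then flush (if not auto-flushing)
- then seek back to the start of the data just written
- then read
- repeat
That said, I'm not sure this is correct; I couldn't find any good examples or documentation explaining it, so I just kept playing until something that made vague sense worked.
I'd also be interested if anyone knows how to read directly from the pipeline stream, i.e. to remove the overhead of the extra stream.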
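On the extra-stream overhead: as far as I can tell, the PowerShell pipeline hands a function one object at a time rather than exposing a byte stream, so there may be nothing stream-like to attach to. TextFieldParser does have a constructor that takes a System.IO.TextReader, though, so each incoming row can be wrapped in its own StringReader. The sketch below swaps the shared MemoryStream for a per-row reader; it is an alternative technique, not the approach used above. Clean-CsvRecord is a hypothetical name, and the code assumes each pipeline string holds one complete record (as ConvertTo-Csv output does).

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical function: parse each incoming record with its own StringReader-backed parser.
function Clean-CsvRecord {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string]$CsvRow,
        [char]$Delimiter = ',',
        [Parameter(Mandatory = $true)]
        [regex]$InvalidCharRegex,
        [string]$ReplacementString = ''
    )
    process {
        $reader = New-Object System.IO.StringReader($CsvRow)
        $parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser($reader)
        try {
            $parser.SetDelimiters([string]$Delimiter)
            $parser.HasFieldsEnclosedInQuotes = $true
            # Unquote the fields, replace the unwanted characters, then re-join without quotes.
            $fields = $parser.ReadFields()
            ($fields | ForEach-Object { $_ -replace $InvalidCharRegex, $ReplacementString }) -join $Delimiter
        } finally {
            $parser.Close()   # closes the parser and the underlying StringReader
        }
    }
}

$cleaned = '"a";"b;c";"d"' | Clean-CsvRecord -Delimiter ';' -InvalidCharRegex '[\r\n;]' -ReplacementString ':'
```

Creating a parser per record costs a few allocations per row, but it avoids the write, flush, and seek bookkeeping entirely.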
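Comment for @M.R.
Sorry this is so late; in case it's of use to others:
If the end-of-line delimiter is CrLf (\r\n) rather than just Cr (\r), it's easy to tell the end of a record/line apart from a line break within a field:
Get-Content -LiteralPath 'D:\test\file to clean.csv' -Delimiter "`r`n" |
    %{$_.ToString().TrimEnd("`r`n")} | #the delimiter is left on the end of the string; remove it
    %{('"{0}"' -f $_) -replace '\|','"|"'} | #insert quotes at start and end of line, as well as around delimiters
    ConvertFrom-Csv -Delimiter '|' #treat the pipeline content as a valid pipe-delimited csv
If it isn't, however, you have no way of telling which Cr is the end of a record and which is just a break in the text. You can work around this to some extent by counting the number of pipes; i.e. if you have 5 columns, any Cr before the fourth delimiter is a line break rather than the end of a record. However, if there's another line break after that, you can't be sure whether it's a line break in the last column's data or the end of the row. If you know that the first or last column (or both) never contains line breaks, you can work around that. For all these more complex scenarios I suspect a regex would be the best option, applied with something like Select-String. If that's what you need, post it as a question here with your exact requirements and what you've already tried, and others can help you.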
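The column-counting idea can be sketched roughly as follows. This is an illustration under stated assumptions, not something tested against real data: the input has already been split on the ambiguous line break, the number of columns is known, and the last column never contains a line break. Join-BrokenRecord and its parameters are hypothetical names, and rejoining fragments with "`n" is an arbitrary choice.

```powershell
# Hypothetical function: glue line fragments together until a full record's worth of delimiters is seen.
function Join-BrokenRecord {
    param (
        [string[]]$Line,
        [int]$ColumnCount,
        [char]$Delimiter = '|'
    )
    $pending = $null
    foreach ($fragment in $Line) {
        # Append the fragment to whatever is left over from earlier fragments.
        if ($null -eq $pending) { $pending = $fragment } else { $pending = $pending + "`n" + $fragment }
        # N columns means N - 1 delimiters; fewer than that and the record is still incomplete.
        $delimiterCount = $pending.Split($Delimiter).Length - 1
        if ($delimiterCount -ge ($ColumnCount - 1)) {
            $pending
            $pending = $null
        }
    }
    # Emit any trailing incomplete record rather than silently dropping it.
    if ($null -ne $pending) { $pending }
}

$fragments = 'id|note|status', '1|first half', 'second half|ok', '2|plain|ok'
$records = @(Join-BrokenRecord -Line $fragments -ColumnCount 3)
```

With the sample input, the second and third fragments are merged into one record because the second fragment on its own contains only one delimiter.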