PowerShell Compare-Object: Output Separate Files for Each SideIndicator
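(This is probably something fairly simple that I'm missing, but I can't figure it out and haven't found an answer by searching.)
I need to compare two CSV files that have the same columns and output the row differences as follows (the final output is Unicode text):
- If a row exists in File A but not in File B, label that row "Good"
- If a row exists in File B but not in File A, label that row "Bad"
Suppose I have the following sample data: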
File A:
Column1,Column2,Column3
Tommy,4133,20180204
Suzie,5200,20210112
Tammy,221,20201010
File B:
Column1,Column2,Column3
Tommy,4133,20180204
Nicky,5200,20190520
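Here is my current code (borrowing the hash-enabled Compare-Object2 from this site because the built-in Compare-Object is too slow. FYI, I use Get-Content instead of Import-Csv because we are comparing whole rows, which makes it about 50 times faster. The MyHeader variable just preserves the original file's header column values.)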
Compare-Object2 (Get-Content $FileA) (Get-Content $FileB) -PassThru |
Select-Object @{l=[string]$MyHeader;e={$_.InputObject}},
@{n='Row Label'; e={ @{'=>' = 'Bad' ; '<=' = 'Good'}[$_.SideIndicator]}},
@{n='Placeholder'; e={@{'*'='0'}['*']}} |
Sort-Object 'Row Label' -Descending | Export-Csv "$FinalCSV" -NoType;
#Removing " char to create CSV with original and added columns together
Set-Content "$FinalCSV" ((Get-Content "$FinalCSV") -replace '"');
#Convert csv to tab delimited
Import-Csv "$FinalCSV" | Export-Csv "$FinalTXT" -NoTypeInformation -Delimiter "`t";
#Remove " char and convert to unicode
Set-Content -Encoding UNICODE "$FinalTXT" ((Get-Content "$FinalTXT") -replace '"')
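This works great (I know some of the steps at the end are redundant, but hey, it's the best I could do; feel free to fix those parts!) and creates a single output file containing both the Good and Bad rows. Two files of 400,000 rows each take about 40 seconds.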
Result File:
Column1 Column2 Column3 Row Label Placeholder
Suzie 5200 20210112 Good 0
Tammy 221 20201010 Good 0
Nicky 5200 20190520 Bad 0
The problem is that I now need to create them as separate files: one Good file and one Bad file. So the newly required output would be:
ResultFileGood:
Column1 Column2 Column3 Row Label Placeholder
Suzie 5200 20210112 Good 0
Tammy 221 20201010 Good 0
ResultFileBad:
Column1 Column2 Column3 Row Label Placeholder
Nicky 5200 20190520 Bad 0
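And I just know there must be a way to do this without running the compare twice, using a Where-Object property or some kind of loop. I just can't figure it out, so I've come to the experts.
Thanks
Edit: Thanks to postanote, a workable alternative is to output the combined file and then split it, which is definitely faster than running the whole compare routine twice. I'd still like to see whether there's a way to do it directly in the compare/export without the intermediate file, but this is definitely a viable option and the one I'm currently using.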
$FinalHeader = get-content "$FinalTXT" | Select -First 1
# The file is tab-delimited, so match the label and placeholder separated by a tab
$BadOutput = Select-String -Path $FinalTXT -Pattern 'Bad\t0$'
$GoodOutput = Select-String -Path $FinalTXT -Pattern 'Good\t0$'
@($FinalHeader,$BadOutput.Line) | Out-File "$FinalBadTXT" -Encoding UNICODE;
@($FinalHeader,$GoodOutput.Line) | Out-File "$FinalGoodTXT" -Encoding UNICODE;
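Continuing from my comment.
You have a lot going on there, i.e., some proxy functions, etc.
Mixing these items the way you are, you'd end up with something like this... (very simplified, of course, and since you are not showing your input, you are forcing us to guess at one.)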
psEdit -filenames 'D:\temp\book1.txt'
# Results
<#
Site,Dept,Office,Floor
Main,aaa,bbb,ccc
Main0,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
#>
psEdit -filenames 'D:\temp\book3.txt'
# Results
<#
Site,Dept,Office,Floor
Main,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
Branch3,jjj,kkk,lll
Branch4,mmm,nnn,ooo
#>
Update:
Removed all the previous stuff, since it wasn't your cup of tea...
;-}
Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt') |
Export-Csv -Path 'D:\Temp\CompareObject.csv' -NoTypeInformation -Force
(Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\<=') -replace '.*CompareObject.*:\"|\"\,.*' |
ConvertFrom-Csv -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force
(Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\=>') -replace '.*CompareObject.*:\"|\"\,.*' |
ConvertFrom-Csv -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force
$FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'
$FileList |
ForEach-Object {
"`n********* Getting content $PSItem *********`n"
Import-Csv -Path "D:\temp\$PSItem"
}
# Results
<#
********* Getting content ReferenceObject.csv *********
Site Dept Office Floor
---- ---- ------ -----
Main0 aaa bbb ccc
********* Getting content DifferenceObject.csv *********
Branch3 jjj kkk lll
Branch4 mmm nnn ooo
#>
So, as for your last comment:
While that method still uses the intermediary file; I admit I
completely wasn't thinking about the simple approach of just exporting
the combined file and then just splitting that.
OK, then, without the 'intermediary file':
($ComparedObjects = Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt'))
# Results
<#
InputObject SideIndicator
----------- -------------
Main0,aaa,bbb,ccc <=
Branch3,jjj,kkk,lll =>
Branch4,mmm,nnn,ooo =>
#>
($ComparedObjects -match '<=').InputObject |
ConvertFrom-Csv -Header Site, Dept, Office, Floor
# Results
<#
Site Dept Office Floor
---- ---- ------ -----
Main0 aaa bbb ccc
#>
($ComparedObjects -match '=>').InputObject |
ConvertFrom-Csv -Header Site, Dept, Office, Floor
# Results
<#
Site Dept Office Floor
---- ---- ------ -----
Branch3 jjj kkk lll
Branch4 mmm nnn ooo
#>
Then export to CSV.
($ComparedObjects -match '<=').InputObject |
ConvertFrom-Csv -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force
($ComparedObjects -match '=>').InputObject |
ConvertFrom-Csv -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force
Read back as needed:
$FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'
$FileList |
ForEach-Object {
"`n********* Getting content $PSItem *********`n"
Import-Csv -Path "D:\temp\$PSItem"
}
# Results
<#
********* Getting content ReferenceObject.csv *********
Site Dept Office Floor
---- ---- ------ -----
Main0 aaa bbb ccc
********* Getting content DifferenceObject.csv *********
Branch3 jjj kkk lll
Branch4 mmm nnn ooo
#>
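The two `-match` filters above each scan `$ComparedObjects` separately. As a hedged alternative, the sketch below partitions the compare result in a single pass with the `.Where()` method in `Split` mode (PowerShell 4.0 and later). The sample arrays stand in for the two `Get-Content` calls, and the built-in `Compare-Object` is used in place of `Compare-Object2`; both are assumptions for illustration, as are the variable names.

```powershell
# Stand-in data; in the real script these come from Get-Content on the two files.
$reference  = 'Main,aaa,bbb,ccc', 'Main0,aaa,bbb,ccc', 'Branch1,ddd,eee,fff', 'Branch2,ggg,hhh,iii'
$difference = 'Main,aaa,bbb,ccc', 'Branch1,ddd,eee,fff', 'Branch2,ggg,hhh,iii',
              'Branch3,jjj,kkk,lll', 'Branch4,mmm,nnn,ooo'

# Assumption: built-in Compare-Object substituted for Compare-Object2.
$compared = Compare-Object -ReferenceObject $reference -DifferenceObject $difference

# 'Split' returns two collections: items matching the predicate, then everything else.
$onlyInReference, $onlyInDifference =
    @($compared).Where({ $_.SideIndicator -eq '<=' }, 'Split')

# Each side can now be piped to ConvertFrom-Csv / Export-Csv independently.
$onlyInReference.InputObject
$onlyInDifference.InputObject
```

The comparison still runs only once; the split itself is a single walk over the (usually much smaller) list of differences.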
Update
Based on your comment --
'the problem is the final output need: the Unicode Tab-delimited text
with the additional columns.'
(($ComparedObjects -match '<=').InputObject) -replace ',', "`t" |
ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
Import-Csv -Path 'D:\temp\ReferenceObject.csv'
# Results
<#
Site Dept Office Floor
---- ---- ------ -----
Main0 aaa bbb ccc
#>
(($ComparedObjects -match '=>').InputObject) -replace ',', "`t" |
ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
Import-Csv -Path 'D:\temp\DifferenceObject.csv'
# Results
<#
Site Dept Office Floor
---- ---- ------ -----
Branch3 jjj kkk lll
Branch4 mmm nnn ooo
#>
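Since the requested result is tab-delimited Unicode text with two extra columns, one possible shortcut is to build the output lines as plain strings and write them with `Set-Content -Encoding Unicode`, which avoids the quote-stripping round trip through `Export-Csv`. This is only a sketch: `$compared`, `$header`, `$labels`, `$outDir`, and the `Result<Label>.txt` file names are hypothetical stand-ins, and like the `-replace ','` approach above it assumes no field contains an embedded comma.

```powershell
# Hypothetical stand-ins for the header line and the compare result.
$header   = 'Site,Dept,Office,Floor'
$compared = @(
    [pscustomobject]@{ InputObject = 'Main0,aaa,bbb,ccc';   SideIndicator = '<=' }
    [pscustomobject]@{ InputObject = 'Branch3,jjj,kkk,lll'; SideIndicator = '=>' }
    [pscustomobject]@{ InputObject = 'Branch4,mmm,nnn,ooo'; SideIndicator = '=>' }
)
$labels = @{ '<=' = 'Good'; '=>' = 'Bad' }

# Assumed output location for the demo.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'compare-split-demo'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# Original header plus the two added columns, tab-separated.
$headerLine = (($header -replace ',', "`t"), 'Row Label', 'Placeholder') -join "`t"

# One group per SideIndicator value, one output file per group.
foreach ($group in ($compared | Group-Object -Property SideIndicator)) {
    $label = $labels[$group.Name]
    $lines = foreach ($row in $group.Group) {
        (($row.InputObject -replace ',', "`t"), $label, '0') -join "`t"
    }
    $path = Join-Path $outDir "Result$label.txt"
    Set-Content -Path $path -Value (@($headerLine) + @($lines)) -Encoding Unicode
}
```

Grouping on `SideIndicator` means the compare result is consumed once and each side lands in its own file.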
Or, for the extra column content, you can do this...
$ComparedObjects -match '<=' |
Select-Object -Property @{
Name = 'Site'
Expression = {($PSItem.InputObject -split ',')[0]}
},
@{
Name = 'Dept'
Expression = {($PSItem.InputObject -split ',')[1]}
},
@{
Name = 'Office'
Expression = {($PSItem.InputObject -split ',')[2]}
},
@{
Name = 'Floor'
Expression = {($PSItem.InputObject -split ',')[3]}
},
@{
Name = 'Label'
Expression = {'Good'}
},
@{
Name = 'Placeholder'
Expression = {0}
} |
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
# Keep the Unicode encoding when rewriting the file as tab-delimited text
(Get-Content -Path 'D:\temp\ReferenceObject.csv') -replace '"','' -replace ',', "`t" |
Set-Content -PassThru 'D:\temp\ReferenceObject.csv' -Encoding Unicode
Import-Csv -Path 'D:\temp\ReferenceObject.csv' -Delimiter "`t" |
Format-Table -AutoSize
# Results
<#
Site Dept Office Floor Label Placeholder
---- ---- ------ ----- ----- -----------
Main0 aaa bbb ccc Good 0
#>
$ComparedObjects -match '=>' |
Select-Object -Property @{
Name = 'Site'
Expression = {($PSItem.InputObject -split ',')[0]}
},
@{
Name = 'Dept'
Expression = {($PSItem.InputObject -split ',')[1]}
},
@{
Name = 'Office'
Expression = {($PSItem.InputObject -split ',')[2]}
},
@{
Name = 'Floor'
Expression = {($PSItem.InputObject -split ',')[3]}
},
@{
Name = 'Label'
Expression = {'Bad'}
},
@{
Name = 'Placeholder'
Expression = {0}
} |
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
# Keep the Unicode encoding when rewriting the file as tab-delimited text
(Get-Content -Path 'D:\temp\DifferenceObject.csv') -replace '"','' -replace ',', "`t" |
Set-Content -PassThru 'D:\temp\DifferenceObject.csv' -Encoding Unicode
Import-Csv -Path 'D:\temp\DifferenceObject.csv' -Delimiter "`t" |
Format-Table -AutoSize
# Results
<#
Site Dept Office Floor Label Placeholder
---- ---- ------ ----- ----- -----------
Branch3 jjj kkk lll Bad 0
Branch4 mmm nnn ooo Bad 0
#>
#>
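The two `Select-Object` blocks above differ only in the side indicator and the label, so they could be folded into one small function. The sketch below is one way to do that; `Convert-ComparedRow` and its parameters are hypothetical names, and the sample `$compared` array is a stand-in for the real compare result.

```powershell
function Convert-ComparedRow {
    # Hypothetical helper: turns one side of a compare result into labeled row objects.
    param(
        [Parameter(Mandatory)] [object[]] $ComparedObjects,
        [Parameter(Mandatory)] [string]   $SideIndicator,
        [Parameter(Mandatory)] [string]   $Label,
        [string[]] $Header = @('Site', 'Dept', 'Office', 'Floor')
    )

    $ComparedObjects |
        Where-Object { $_.SideIndicator -eq $SideIndicator } |
        ForEach-Object { $_.InputObject } |
        ConvertFrom-Csv -Header $Header |      # split each raw line into named columns
        Select-Object -Property *,
            @{ Name = 'Label';       Expression = { $Label } },
            @{ Name = 'Placeholder'; Expression = { 0 } }
}

# Stand-in for the compare result.
$compared = @(
    [pscustomobject]@{ InputObject = 'Main0,aaa,bbb,ccc';   SideIndicator = '<=' }
    [pscustomobject]@{ InputObject = 'Branch3,jjj,kkk,lll'; SideIndicator = '=>' }
    [pscustomobject]@{ InputObject = 'Branch4,mmm,nnn,ooo'; SideIndicator = '=>' }
)

$goodRows = Convert-ComparedRow -ComparedObjects $compared -SideIndicator '<=' -Label 'Good'
$badRows  = Convert-ComparedRow -ComparedObjects $compared -SideIndicator '=>' -Label 'Bad'
```

Each result can then be piped to `Export-Csv -Delimiter "`t" -Encoding Unicode -NoTypeInformation` as in the blocks above; using `ConvertFrom-Csv` rather than `-split ','` also copes with quoted fields.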