How to merge all contents of two CSV files where records match on one column
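I have two CSV files. Both have a SamAccountName column in common. A given user record may or may not have a match in the other file (this is very important).
I am trying to merge all columns (and their values) into one file, based on the SamAccountNames found in the first file.
If a SamAccountName is not found in the second file, the merged file should contain empty values for the second file's columns in that user's record (because the record was found in the first file).
If a SamAccountName is found in the second file but not in the first, that record should be left out of the merge.
The number of columns in each file can vary (5, 10, 2, and so on).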
Function MergeTwoCsvFiles
{
Param ([String]$baseFile, [String]$fileToBeMerged, [String]$columnTitleLineInFileToBeMerged)
$baseFileCsvContents = Import-Csv $baseFile
$fileToBeMergedCsvContents = Import-Csv $fileToBeMerged
$baseFileContents = Get-Content $baseFile
$baseFileContents[0] += "," + $columnTitleLineInFileToBeMerged
$baseFileCsvContents | ForEach-Object {
$matchFound = $False
$baseSameAccountName = $_.SamAccountName
[String]$mergedLineInFile = $_
[String]$lineMatchFound = $fileToBeMergedCsvContents | Where-Object {$_.SamAccountName -eq $baseSameAccountName}
Write-Host '$mergedLineInFile =' $mergedLineInFile
Write-Host '$lineMatchFound =' $lineMatchFound
Exit
}
}
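The problem is that the records come out written as hash tables rather than as line-like strings (the way they would look if you opened the file as .txt). So I'm not sure how to proceed...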
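For context, the hash-table look comes from casting an Import-Csv record to [String]: the record is an object, and its string form is the `@{Name=Value; ...}` display format. A minimal sketch of the difference, using one inlined sample row (the variable names here are only for illustration):

```powershell
# One sample record, inlined so the sketch does not depend on any file.
$rows = @(@'
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
'@ | ConvertFrom-Csv)

# Casting a record to [String] gives the display form, e.g. @{SamAccountName=PBrain; ...}
$asString = [String]$rows[0]

# ConvertTo-Csv gives CSV text back: element 0 is the header line, element 1 the first record.
$asCsvLine = ($rows | ConvertTo-Csv -NoTypeInformation)[1]

$asString
$asCsvLine
```

So if line strings are what you want to concatenate, ConvertTo-Csv is probably a safer way to get them than a string cast.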
Adding sample CSV files and the expected result...
First CSV file
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
"JDoe","John","Doe"
"SDoo","Scooby","Doo"
Second CSV file
"SamAccountName","employeeNumber","userAccountControl","mail"
"KYasunori","678213","546","KYasunori@mystuff.com"
"JSteward","43518790","512","JSteward@mystuff.com"
"JKibogabi","24356","546","JKibogabi@mystuff.com"
"JDoe","902187u4","1114624","JDoe@mystuff.com"
"CStrife","54627","512","CStrife@mystuff.com"
Expected merged CSV file
"SamAccountName","sn","GivenName","employeeNumber","userAccountControl","mail"
"PBrain","Pinky","Brain","","",""
"JSteward","John","Steward","43518790","512","JSteward@mystuff.com"
"JDoe","John","Doe","902187u4","1114624","JDoe@mystuff.com"
"SDoo","Scooby","Doo","","",""
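Note: this will be part of a looping process that merges multiple files, so I want to avoid hard-coding header names ($_.SamAccountName being the exception).
Tried the suggestion from "restless 1987" (did not work)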
$baseFileCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMergedCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\lookup.csv'
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
$resultsFileContents = @()
$baseFileContents = Get-Content 'D:\Scripts\Powershell\Tests\base.csv'
$recordsMatched = compare-object $baseFileCsvContents $fileToBeMergedCsvContents -Property SamAccountName
switch ($recordsMatched)
{
'<=' {}
'=>' {}
'==' {$resultsFileContents += $_}
}
$resultsFileCsv = $resultsFileContents | ConvertTo-Csv
$resultsFileCsv | Export-Csv $resultsFile -NoTypeInformation -Force
The output is a blank file :(
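A likely reason for the blank file: Compare-Object leaves out equal items unless -IncludeEqual is given, and the switch above tests each result object rather than its SideIndicator property, so the '==' branch never runs. A minimal sketch with the sample data inlined (the three result arrays are hypothetical names for illustration):

```powershell
$base = @'
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
"JDoe","John","Doe"
"SDoo","Scooby","Doo"
'@ | ConvertFrom-Csv

$lookup = @'
"SamAccountName","employeeNumber","userAccountControl","mail"
"KYasunori","678213","546","KYasunori@mystuff.com"
"JSteward","43518790","512","JSteward@mystuff.com"
"JKibogabi","24356","546","JKibogabi@mystuff.com"
"JDoe","902187u4","1114624","JDoe@mystuff.com"
"CStrife","54627","512","CStrife@mystuff.com"
'@ | ConvertFrom-Csv

# -IncludeEqual adds the '==' entries; without it only the differences come back.
$diff = Compare-Object $base $lookup -Property SamAccountName -IncludeEqual

$inBoth = @(); $onlyInBase = @(); $onlyInLookup = @()
foreach ($d in $diff) {
    # Switch on the SideIndicator value, not on the result object itself.
    switch ($d.SideIndicator) {
        '==' { $inBoth       += $d.SamAccountName }
        '<=' { $onlyInBase   += $d.SamAccountName }
        '=>' { $onlyInLookup += $d.SamAccountName }
    }
}
```

Note that this only classifies the account names. It does not carry the other columns along, so the merge itself still has to be done separately.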
You can use Compare-Object for this, with -Property SamAccountName. For example:
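$a = 1,2,3,4,5
$b = 4,5,6,7
# -IncludeEqual is required to get the '==' entries
$side = Compare-Object $a $b -IncludeEqual
foreach ($item in $side) {
switch ($item.SideIndicator) {
'<=' { "$($item.InputObject) is only in `$a" }
'=>' { "$($item.InputObject) is only in `$b" }
'==' { "$($item.InputObject) is on both sides" }
}
}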
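Once you have all the data in your output variable, pipe it to ConvertTo-Csv and write it to a file.
After a whole day, I finally came up with something that works...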
...
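Edit
Reason: when merging files with thousands of records, breaking out of the inner loop and removing the found element from the array is much faster...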
Function GetTitlesFromFileToBeMerged
{
Param ($csvFile)
# Read the header line from the file passed in, not from the script-level variable
[String]$fileToBeMergedTitles = Get-Content $csvFile -TotalCount 1
[String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`",`"", "|").Trim()
[String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`"", "").Trim()
[String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "SamAccountName", "").Trim()
[String[]]$listOfColumnTitles = $fileToBeMergedTitles.Split('|',[System.StringSplitOptions]::RemoveEmptyEntries)
Write-Output $listOfColumnTitles
}
$baseFile = 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMerged = 'D:\Scripts\Powershell\Tests\lookup.csv'
$baseFileCsvContents = Import-Csv $baseFile
$baseFileContents = Get-Content $baseFile
$fileToBeMergedCsvContents = Import-Csv $fileToBeMerged
[System.Collections.Generic.List[System.Object]]$fileToBeMergedContents = Get-Content $fileToBeMerged
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
$resultsFileContents = @()
[String]$baseFileTitles = $baseFileContents[0]
[String]$fileToBeMergedTitles = (Get-Content $fileToBeMerged -TotalCount 1) -replace "`"SamAccountName`",", ""
$resultsFileContents += $baseFileTitles + "," + $fileToBeMergedTitles
[String]$lineMatchNotFound = ""
$arrayFileToBeMergedTitles = GetTitlesFromFileToBeMerged $fileToBeMerged
For ($valueNum = 0; $valueNum -lt $arrayFileToBeMergedTitles.Length; $valueNum++)
{
$lineMatchNotFound += ",`"`""
}
$baseLineCounter = 1
$baseFileCsvContents | ForEach-Object {
$baseSameAccountName = $_.SamAccountName
[String]$baseLineInFile = $baseFileContents[$baseLineCounter]
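# 0-based index of the current line in the list (the header line is index 0)
$lineMatchCounter = 0
$lineMatchFound = ""
:inner ForEach ($line in $fileToBeMergedContents) {
# Match only when the first field equals the account name, not when the name appears anywhere in the line
If ($line -like "`"$baseSameAccountName`",*") {
[String]$lineMatchFound = "," + ($line -replace '^"[^"]*",', "")
$fileToBeMergedContents.RemoveAt($lineMatchCounter)
break inner
}; $lineMatchCounter++
}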
If (!($lineMatchFound))
{
[String]$lineMatchFound = $lineMatchNotFound
}
$mergedLine = $baseLineInFile + $lineMatchFound
$resultsFileContents += $mergedLine
$baseLineCounter++
}
ForEach ($line in $resultsFileContents)
{
Write-Host $line
}
$resultsFileContents | Set-Content $resultsFile -Force
I'm pretty sure this isn't the best approach and that there are faster ways to handle this. If anyone has any ideas, I'm open to them. Thanks.
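One possible alternative, sketched here under some assumptions rather than offered as the definitive way: work on the parsed objects instead of raw lines, and index the second file by SamAccountName in a hashtable so each lookup is a single key access instead of a scan. Merge-CsvOnKey and its parameter names are hypothetical, and the sketch assumes the key is unique and non-empty in the lookup file and that the two files share no column names other than the key.

```powershell
function Merge-CsvOnKey {
    param(
        [Parameter(Mandatory)] [object[]] $Base,    # records from the first file
        [Parameter(Mandatory)] [object[]] $Lookup,  # records from the second file
        [string] $Key = 'SamAccountName'
    )

    # Column names contributed by the second file, read from its first record (no hard-coding).
    $extraColumns = @($Lookup[0].PSObject.Properties.Name | Where-Object { $_ -ne $Key })

    # Index the second file by key for constant-time lookups.
    $index = @{}
    foreach ($row in $Lookup) { $index[$row.$Key] = $row }

    foreach ($row in $Base) {
        # Start from the base record's own columns, in their original order.
        $merged = [ordered]@{}
        foreach ($p in $row.PSObject.Properties) { $merged[$p.Name] = $p.Value }

        # Append the second file's columns, or empty strings when there is no match.
        $match = $index[$row.$Key]
        foreach ($name in $extraColumns) {
            $merged[$name] = if ($match) { $match.$name } else { '' }
        }
        [pscustomobject]$merged
    }
}

# Sample data from the question, inlined so the sketch is self-contained.
$base = @'
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
"JDoe","John","Doe"
"SDoo","Scooby","Doo"
'@ | ConvertFrom-Csv

$lookup = @'
"SamAccountName","employeeNumber","userAccountControl","mail"
"KYasunori","678213","546","KYasunori@mystuff.com"
"JSteward","43518790","512","JSteward@mystuff.com"
"JKibogabi","24356","546","JKibogabi@mystuff.com"
"JDoe","902187u4","1114624","JDoe@mystuff.com"
"CStrife","54627","512","CStrife@mystuff.com"
'@ | ConvertFrom-Csv

$result = Merge-CsvOnKey -Base $base -Lookup $lookup
$result | ConvertTo-Csv -NoTypeInformation
```

With real files, the two inputs would come from Import-Csv and the result could be piped to Export-Csv -NoTypeInformation. Records that exist only in the second file are never visited, so they drop out of the merge on their own.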
The code below produces the desired result from the input you provided.
function CombineSkip1($s1, $s2){
$s3 = $s1 -split ','
$s2 -split ',' | select -Skip 1 | % {$s3 += $_}
$s4 = $s3 -join ','
$s4
}
Write-Output "------Combine files------"
# content
$c1 = Get-Content D:\junk\test1.csv
$c2 = Get-Content D:\junk\test2.csv
# users in both files, could be a better way to do this
$t1 = $c1 | ConvertFrom-Csv
$t2 = $c2 | ConvertFrom-Csv
$users = $t1 | Select SamAccountName
# generate final, combined output
$combined = @()
$combined += CombineSkip1 $c1[0] $c2[0]
$c2PropCount = ($c2[0] -split ',').Count - 1
$filler = (',""' * $c2PropCount)
for ($i = 1; $i -lt $c1.Count; $i++){
$user = $c1[$i].Split(',')[0]
$u2 = $c2 | where {([string]$_).StartsWith($user)}
if ($u2)
{
$combined += CombineSkip1 $c1[$i] $u2
}
else
{
$combined += ($c1[$i] + $filler)
}
}
# write to output and file
Write-Output $combined
$combined | Set-Content -Path D:\junk\test3.csv -Force
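One caveat with splitting lines on commas: it works for the sample data, but it breaks as soon as a quoted value contains a comma. A small sketch of the difference, using a made-up record (the DisplayName column is hypothetical and does not appear in the question's files):

```powershell
# A record whose second field contains a comma inside the quotes.
$line = '"JDoe","Doe, John","512"'

# Naive splitting cuts the quoted field in two and yields four pieces.
$naive = $line -split ','

# ConvertFrom-Csv respects the quoting and yields three fields.
$parsed = $line | ConvertFrom-Csv -Header 'SamAccountName', 'DisplayName', 'userAccountControl'

$naive.Count
$parsed.DisplayName
```

If the real files can contain such values, parsing with Import-Csv or ConvertFrom-Csv is probably the safer route.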