How to merge all contents of two CSV files where records match on one column

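I have two CSV files. Both have a SamAccountName column in common. A user record in one file may or may not have a matching record in the other file (this is very important).

I am basically trying to merge all columns (and their values) into one file, based on the SamAccountNames found in the first file.

If a SamAccountName is not found in the second file, the merged file should contain empty values for all of the second file's columns for that user record (because the record was found in the first file).

If a SamAccountName is found in the second file but not in the first, that record should be left out of the merge.

The number of columns in each file may vary (5, 10, 2, and so on).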

Function MergeTwoCsvFiles
{
    Param ([String]$baseFile, [String]$fileToBeMerged, [String]$columnTitleLineInFileToBeMerged)
    
    $baseFileCsvContents = Import-Csv $baseFile
    $fileToBeMergedCsvContents = Import-Csv $fileToBeMerged
    
    $baseFileContents = Get-Content $baseFile
    
    $baseFileContents[0] += "," + $columnTitleLineInFileToBeMerged
    
    $baseFileCsvContents | ForEach-Object {
        $matchFound = $False
        $baseSameAccountName = $_.SamAccountName
        [String]$mergedLineInFile = $_
        
        [String]$lineMatchFound = $fileToBeMergedCsvContents | Where-Object {$_.SamAccountName -eq $baseSameAccountName}
        Write-Host '$mergedLineInFile =' $mergedLineInFile
        Write-Host '$lineMatchFound =' $lineMatchFound
        Exit
    }
}

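The problem is that the records are written out in hashtable notation rather than as a line-like string (as they would appear if you viewed the file as .txt). So I'm not quite sure how to go about this...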
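To illustrate the symptom: casting an object produced by Import-Csv (or ConvertFrom-Csv) to [String] yields PowerShell's `@{Name=Value; ...}` notation, whereas ConvertTo-Csv yields the CSV line. This is a minimal sketch using an inline sample in place of the real file; the variable names are made up for the illustration.

```powershell
# Inline sample standing in for the first CSV file (assumption: same shape as base.csv)
$rows = @'
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
'@ | ConvertFrom-Csv

$row = $rows | Select-Object -First 1

# Casting the record object to a string gives the @{...} notation, not a CSV line
$asString = [String]$row

# ConvertTo-Csv emits the header line followed by the data line; skip the header to keep only the record
$asCsvLine = $row | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1

$asString
$asCsvLine
```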

Adding sample CSV files and the expected result...

First CSV file

"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
"JDoe","John","Doe"
"SDoo","Scooby","Doo"

Second CSV file

"SamAccountName","employeeNumber","userAccountControl","mail"
"KYasunori","678213","546","KYasunori@mystuff.com"
"JSteward","43518790","512","JSteward@mystuff.com"
"JKibogabi","24356","546","JKibogabi@mystuff.com"
"JDoe","902187u4","1114624","JDoe@mystuff.com"
"CStrife","54627","512","CStrife@mystuff.com"

Expected merged CSV file

"SamAccountName","sn","GivenName","employeeNumber","userAccountControl","mail"
"PBrain","Pinky","Brain","","",""
"JSteward","John","Steward","43518790","512","JSteward@mystuff.com"
"JDoe","John","Doe","902187u4","1114624","JDoe@mystuff.com"
"SDoo","Scooby","Doo","","",""

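Note: This will be part of a looping process that merges multiple files, so I would like to avoid hard-coding the column titles ($_.SamAccountName being the exception).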

Attempting the suggestion from "restless 1987" (not working)

$baseFileCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMergedCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\lookup.csv'
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
$resultsFileContents = @()

$baseFileContents = Get-Content 'D:\Scripts\Powershell\Tests\base.csv'

$recordsMatched = compare-object $baseFileCsvContents $fileToBeMergedCsvContents -Property SamAccountName

switch ($recordsMatched)
{
    '<=' {}
    '=>' {}
    '==' {$resultsFileContents += $_}
}

$resultsFileCsv = $resultsFileContents | ConvertTo-Csv
$resultsFileCsv | Export-Csv $resultsFile -NoTypeInformation -Force

The output is a blank file :(

You can use Compare-Object for this. Use -Property SamAccountName. For example:

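$a = 1,2,3,4,5
$b = 4,5,6,7
# -IncludeEqual is needed to get '==' entries; by default only differences are returned
$side = Compare-Object $a $b -IncludeEqual
foreach ($item in $side) {
    # The indicator lives in the SideIndicator property of each result object
    switch ($item.SideIndicator) {
        '<=' { "$($item.InputObject) is only in `$a" }
        '=>' { "$($item.InputObject) is only in `$b" }
        '==' { "$($item.InputObject) is on both sides" }
    }
}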

Once all the data is in your output variable, pipe it to ConvertTo-Csv and write it to a file.
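Applied to CSV records, the same idea might look like the sketch below. It uses small inline samples instead of the real files, and the variable names are assumptions made for the illustration. Note that with -Property, Compare-Object returns only the compared property plus SideIndicator, so this identifies which SamAccountNames match but does not by itself merge the other columns.

```powershell
# Inline samples standing in for base.csv and lookup.csv (assumption)
$base = @'
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
'@ | ConvertFrom-Csv

$lookup = @'
"SamAccountName","mail"
"KYasunori","KYasunori@mystuff.com"
"JSteward","JSteward@mystuff.com"
'@ | ConvertFrom-Csv

# -IncludeEqual adds the '==' entries; without it only the differences are returned
$diff = Compare-Object $base $lookup -Property SamAccountName -IncludeEqual

# SideIndicator is a property on each result object, so filter on it explicitly
$inBoth     = @($diff | Where-Object SideIndicator -eq '==' | ForEach-Object SamAccountName)
$onlyBase   = @($diff | Where-Object SideIndicator -eq '<=' | ForEach-Object SamAccountName)
$onlyLookup = @($diff | Where-Object SideIndicator -eq '=>' | ForEach-Object SamAccountName)

"In both files:         $($inBoth -join ', ')"
"Only in the base file: $($onlyBase -join ', ')"
"Only in lookup file:   $($onlyLookup -join ', ')"
```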

After a full day, I finally came up with something that works...

...

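EDIT

Reason: When merging files with thousands of records, breaking out of the inner loop and removing the found element from the array is much faster...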

Function GetTitlesFromFileToBeMerged
{
    Param ($csvFile)

    # Read the header line from the file passed in, not from the script-level variable
    [String]$fileToBeMergedTitles = Get-Content $csvFile -TotalCount 1

    [String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`",`"", "|").Trim()
    [String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`"", "").Trim()
    [String[]]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "SamAccountName", "").Trim()

    [String[]]$listOfColumnTitles = $fileToBeMergedTitles.Split('|',[System.StringSplitOptions]::RemoveEmptyEntries)

    Write-Output $listOfColumnTitles
}

$baseFile = 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMerged = 'D:\Scripts\Powershell\Tests\lookup.csv'
$baseFileCsvContents = Import-Csv $baseFile
$baseFileContents = Get-Content $baseFile
$fileToBeMergedCsvContents = Import-Csv $fileToBeMerged
[System.Collections.Generic.List[System.Object]]$fileToBeMergedContents = Get-Content $fileToBeMerged
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
$resultsFileContents = @()

[String]$baseFileTitles = $baseFileContents[0]
[String]$fileToBeMergedTitles = (Get-Content $fileToBeMerged -TotalCount 1) -replace "`"SamAccountName`",", ""
$resultsFileContents += $baseFileTitles + "," + $fileToBeMergedTitles

[String]$lineMatchNotFound = ""
$arrayFileToBeMergedTitles = GetTitlesFromFileToBeMerged $fileToBeMerged
For ($valueNum = 0; $valueNum -lt $arrayFileToBeMergedTitles.Length; $valueNum++)
{
    $lineMatchNotFound += ",`"`""
}

$baseLineCounter = 1
$baseFileCsvContents | ForEach-Object {
    $baseSameAccountName = $_.SamAccountName
    [String]$baseLineInFile = $baseFileContents[$baseLineCounter]

    # Start at 0: the list is zero-indexed, so the counter must equal the index of $line for RemoveAt
    $lineMatchCounter = 0
    $lineMatchFound = ""
    :inner
    ForEach ($line in $fileToBeMergedContents) {
        If ($line -like "*$baseSameAccountName*") {
            [String]$lineMatchFound = "," + ($line -replace '^"[^"]*",', "")
            $fileToBeMergedContents.RemoveAt($lineMatchCounter)
            break inner
        }; $lineMatchCounter++
    }

    If (!($lineMatchFound))
    {
        [String]$lineMatchFound = $lineMatchNotFound
    }

    $mergedLine = $baseLineInFile + $lineMatchFound
    $resultsFileContents += $mergedLine
    $baseLineCounter++
}

ForEach ($line in $resultsFileContents)
{
    Write-Host $line
}

$resultsFileContents | Set-Content $resultsFile -Force

I'm pretty sure this isn't the best approach and that there are better, faster ways to handle this. If anyone has any ideas, I'm open to them. Thanks.
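One possible faster alternative is to work with the parsed objects rather than raw text lines, indexing the second file in a hashtable keyed on SamAccountName so that each base record needs a single lookup. The following is only a sketch under stated assumptions: `Merge-CsvOnKey` and its parameters are hypothetical names, the key values are assumed to be non-empty, duplicate keys in the lookup file keep the last record, and a non-key column name shared by both files would be overwritten by the lookup value.

```powershell
function Merge-CsvOnKey {
    # Hypothetical helper (not from the original post): left-joins two sets of CSV rows on one key column
    param(
        [Parameter(Mandatory)] [object[]] $BaseRows,
        [Parameter(Mandatory)] [object[]] $LookupRows,
        [string] $Key = 'SamAccountName'
    )

    # Index the lookup rows by key so each base row needs one hashtable lookup instead of a scan
    $index = @{}
    foreach ($row in $LookupRows) { $index[$row.$Key] = $row }

    # Columns to append: every lookup column except the key, read from the data rather than hard-coded
    $extraColumns = @($LookupRows[0].PSObject.Properties.Name | Where-Object { $_ -ne $Key })

    foreach ($baseRow in $BaseRows) {
        # Copy the base row's properties in their original order
        $merged = [ordered]@{}
        foreach ($prop in $baseRow.PSObject.Properties) { $merged[$prop.Name] = $prop.Value }

        # Append the lookup values, or empty strings when the key has no match
        $match = $index[$baseRow.$Key]
        foreach ($name in $extraColumns) {
            $merged[$name] = if ($match) { $match.$name } else { '' }
        }
        [pscustomobject]$merged
    }
}

# Inline samples standing in for base.csv and lookup.csv (assumption)
$base = @'
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
'@ | ConvertFrom-Csv

$lookup = @'
"SamAccountName","employeeNumber","mail"
"KYasunori","678213","KYasunori@mystuff.com"
"JSteward","43518790","JSteward@mystuff.com"
'@ | ConvertFrom-Csv

$merged = @(Merge-CsvOnKey -BaseRows $base -LookupRows $lookup)
$merged | ConvertTo-Csv -NoTypeInformation
```

With real files, the rows would come from Import-Csv and the result would go to Export-Csv with -NoTypeInformation. When merging several files in a loop, the output of one call can be passed as -BaseRows to the next.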

The code below produces the desired output from the input you provided.

function CombineSkip1($s1, $s2){
    $s3 = $s1 -split ',' 
    $s2 -split ',' | select -Skip 1 | % {$s3 += $_}
    # Join without a space so the output matches the quoted CSV format of the input
    $s4 = $s3 -join ','

    $s4
}

Write-Output "------Combine files------"

# content
$c1 = Get-Content D:\junk\test1.csv
$c2 = Get-Content D:\junk\test2.csv

# users in both files, could be a better way to do this
$t1 = $c1 | ConvertFrom-Csv
$t2 = $c2 | ConvertFrom-Csv
$users = $t1 | Select SamAccountName

# generate final, combined output
$combined = @()
$combined += CombineSkip1 $c1[0] $c2[0]

# one empty quoted value per non-key column in the second file
$c2PropCount = ($c2[0] -split ',').Count - 1
$filler = (',""' * $c2PropCount)

for ($i = 1; $i -lt $c1.Count; $i++){
    $user = $c1[$i].Split(',')[0]
    $u2 = $c2 | where {([string]$_).StartsWith($user)}
    if ($u2)
    {
        $combined += CombineSkip1 $c1[$i] $u2
    }
    else
    {
        $combined += ($c1[$i] + $filler)
    }
}

# write to output and file
Write-Output $combined
$combined | Set-Content -Path D:\junk\test3.csv -Force