Merge rows and split content from one .csv to multiple files using powershell
As described in an earlier post, I would like a second type of output for the given data:
header1; header2; header3; header4; header5; header6; header7; header8; header9; header10; header11; header12; header13;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654321; EUR
CD; 456789; 22.24; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR
CD; 354345; 85.45; Text; SW;
CD; 123556; 94.63; Text; SW;
CD; 354564; 12.34; Text; SW;
CD; 135344; 32.23; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654323; EUR
CD; 354564; 12.34; Text; SW;
CD; 852143; 34.97; Text; SW;
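This time the AB line should always be placed in front of the CD line. I know this is redundant, but it makes every line a complete set of data.
The desired result would be: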
BC987654321.csv
header1; header2; header3; header4; header5; header6; header7; header8; header9; header10; header11; header12; header13;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654321; EUR; 12345; CD; 456789; 22.24; Text; SW;
BC987654322.csv
header1; header2; header3; header4; header5; header6; header7; header8; header9; header10; header11; header12; header13;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR; 12345; CD; 354345; 85.45; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR; 12345; CD; 123556; 94.63; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR; 12345; CD; 354564; 12.34; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR; 12345; CD; 135344; 32.23; Text; SW;
BC987654323.csv
header1; header2; header3; header4; header5; header6; header7; header8; header9; header10; header11; header12; header13;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654323; EUR; 12345; CD; 354564; 12.34; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654323; EUR; 12345; CD; 852143; 34.97; Text; SW;
Thanks in advance.
For this we need to be a bit more creative and use temporary hashtables.
Something like this:
$path       = 'D:\Test'
$fileIn     = Join-Path -Path $path -ChildPath 'input.csv'
$fileOut    = $null   # will get a value in the loop
$splitValue = 'AB'    # the header1 value that decides to start a new file
$csv        = Import-Csv -Path $fileIn -Delimiter ';'
# get an array of the column headers
$allHeaders = $csv[0].PsObject.Properties.Name
# ordered hashtable that holds the non-empty fields of the current 'AB' line
$hash = [ordered]@{}
foreach ($item in $csv) {
    if ($item.header1 -eq $splitValue) {
        # start a new row (build a new hash)
        $hash.Clear()
        $item.PsObject.Properties | Where-Object { $_.Value } | ForEach-Object { $hash[$_.Name] = $_.Value }
        # get the filename from header6
        $fileOut = Join-Path -Path $path -ChildPath ('{0}.csv' -f $item.header6)
        # if a file with that name already exists, delete it
        if (Test-Path -Path $fileOut -PathType Leaf) { Remove-Item -Path $fileOut }
    }
    elseif ($hash.Count) {
        # copy the hash which holds the beginning of the line (the 'AB' line) to a temporary row hash
        $rowHash = [ordered]@{}
        foreach ($name in $hash.Keys) { $rowHash[$name] = $hash[$name] }
        $headerIndex = $hash.Count
        # append the non-empty fields from this line to the row hash
        $item.PsObject.Properties | Where-Object { $_.Value } | ForEach-Object {
            # for safety: make sure we do not index beyond the end of the $allHeaders array
            $header = if ($headerIndex -lt $allHeaders.Count) { $allHeaders[$headerIndex] } else { "header$($headerIndex + 1)" }
            $rowHash[$header] = $_.Value
            $headerIndex++   # increment the counter
        }
        # append the remaining headers with an empty value
        while ($headerIndex -lt $allHeaders.Count) {
            $rowHash[$allHeaders[$headerIndex++]] = $null
        }
        # cast the finalized row hash into a [PsCustomObject]
        $newRow = [PsCustomObject]$rowHash
        # if the file already exists we append, otherwise we create a new file
        $append = Test-Path -Path $fileOut -PathType Leaf
        $newRow | Export-Csv -Path $fileOut -Delimiter ';' -NoTypeInformation -Append:$append
    }
    else {
        Write-Warning "Could not find a starting row (header1 = '$splitValue') for the file"
    }
}
Output:
BC987654321.csv
"header1";"header2";"header3";"header4";"header5";"header6";"header7";"header8";"header9";"header10";"header11";"header12";"header13"
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654321";"EUR";"CD";"456789";"22.24";"Text";"SW";
BC987654322.csv
"header1";"header2";"header3";"header4";"header5";"header6";"header7";"header8";"header9";"header10";"header11";"header12";"header13"
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654322";"EUR";"CD";"354345";"85.45";"Text";"SW";
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654322";"EUR";"CD";"123556";"94.63";"Text";"SW";
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654322";"EUR";"CD";"354564";"12.34";"Text";"SW";
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654322";"EUR";"CD";"135344";"32.23";"Text";"SW";
BC987654323.csv
"header1";"header2";"header3";"header4";"header5";"header6";"header7";"header8";"header9";"header10";"header11";"header12";"header13"
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654323";"EUR";"CD";"354564";"12.34";"Text";"SW";
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654323";"EUR";"CD";"852143";"34.97";"Text";"SW";
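Edit
The code above works for the example data given in the question, but it relies heavily on the assumption that none of the relevant fields are empty.
As you commented, the real csv does have empty fields. Because the code skips empty values, every value after an empty field ends up one column too far to the left.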
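To make that shift visible, here is a minimal sketch on made-up data. The headers `h1`, `h2`, `h3` and the two rows are purely hypothetical; only the `Where-Object { $_.Value }` filter is taken from the script above.

```powershell
# Hypothetical three-column sample; the middle field of the second row is empty.
$rows = @'
h1;h2;h3
a;b;c
a;;c
'@ | ConvertFrom-Csv -Delimiter ';'

$headers = $rows[0].PsObject.Properties.Name

$shifted = foreach ($row in $rows) {
    # Same filter as in the first script: empty values are dropped here ...
    $values = @($row.PsObject.Properties | Where-Object { $_.Value } | ForEach-Object { $_.Value })
    # ... so the remaining values are assigned to the headers by running position.
    $out = [ordered]@{}
    for ($i = 0; $i -lt $headers.Count; $i++) {
        $value = if ($i -lt $values.Count) { $values[$i] } else { $null }
        $out[$headers[$i]] = $value
    }
    [PsCustomObject]$out
}

$shifted | Format-Table
```

In the second row the value `c` lands under `h2` instead of `h3`, which is the same effect the real file shows when a field is empty.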
With your real data, this should work a lot better:
$path       = 'D:\Test'
$fileIn     = Join-Path -Path $path -ChildPath 'input.csv'
$fileOut    = $null   # will get a value in the loop
$splitValue = 'IH'    # the value in the first column ($idColumn) that decides to start a new file (in the example data 'AB')
$csv        = Import-Csv -Path $fileIn -Delimiter ';'
# get a string array of all the column header names
$allHeaders = $csv[0].PsObject.Properties.Name
# the name of the first column --> 'Record Identifier' (in the example data 'header1')
$idColumn   = $allHeaders[0]
# get the index of the first column to start appending from ("Identifier")
$mergeIndex = [array]::IndexOf($allHeaders, "Identifier")   # this is case-sensitive!
# if you want to do this case-insensitively, you need to do something like
# $mergeIndex = [array]::IndexOf((($allHeaders -join ';').ToLowerInvariant() -split ';'), "identifier")
# IndexOf returns -1 if the header is missing; stop early instead of producing only warnings
if ($mergeIndex -lt 0) { throw "Column 'Identifier' not found in '$fileIn'" }
# create an ordered hash that will contain the values up to column no. $mergeIndex
$hash = [ordered]@{}
foreach ($item in $csv) {
    if ($item.$idColumn -eq $splitValue) {
        # start a new row (build a new hash)
        $hash.Clear()
        for ($i = 0; $i -lt $mergeIndex; $i++) {
            $hash[$allHeaders[$i]] = $item.$($allHeaders[$i])   # we need $(..) because of the spaces in the header names
        }
        # get the filename from the 6th header $item.$($allHeaders[5]) --> 'VAT Number'
        $fileOut = Join-Path -Path $path -ChildPath ('{0}.csv' -f $item.'VAT Number')
        # if a file with that name already exists, delete it
        if (Test-Path -Path $fileOut -PathType Leaf) { Remove-Item -Path $fileOut }
    }
    elseif ($hash.Count) {
        # create a new ordered hashtable to build the entire line with
        $rowHash = [ordered]@{}
        # copy the hash which holds the beginning of the line (the 'IH' line) to the temporary row hash
        # an ordered hashtable does not have a .Clone() method unfortunately..
        foreach ($name in $hash.Keys) { $rowHash[$name] = $hash[$name] }
        # append the fields from this item by position to the row hash, starting at the $mergeIndex column
        $j = 0
        for ($i = $mergeIndex; $i -lt $allHeaders.Count; $i++) {
            $rowHash[$allHeaders[$i]] = $item.PsObject.Properties.Value[$j++]
        }
        # cast the finalized row hash into a [PsCustomObject] and add it to the file
        [PsCustomObject]$rowHash | Export-Csv -Path $fileOut -Delimiter ';' -NoTypeInformation -Append
    }
    else {
        Write-Warning "Could not find a starting row ('$idColumn' = '$splitValue') for the file"
    }
}
Note that I'm not showing the output here, because the real csv may contain sensitive data.
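To try the positional merge without touching real data, here is a small in-memory sketch. The column names `Record Identifier`, `VAT Number` and `Identifier` are the ones mentioned in the code comments above; `Doc`, `Amount`, `Note` and all values are invented for illustration. It collects the merged rows in a variable instead of writing files, so it shows the idea only and is not a replacement for the script.

```powershell
# Made-up sample: the 'IL' detail lines fill the first columns, just like the 'CD' lines in the question.
$csv = @'
Record Identifier;Doc;VAT Number;Identifier;Amount;Note
IH;1001;BC1
IL;;9.50
IL;7002;
'@ | ConvertFrom-Csv -Delimiter ';'

$allHeaders = $csv[0].PsObject.Properties.Name
$mergeIndex = [array]::IndexOf($allHeaders, 'Identifier')   # 3 for this sample

$head = [ordered]@{}
$merged = foreach ($item in $csv) {
    if ($item.'Record Identifier' -eq 'IH') {
        # remember the leading columns of the 'IH' line
        $head.Clear()
        for ($i = 0; $i -lt $mergeIndex; $i++) {
            $head[$allHeaders[$i]] = $item.$($allHeaders[$i])
        }
    }
    elseif ($head.Count) {
        # start from a copy of the 'IH' columns
        $row = [ordered]@{}
        foreach ($name in $head.Keys) { $row[$name] = $head[$name] }
        # copy the detail fields by position, so empty fields keep their slot
        $props = @($item.PsObject.Properties)
        $j = 0
        for ($i = $mergeIndex; $i -lt $allHeaders.Count; $i++) {
            $row[$allHeaders[$i]] = $props[$j++].Value
        }
        [PsCustomObject]$row
    }
}

$merged | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

Because the fields are copied by position rather than filtered on their value, the empty second field of the first `IL` line stays in the `Amount` column and `9.50` ends up under `Note`.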