使用多个重复分隔符拆分 CSV 文件中的列值

Split column value in CSV file using multiple and duplicate delimiters

我有一个乱七八糟的 CSV 文件。我正在尝试使用正则表达式从 csv 文件的列中的值中提取名字和姓氏。名字和姓氏将有自己的列。

CSV 文件(包含不同的分隔符组合):

ID,Description,Number
JDo,John Doe - Temp - Client Client Ops,SomeValue
JDo,John  Doe - Temp - Client Client Ops,SomeValue
JDo,John  Doe  - Temp - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops ,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops  ,SomeValue
JDo,John Doe-Temp-Client Client Ops,SomeValue
JDo,John  Doe - Temp-Client Client Ops,SomeValue
JDo,John  Doe  - Temp-Client Client Ops,SomeValue
JDo,John  Doe-Temp -  Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops,SomeValue
JDo,John  Doe-Temp  -  Client  Client  Ops ,SomeValue
JDo,John  Doe-Temp-Client  Client  Ops  ,SomeValue
JDo,John.Doe - Temp - Client Client Ops,SomeValue
JDo,John .Doe - Temp - Client Client Ops,SomeValue
JDo,John. Doe - Temp - Client Client Ops,SomeValue
JDo,John . Doe - Temp - Client Client Ops,SomeValue
JDo,John.Doe - Temp - Client Client Ops  ,SomeValue
JDo,John .Doe - Temp - Client Client Ops  ,SomeValue
JDo,John. Doe - Temp - Client Client Ops  ,SomeValue
JDo,John . Doe - Temp - Client Client Ops  ,SomeValue
JDo,John.Doe-Temp-Client Client Ops,SomeValue
JDo,John .Doe-Temp-Client Client Ops,SomeValue
JDo,John. Doe-Temp-Client Client Ops,SomeValue
JDo,John . Doe-Temp-Client Client Ops,SomeValue
JDo,John.Doe  - Temp  - Client Client Ops,SomeValue
JDo,John .Doe -  Temp -  Client Client Ops,SomeValue
JDo,John. Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John . Doe - Temp - Client Client Ops,SomeValue
JDo,John?Doe - Temp - Client Client Ops,SomeValue
JDo,John ?Doe - Temp - Client Client Ops,SomeValue
JDo,John? Doe - Temp - Client Client Ops,SomeValue
JDo,John ? Doe - Temp - Client Client Ops,SomeValue
JDo,John?Doe - Temp - Client Client Ops  ,SomeValue
JDo,John ?Doe - Temp - Client Client Ops  ,SomeValue
JDo,John? Doe - Temp - Client Client Ops  ,SomeValue
JDo,John ? Doe - Temp - Client Client Ops  ,SomeValue
JDo,John?Doe-Temp-Client Client Ops,SomeValue
JDo,John ?Doe-Temp-Client Client Ops,SomeValue
JDo,John? Doe-Temp-Client Client Ops,SomeValue
JDo,John ? Doe-Temp-Client Client Ops,SomeValue
JDo,John?Doe  - Temp  - Client Client Ops,SomeValue
JDo,John ?Doe -  Temp -  Client Client Ops,SomeValue
JDo,John? Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John ? Doe - Temp - Client Client Ops,SomeValue
JDo,"John,Doe - Temp - Client Client Ops",SomeValue
JDo,"John ,Doe - Temp - Client Client Ops",SomeValue
JDo,"John, Doe - Temp - Client Client Ops",SomeValue
JDo,"John , Doe - Temp - Client Client Ops",SomeValue
JDo,"  John,Doe - Temp - Client Client Ops  ",SomeValue
JDo,"  John ,Doe - Temp - Client Client Ops  ",SomeValue
JDo,"  John, Doe - Temp - Client Client Ops  ",SomeValue
JDo,"  John , Doe - Temp - Client Client Ops  ",SomeValue
JDo,"John,Doe-Temp-Client Client Ops",SomeValue
JDo,"John ,Doe-Temp-Client Client Ops",SomeValue
JDo,"John, Doe-Temp-Client Client Ops",SomeValue
JDo,"John , Doe-Temp-Client Client Ops",SomeValue
JDo,"John,Doe  - Temp  - Client Client Ops",SomeValue
JDo,"John ,Doe -  Temp -  Client Client Ops",SomeValue
JDo,"John, Doe  -  Temp  -  Client Client Ops",SomeValue
JDo,"John , Doe - Temp - Client Client Ops",SomeValue
JDo,John-Doe - Temp - Client Client Ops,SomeValue
JDo,John -Doe - Temp - Client Client Ops,SomeValue
JDo,John- Doe - Temp - Client Client Ops,SomeValue
JDo,John - Doe - Temp - Client Client Ops,SomeValue
JDo,John-Doe - Temp - Client Client Ops  ,SomeValue
JDo,John -Doe - Temp - Client Client Ops  ,SomeValue
JDo,John- Doe - Temp - Client Client Ops  ,SomeValue
JDo,John - Doe - Temp - Client Client Ops  ,SomeValue
JDo,John-Doe-Temp-Client Client Ops,SomeValue
JDo,John -Doe-Temp-Client Client Ops,SomeValue
JDo,John- Doe-Temp-Client Client Ops,SomeValue
JDo,John - Doe-Temp-Client Client Ops,SomeValue
JDo,John-Doe  - Temp  - Client Client Ops,SomeValue
JDo,John -Doe -  Temp -  Client Client Ops,SomeValue
JDo,John- Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John - Doe - Temp - Client Client Ops,SomeValue

要添加名字和姓氏列,我使用以下代码:

Function FixRxClaimReportAddFirstLastNameColumn {
  Param ($csvFile)

  Write-Host "Adding columns 'First Name' and 'Last Name' to $csvFile"
  Import-Csv $csvFile |
    Select-Object *, @{n='First Name'; e={if ($_.Description) {
        $columnFirstNameValue = $($_.Description -replace '\s+', ' ').split(" ")[0]
        if ($columnFirstNameValue -notlike "*,*" -and $columnFirstNameValue -notmatch '\?' -and $columnFirstNameValue -notlike "*.*" -and $columnFirstNameValue -notlike "*-*") {
          $columnFirstNameValue.Trim()
        } else {
          $columnFirstNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "}
          $columnFirstNameValue2[0].Trim()
        }
      }}}, @{n='Last Name'; e={if ($_.Description) {
        $columnLastNameValue = $($_.Description -replace '\s+', ' ').split(" ")[1]
        if ($columnLastNameValue -notlike "*,*" -and $columnLastNameValue -notmatch '\?' -and $columnLastNameValue -notlike "*.*" -and $columnLastNameValue -notlike "*-*") {
          $columnLastNameValue.Trim()
        } else {
          $columnLastNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "}
          $columnLastNameValue2[1].Trim()
        }
      }}} | Export-Csv "$csvFile-Results.csv" -NoTypeInformation -Force
  Write-Host "Complete."
  Write-Host ""
}

FixRxClaimReportAddFirstLastNameColumn 'C:\Scripts\Tests\Test1.csv'

当我 运行 这段代码时,所有名字值都应该是 John,所有姓氏值都应该是 Doe。但是,每个人的价值观都大不相同。

你想的太复杂了。从 Description 字段末尾删除附加信息以仅获取名称,然后 trim 名称并将其拆分为名字和姓氏,然后再将这些作为新属性添加到输入对象。

试试这个:

Import-Csv 'C:\path\to\input.csv' | ForEach-Object {
  $rawname = $_.Description -replace '-[^-]*-[^-]*$'
  $firstname, $lastname = $rawname.Trim() -split ' *[ \?\.,-] *'
  $_ | Add-Member -Type NoteProperty -Name FirstName -Value $firstname
  $_ | Add-Member -Type NoteProperty -Name LastName -Value $lastname
  $_
} | Export-Csv 'C:\path\to\output.csv' -NoType