Double quotes with comma in PowerShell DataTable

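I have a PowerShell script that builds a DataTable from a StreamReader reading a .csv or .txt file (both comma-delimited) and then inserts the DataTable into a database.

My source file contains data with commas inside double quotes, for example: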

ID,Desc,Obs
1234,"Some text, More Text, Text again","Text"

The problem occurs when I split each line:

$datatable = New-Object System.Data.DataTable

$src = "My comma delimited file (.txt/.csv)"
$reader = New-Object IO.StreamReader($src)
$header = Get-Content -Path $src | select -First 1
$columns = $header.Split(",")


foreach ($column in $columns) {
    $datatable.columns.add($column) 
} 

while(($line = $reader.ReadLine()) -ne $null){
    $line = $line -split(",")

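Because of the commas inside the double quotes, the split gives me 5 columns instead of 3.

I don't want to remove the commas inside the double quotes. The data should be inserted like this: Some text, More Text, Text again

How can I solve this?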
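
The miscount can be reproduced in isolation. This is a minimal sketch using the sample row from above; the variable names are only for illustration:

```powershell
# Sample data row from the source file
$line = '1234,"Some text, More Text, Text again","Text"'

# A plain split treats every comma as a delimiter, quoted or not
$parts = $line -split ','

$parts.Count   # 5, although the row only has 3 fields
```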

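Solution 1 - ConvertFrom-CSV: Save the header and use ConvertFrom-Csv to parse each line for you. I haven't tested it on large files, but it doesn't have to load the whole file into memory, so it should at least work. Example: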

#Create samplefile
@"
ID,Desc,Obs
1234,"Some text, More Text, Text again","Text"
5678,"Some text, More Text, Text again and again",Text2
$(1..100000 | % { "$_,`"Some text$_, More Text$_, Text again and again$_`",Text$_`n" })
"@ -split "`n" | % { $_.trim() } | Set-Content D:\Downloads\test.txt


$datatable = New-Object System.Data.DataTable
$src = "D:\Downloads\test.txt"
$reader = New-Object IO.StreamReader($src)

#Get header and split to columns
$columns = $reader.ReadLine() -split ','

foreach ($column in $columns) {
    $datatable.columns.add($column)
}

while(($line = $reader.ReadLine()) -ne $null){
    #Let ConvertFrom-CSV do the heavy-lifting by making it convert one "csv-file" per line using a known header
    $obj = $line | ConvertFrom-Csv -Header $columns

    $row = $datatable.NewRow()
    $row.ID = $obj.ID
    $row.Desc = $obj.Desc
    $row.Obs = $obj.Obs

    $datatable.Rows.Add($row)
}

#Release the file handle when done
$reader.Close()

Test:

#Show available columns
$datatable.Columns.Caption
ID
Desc
Obs

#Show datatable
$datatable

ID   Desc                                       Obs
--   ----                                       ---
1234 Some text, More Text, Text again           Text
5678 Some text, More Text, Text again and again Text2
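
The loop in Solution 1 names the three columns explicitly. If the header can vary between files, the row could probably be filled by looping over the column names instead. This is a minimal sketch on a single in-memory line, assuming the header has already been split into `$columns`:

```powershell
# Header names, as they would come from the first line of the file
$columns = 'ID', 'Desc', 'Obs'

$datatable = New-Object System.Data.DataTable
foreach ($column in $columns) {
    # [void] suppresses the DataColumn object that Add() returns
    [void]$datatable.Columns.Add($column)
}

# One data line, parsed by ConvertFrom-Csv with the known header
$line = '1234,"Some text, More Text, Text again","Text"'
$obj = $line | ConvertFrom-Csv -Header $columns

# Copy every property into the matching column by name
$row = $datatable.NewRow()
foreach ($column in $columns) {
    $row[$column] = $obj.$column
}
$datatable.Rows.Add($row)
```

This keeps the script independent of the actual column names, at the cost of one extra inner loop per line.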

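Solution 2 - TextFieldParser: The Microsoft.VisualBasic assembly has a TextFieldParser class that understands quoted fields. It runs faster (about 50% faster in my test with a 100k-line CSV) because there is less overhead when using .NET directly. Try: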

Add-Type -AssemblyName Microsoft.VisualBasic

$datatable = New-Object System.Data.DataTable
$src = "D:\Downloads\test.txt"

$reader = New-Object -TypeName Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $src
$reader.Delimiters = @(",")
#Default values, but wanted to show the options
$reader.HasFieldsEnclosedInQuotes = $true
$reader.TrimWhiteSpace = $true

#Get header as array
$columns = $reader.ReadFields()

foreach ($column in $columns) {
    $datatable.columns.add($column)
}

while($fields = $reader.ReadFields()) {

    $row = $datatable.NewRow()

    #Insert value in property using field(column) index
    for ($i = 0; $i -lt $columns.Count; $i++) {
        $row.($columns[$i]) = $fields[$i]
    }

    $datatable.Rows.Add($row)
}

$reader.Close()
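
One possible refinement: ReadFields() throws a MalformedLineException when a line cannot be parsed (for example, an unbalanced quote), and in that case the Close() call above is never reached. A sketch of the same loop wrapped in try/finally follows. The temp-file path and sample content are assumptions made only to keep the example self-contained:

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical sample file so the sketch can run on its own
$src = Join-Path ([IO.Path]::GetTempPath()) 'tfp-sample.csv'
Set-Content -Path $src -Value @(
    'ID,Desc,Obs'
    '1234,"Some text, More Text, Text again","Text"'
)

$datatable = New-Object System.Data.DataTable
$reader = New-Object -TypeName Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $src

try {
    $reader.Delimiters = @(",")
    $reader.HasFieldsEnclosedInQuotes = $true

    #First record is the header
    $columns = $reader.ReadFields()
    foreach ($column in $columns) {
        [void]$datatable.Columns.Add($column)
    }

    while (-not $reader.EndOfData) {
        $fields = $reader.ReadFields()
        $row = $datatable.NewRow()
        for ($i = 0; $i -lt $columns.Count; $i++) {
            $row[$columns[$i]] = $fields[$i]
        }
        $datatable.Rows.Add($row)
    }
}
finally {
    #Runs even if ReadFields() throws, so the file handle is released
    $reader.Close()
}
```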

Try this:

$csv=import-csv "C:\temp\vminfo.csv"

$datatable = New-Object System.Data.DataTable

#Add all columns
$columnsname=$csv | Get-Member -MemberType NoteProperty | %{ $datatable.columns.add($_.Name) }

#Add data by column name
$csv | %{

    $newrow=$datatable.NewRow()
    $rowcsv=$_

    $columnsname | %{$newrow[$_]=$rowcsv."$_"}

    $datatable.Rows.Add($newrow)

}
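
A note on the Import-Csv approach: Get-Member lists properties alphabetically, so the DataTable columns may end up in a different order than in the file. If the order matters, the names can be read from the first record's PSObject.Properties instead. This is a minimal sketch; the temp-file path and sample content are assumptions for illustration:

```powershell
# Hypothetical sample file so the sketch can run on its own
$src = Join-Path ([IO.Path]::GetTempPath()) 'import-csv-sample.csv'
Set-Content -Path $src -Value @(
    'ID,Desc,Obs'
    '1234,"Some text, More Text, Text again","Text"'
)

# @() guarantees an array even when the file has a single data row
$csv = @(Import-Csv -Path $src)
$datatable = New-Object System.Data.DataTable

# PSObject.Properties keeps the header order from the file
$names = $csv[0].PSObject.Properties.Name
foreach ($name in $names) {
    [void]$datatable.Columns.Add($name)
}

foreach ($record in $csv) {
    $row = $datatable.NewRow()
    foreach ($name in $names) {
        $row[$name] = $record.$name
    }
    $datatable.Rows.Add($row)
}
```

Like the original, this loads the whole file into memory through Import-Csv, so for very large files the streaming solutions above are likely the better fit.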