Double quotes with comma in PowerShell DataTable
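I have a PowerShell script that builds a DataTable from a StreamReader, which reads a .csv or .txt file (both comma-delimited), and then inserts the DataTable into a database.
My source file contains data with commas inside double quotes, for example: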
ID,Desc,Obs
1234,"Some text, More Text, Text again","Text"
The problem appears when I split each line:
$datatable = New-Object System.Data.DataTable
$src = "My comma delimited file (.txt/.csv)"
$reader = New-Object IO.StreamReader($src)
$header = Get-Content -Path $src | select -First 1
$columns = $header.Split(",")
foreach ($column in $columns) {
$datatable.columns.add($column)
}
while(($line = $reader.ReadLine()) -ne $null){
$line = $line -split(",")
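Because of the commas inside the double quotes, the split gives me 5 columns instead of 3.
I don't want to remove the commas inside the double quotes. The data should be inserted like this: Some text, More Text, Text again
How can I solve this?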
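To make the symptom concrete, here is a minimal sketch that splits the sample data line on its own. The variable names are illustrative and not part of the original script.

```powershell
# Sample data line from the file shown above
$line = '1234,"Some text, More Text, Text again","Text"'

# -split ',' treats every comma as a delimiter, including the two inside the quotes
$parts = $line -split ','

# Number of pieces produced by the naive split
$parts.Count

# The quoted Desc value ends up spread over three separate pieces
$parts[1..3]
```

The count comes out as 5 because the split has no notion of quoting, so a quote-aware parser is needed rather than a different delimiter.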
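Solution 1 - ConvertFrom-Csv:
Keep the header and let ConvertFrom-Csv parse each line for you. I haven't tested it on large files, but it doesn't have to load the whole file into memory, so it should at least work. For example: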
#Create samplefile
@"
ID,Desc,Obs
1234,"Some text, More Text, Text again","Text"
5678,"Some text, More Text, Text again and again",Text2
$(1..100000 | % { "$_,`"Some text$_, More Text$_, Text again and again$_`",Text$_`n" })
"@ -split "`n" | % { $_.trim() } | Set-Content D:\Downloads\test.txt
$datatable = New-Object System.Data.DataTable
$src = "D:\Downloads\test.txt"
$reader = New-Object IO.StreamReader($src)
#Get header and split to columns
$columns = $reader.ReadLine() -split ','
foreach ($column in $columns) {
    $datatable.columns.add($column)
}
while(($line = $reader.ReadLine()) -ne $null){
    #Let ConvertFrom-CSV do the heavy-lifting by making it convert one "csv-file" per line using a known header
    $obj = $line | ConvertFrom-Csv -Header $columns
    $row = $datatable.NewRow()
    $row.ID = $obj.ID
    $row.Desc = $obj.Desc
    $row.Obs = $obj.Obs
    $datatable.Rows.Add($row)
}
#Release the file handle once all rows have been read
$reader.Close()
Test:
#Show available columns
$datatable.Columns.Caption
ID
Desc
Obs
#Show datatable
$datatable
ID   Desc                                       Obs
--   ----                                       ---
1234 Some text, More Text, Text again           Text
5678 Some text, More Text, Text again and again Text2
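The loop in Solution 1 names the ID, Desc and Obs columns explicitly. If the script has to handle files with other headers, the values can be copied by looping over the header names instead. The following is only a sketch under a few assumptions: it reads from an in-memory StringReader rather than a file so that it runs on its own, the sample text is hypothetical, and the header line is assumed to contain no quoted commas.

```powershell
# Hypothetical in-memory sample standing in for the source file
$text = @'
ID,Desc,Obs
1234,"Some text, More Text, Text again","Text"
5678,"Some text, More Text, Text again and again",Text2
'@

$datatable = New-Object System.Data.DataTable
$reader = New-Object System.IO.StringReader($text)
try {
    # Header is assumed to have no quoted commas, so a plain split is enough here
    $columns = $reader.ReadLine() -split ','
    foreach ($column in $columns) {
        [void]$datatable.Columns.Add($column)
    }

    while ($null -ne ($line = $reader.ReadLine())) {
        # Skip blank lines instead of adding empty rows
        if ($line.Trim() -eq '') { continue }

        $obj = $line | ConvertFrom-Csv -Header $columns
        $row = $datatable.NewRow()
        # Copy every column by name instead of listing ID/Desc/Obs one by one
        foreach ($column in $columns) {
            $row[$column] = $obj.$column
        }
        $datatable.Rows.Add($row)
    }
}
finally {
    # Release the reader even if a line fails to parse
    $reader.Dispose()
}
```

With a real file, the StringReader would be replaced by the IO.StreamReader used in the answer; the rest of the loop stays the same.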
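Solution 2 - TextFieldParser:
The Microsoft.VisualBasic assembly has a TextFieldParser class that understands quoted fields. This runs faster (about 50% faster in my test with a 100k-line CSV), because there is less overhead when using .NET directly. Try: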
Add-Type -AssemblyName Microsoft.VisualBasic
$datatable = New-Object System.Data.DataTable
$src = "D:\Downloads\test.txt"
$reader = New-Object -TypeName Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $src
$reader.Delimiters = @(",")
#Default values, but wanted to show the options
$reader.HasFieldsEnclosedInQuotes = $true
$reader.TrimWhiteSpace = $true
#Get header as array
$columns = $reader.ReadFields()
foreach ($column in $columns) {
    $datatable.columns.add($column)
}
while($fields = $reader.ReadFields()) {
    $row = $datatable.NewRow()
    #Insert value in property using field(column) index
    for ($i = 0; $i -lt $columns.Count; $i++) {
        $row.($columns[$i]) = $fields[$i]
    }
    $datatable.Rows.Add($row)
}
$reader.Close()
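If a line in the file is malformed, ReadFields() throws and the Close() call above is never reached. One way to guard against that is to wrap the parsing in try/finally and use the parser's EndOfData property as the loop condition. This is a sketch, not a drop-in replacement: the temp file and its contents are hypothetical stand-ins for the real source file, and the variable is named $parser here only for readability.

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical temp file standing in for the real source file
$src = [System.IO.Path]::GetTempFileName()
Set-Content -Path $src -Value @(
    'ID,Desc,Obs',
    '1234,"Some text, More Text, Text again","Text"',
    '5678,"Some text, More Text, Text again and again",Text2'
)

$datatable = New-Object System.Data.DataTable
$parser = New-Object -TypeName Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $src
try {
    $parser.Delimiters = @(',')
    $parser.HasFieldsEnclosedInQuotes = $true

    # First record is the header
    $columns = $parser.ReadFields()
    foreach ($column in $columns) {
        [void]$datatable.Columns.Add($column)
    }

    # EndOfData makes the stop condition explicit
    while (-not $parser.EndOfData) {
        $fields = $parser.ReadFields()
        $row = $datatable.NewRow()
        # Fill the row by column index, as in the loop above
        for ($i = 0; $i -lt $columns.Count; $i++) {
            $row[$i] = $fields[$i]
        }
        $datatable.Rows.Add($row)
    }
}
finally {
    # Runs even if a line fails to parse, so the file handle is released
    $parser.Close()
    Remove-Item -Path $src
}
```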
Try this:
$csv=import-csv "C:\temp\vminfo.csv"
$datatable = New-Object System.Data.DataTable
#Add all columns
$columnsname=$csv | Get-Member -MemberType NoteProperty | %{ $datatable.columns.add($_.Name) }
#Add data by column name
$csv | %{
    $newrow=$datatable.NewRow()
    $rowcsv=$_
    $columnsname | %{$newrow[$_]=$rowcsv."$_"}
    $datatable.Rows.Add($newrow)
}
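One thing to be aware of with the Import-Csv approach: Get-Member returns member names sorted alphabetically, so the DataTable columns may not follow the order of the header line. If the order matters (for example for a bulk insert that maps columns by position), the names can be read from the first record's PSObject.Properties instead. The sketch below assumes the file has at least one data row; the temp file and its contents are hypothetical.

```powershell
# Hypothetical temp file standing in for the real CSV
$src = [System.IO.Path]::GetTempFileName()
Set-Content -Path $src -Value @(
    'ID,Desc,Obs',
    '1234,"Some text, More Text, Text again","Text"'
)

# @() keeps $csv an array even when the file has a single data row
$csv = @(Import-Csv -Path $src)
$datatable = New-Object System.Data.DataTable

# PSObject.Properties preserves the order of the header line
$columnNames = $csv[0].PSObject.Properties | ForEach-Object { $_.Name }
foreach ($name in $columnNames) {
    [void]$datatable.Columns.Add($name)
}

foreach ($record in $csv) {
    $row = $datatable.NewRow()
    # Copy each value by column name
    foreach ($name in $columnNames) {
        $row[$name] = $record.$name
    }
    $datatable.Rows.Add($row)
}

Remove-Item -Path $src
```

Note that Import-Csv reads the whole file into memory before the loop starts, unlike the line-by-line readers above.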