Importing CSV to SQL Server using bulkcopy
I am trying to import my CSV files into SQL Server. I found this code, which works well and is very fast:
# Database variables
$sqlserver = "servername"
$database = "databasename"
$table = "tablename"
# CSV variables
$csvfile = "F:\TestNA\fin_product4.csv"
$csvdelimiter = ";"
$firstRowColumnNames = $true
################### No need to modify anything below ###################
Write-Host "Script started..."
$elapsed = [System.Diagnostics.Stopwatch]::StartNew()
[void][Reflection.Assembly]::LoadWithPartialName("System.Data")
[void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")
# 50k worked fastest and kept memory usage to a minimum
$batchsize = 50000
# Build the sqlbulkcopy connection, and set the timeout to infinite
$connectionstring = "Data Source=$sqlserver;Integrated Security=true;Initial Catalog=$database;"
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $table
$bulkcopy.bulkcopyTimeout = 0
$bulkcopy.batchsize = $batchsize
# Create the datatable, and autogenerate the columns.
$datatable = New-Object System.Data.DataTable
# Open the text file from disk
$reader = New-Object System.IO.StreamReader($csvfile)
$columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
if ($firstRowColumnNames -eq $true) { $null = $reader.readLine() }
#foreach ($column in $columns) {
# $null = $datatable.Columns.Add()
#}
$col1 = New-Object system.Data.DataColumn fin_product_rk,([datetime])
$col2 = New-Object system.Data.DataColumn fin_product_id,([datetime])
$datatable.columns.add($col1)
$datatable.columns.add($col2)
# Read in the data, line by line
while (($line = $reader.ReadLine()) -ne $null) {
    $null = $datatable.Rows.Add($line.Split($csvdelimiter))
    $i++; if (($i % $batchsize) -eq 0) {
        $bulkcopy.WriteToServer($datatable)
        Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
        $datatable.Clear()
    }
}
# Add in all the remaining rows since the last clear
if($datatable.Rows.Count -gt 0) {
    $bulkcopy.WriteToServer($datatable)
    $datatable.Clear()
}
# Clean Up
$reader.Close(); $reader.Dispose()
$bulkcopy.Close(); $bulkcopy.Dispose()
$datatable.Dispose()
Write-Host "Script complete. $i rows have been inserted into the database."
Write-Host "Total Elapsed Time: $($elapsed.Elapsed.ToString())"
# Sometimes the Garbage Collector takes too long to clear the huge datatable.
[System.GC]::Collect()
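The problem: it works with standard Latin encoding, but my CSV files are encoded in UTF-8 and Windows-1251.
What should I add to this code, and where, to change the encoding?
I don't know the programming language this code is written in, so I can't do it myself. I would be glad if someone could help!
Thanks!
Update: CSV sample: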
product;product_id;product_nm;dttm
220;text;некоторый текст;12JAN2021:18:03:41.000000
220;text;некоторый текст;1JAN2021:18:03:41.000000
564;text;некоторый текст;16JAN2021:18:03:41.000000
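Here is a solution in T-SQL.
Compared with the PowerShell script, it is very concise: a single statement.
Notable points:
The BULK INSERT parameter CODEPAGE = '65001' specifies UTF-8. For the Windows-1251 files, the corresponding value would be CODEPAGE = '1251'.
The product_nm NVARCHAR(100) column holds the Unicode characters from the file.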
SQL
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.tbl;
CREATE TABLE dbo.tbl (
    product    VARCHAR(10),
    product_id VARCHAR(30),
    product_nm NVARCHAR(100),
    dttm       VARCHAR(50)
);
BULK INSERT dbo.tbl
FROM 'e:\Temp\Faenno_2.csv'
WITH (FORMAT = 'CSV'
    , DATAFILETYPE = 'char' -- { 'char' | 'native' | 'widechar' | 'widenative' }
    , FIELDTERMINATOR = ';'
    , ROWTERMINATOR = '\n'
    , FIRSTROW = 2
    , CODEPAGE = '65001');
-- test
SELECT * FROM dbo.tbl;
Output
+---------+------------+-----------------+---------------------------+
| product | product_id | product_nm      | dttm                      |
+---------+------------+-----------------+---------------------------+
| 220     | text       | некоторый текст | 12JAN2021:18:03:41.000000 |
| 220     | text       | некоторый текст | 1JAN2021:18:03:41.000000  |
| 564     | text       | некоторый текст | 16JAN2021:18:03:41.000000 |
+---------+------------+-----------------+---------------------------+
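For anyone who would rather keep the PowerShell script from the question, the encoding can also be chosen where the file is opened. The sketch below is only an illustration, not a tested drop-in: Open-CsvReader is a hypothetical helper name, and the sample line is written to a temporary file purely so the snippet runs on its own. It assumes code page 1251 is available to [System.Text.Encoding]::GetEncoding on the PowerShell version in use. Note that StreamReader without an encoding argument already decodes as UTF-8, so the explicit argument matters mainly for the Windows-1251 files.

```powershell
# Hypothetical helper (not part of the original script): opens a text file
# with an explicit encoding instead of relying on StreamReader's default.
function Open-CsvReader {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [System.Text.Encoding] $Encoding
    )
    New-Object System.IO.StreamReader($Path, $Encoding)
}

# Code page 1251 is Windows-1251 (Cyrillic); code page 65001 is UTF-8.
$cp1251 = [System.Text.Encoding]::GetEncoding(1251)

# Build a Cyrillic word from code points so the demo does not depend on
# how this script file itself is saved.
$word = -join ([char[]](0x0442, 0x0435, 0x043A, 0x0441, 0x0442))
$line = "220;text;$word;12JAN2021:18:03:41.000000"

# Write a one-line sample file in Windows-1251, then read it back.
$tmp = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($tmp, $line, $cp1251)

$reader = Open-CsvReader -Path $tmp -Encoding $cp1251
try {
    $fields = $reader.ReadLine().Split(';')
}
finally {
    $reader.Dispose()
    Remove-Item $tmp
}

Write-Host "Decoded product_nm: $($fields[2])"
```

In the question's script, the equivalent change would be on the line that creates $reader: pass the encoding as a second constructor argument, for example New-Object System.IO.StreamReader($csvfile, [System.Text.Encoding]::GetEncoding(1251)). The destination column would still need to be NVARCHAR, as in the T-SQL answer, for the Cyrillic text to be stored intact.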