SQL Server bulk insert doesn't recognize double quotes as fieldquote
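I'm trying to bulk insert a file in SQL Server 2017 (14.0.1000.169). I'd like to take the file exactly as it arrives, save it to the desired location and then run the bulk insert query without having to modify the file at all. I'm struggling to get the script to recognize and ignore the double quotes in the text file unless I manually change the line endings from Unix to Windows. I've read plenty of threads here and elsewhere on topics close to this one; alas, none of them answered my question:
How do I bulk insert a file with Unix line endings without ending up with double quotes in the data?
My file looks like this: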
"Report Name","Daily Extract (ID: 111111)"
"Date/Time Generated(UTC)","01-Mar-2020 15:08:51"
"Workspace Name","Company (ID: 22222)"
"Account Name","Client Account"
"Date Range","01-Jan-2019 - 29-Feb-2020"
"Dimension 1","Dimension 2","Dimension 3","Dimension 4","Dimension 5","Dimension 6","Dimension 7","Dimension 8","Dimension 9","Dimension 10","Dimension 11","Dimension 12","Dimension 13","Dimension 14","Dimension 15","Dimension 16","Dimension 17","Metric 1","Metric 2","Metric 3","Metric 4","Metric 5","Metric 6","Metric 7","Metric 8","Metric 9","Metric 10","Metric 11","Metric 12"
"string","string","date as string","string","string","string","string","string","string","string","string","string","string","string","string","string","string","bigint","bigint","decimal","decimal","decimal","bigint","decimal","decimal","bigint","decimal","bigint","bigint"
The query I'm using is as follows:
DROP TABLE IF EXISTS [dbo].[Table]
GO
CREATE TABLE [dbo].[Table](
[Dimension 1] [varchar] (255) NULL,
[Dimension 2] [varchar] (255) NULL,
[Dimension 3] [varchar] (255) NULL,
[Dimension 4] [varchar] (255) NULL,
[Dimension 5] [varchar] (255),
[Dimension 6] [varchar] (255) NULL,
[Dimension 7] [varchar] (255) NULL,
[Dimension 8] [varchar] (255) NULL,
[Dimension 9] [varchar] (1000) NULL,
[Dimension 10] [varchar] (255) NULL,
[Dimension 11] [varchar] (255) NULL,
[Dimension 12] [varchar] (255) NULL,
[Dimension 13] [varchar] (1000) NULL,
[Dimension 14] [varchar] (1000) NULL,
[Dimension 15] [varchar] (1000) NULL,
[Dimension 16] [varchar] (1000) NULL,
[Dimension 17] [varchar] (1000) NULL,
[Metric 1] [varchar] (50) NULL,
[Metric 2] [varchar] (50) NULL,
[Metric 3] [varchar] (50) NULL,
[Metric 4] [varchar] (50) NULL,
[Metric 5] [varchar] (50) NULL,
[Metric 6] [varchar] (50) NULL,
[Metric 7] [varchar] (50) NULL,
[Metric 8] [varchar] (50) NULL,
[Metric 9] [varchar] (50) NULL,
[Metric 10] [varchar] (255) NULL,
[Metric 11] [varchar] (50) NULL,
[Metric 12] [varchar] (50) NULL
) ON [PRIMARY]
GO
BULK INSERT [dbo].[Table]
FROM 'C:\Users\username\Folder\File.csv'
WITH
(
--FORMAT = 'CSV',
DATAFILETYPE = 'char',
FIELDTERMINATOR = ',',
--ROWTERMINATOR = '\n',
ROWTERMINATOR = '0x0a',
FIRSTROW = 7,
--FIELDQUOTE = '"'
FIELDQUOTE = '0x22'
)
;
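As you can see above, I'm importing everything as varchar. Originally I only did this for one metric (because of data-quality issues on the supplier's side), as I fully intended to correct each defect once the file was loaded. Having run into difficulties, though, I've set all the metrics to varchar so that at least the file loads and I can see what it looks like and dig further.
So far I've tried the following:
- Opening the file in Sublime, deleting the first 7 lines, changing the line endings to Windows and saving it. This works with the lines I've commented out, i.e. FORMAT instead of DATAFILETYPE, \n instead of 0x0a, and FIELDQUOTE = '"'.
- Leaving the file untouched and running the script above with a double quote instead of 0x22 as FIELDQUOTE. This also works, but the end result is that every value is wrapped in double quotes.
- Leaving the file untouched and running the script above as is (i.e. with 0x22 as FIELDQUOTE). Again it works, but there are double quotes everywhere.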
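Everything else I've tried so far has led to various errors, all of which come down to the same two things: either I can't use FORMAT = 'CSV' (if I leave the Unix line endings in place), or, when I try to load the metrics as float, the load fails because of the double quotes.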
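To confirm which line endings a file actually has without opening it in an editor, a sketch like the following reads the whole file through OPENROWSET(BULK ..., SINGLE_CLOB) and looks for a carriage return. The path is the one from the query above; this assumes the login running it has bulk-load permission on the server and that the file fits comfortably in a single varchar(max) value.

```sql
-- SINGLE_CLOB returns the entire file as one varchar(max) value in the column BulkColumn.
SELECT
    CASE
        WHEN CHARINDEX(CHAR(13) + CHAR(10), f.BulkColumn) > 0 THEN 'CRLF (Windows)'
        WHEN CHARINDEX(CHAR(10), f.BulkColumn) > 0 THEN 'LF only (Unix)'
        ELSE 'no line breaks found'
    END AS LineEndings
FROM OPENROWSET(BULK 'C:\Users\username\Folder\File.csv', SINGLE_CLOB) AS f;
```

If this reports LF only, it fits the documented BULK INSERT behaviour that ROWTERMINATOR = '\n' is interpreted as CR+LF, which is why an LF-only file needs the terminator written as '0x0a'.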
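For now I have a workaround (I can strip the double quotes and convert the fields after loading), but I'd like to know whether that step can be folded into the bulk insert itself (as it is when I load the file with Windows line endings).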
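The post-load workaround can be sketched in T-SQL as below, assuming the file has already landed in varchar columns with the quotes still attached. The temp table #Staging and its sample values are hypothetical stand-ins; in practice the UPDATE would run against [dbo].[Table] for every metric column. The target types follow the type row in the file, while the decimal precision and scale are assumptions.

```sql
-- Hypothetical stand-in for the loaded table: two varchar metric columns
-- whose values still carry the literal double quotes.
DROP TABLE IF EXISTS #Staging;
CREATE TABLE #Staging ([Metric 1] varchar(50), [Metric 3] varchar(50));
INSERT INTO #Staging ([Metric 1], [Metric 3]) VALUES ('"42"', '"3.5"');

-- Step 1: strip the quote characters that were loaded as data.
UPDATE #Staging
SET [Metric 1] = REPLACE([Metric 1], '"', ''),
    [Metric 3] = REPLACE([Metric 3], '"', '');

-- Step 2: convert to the intended types; TRY_CAST returns NULL
-- instead of raising an error when a value cannot be converted.
SELECT TRY_CAST([Metric 1] AS bigint)         AS [Metric 1],
       TRY_CAST([Metric 3] AS decimal(18, 6)) AS [Metric 3]
FROM #Staging;
```

Using TRY_CAST rather than CAST keeps one malformed value from aborting the whole conversion; rows that come back NULL can then be inspected separately.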
N.B. I know FIELDQUOTE hasn't been around for long, but according to Microsoft it should work on my build:
"FIELDQUOTE = 'field_quote' Applies to: SQL Server 2017 (14.x) CTP 1.1. Specifies a character that will be used as the quote character in the CSV file. If not specified, the quote character (") will be used as the quote character as defined in the RFC 4180 standard."
Have I forgotten to mention anything? If not, any ideas on what I might have overlooked?
Thanks in advance!
OK. The biggest problem here is your file. First, because of the lines at the top, the file isn't RFC 4180 compliant, which causes headaches.
Next, there's an important caveat about FIRSTROW:
When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.
Note that this says field terminators, not row terminators. That's the second problem. For your data, the start of the file looks like this:
"Report Name","Daily Extract (ID: 111111)"
"Date/Time Generated(UTC)","01-Mar-2020 15:08:51"
"Workspace Name","Company (ID: 22222)"
"Account Name","Client Account"
"Date Range","01-Jan-2019 - 29-Feb-2020"
<-- Blank Line -->
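That's 6 row terminators but only 5 field terminators (one per line). A full row of this file contains 28 field terminators (29 columns), so while skipping, SQL Server does not treat these lines as 6 separate rows; the first skipped "row" runs on into the column-header line.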
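The terminator count can be checked against the raw file rather than by eye. A minimal sketch, assuming the same path as in the question and that the column-header line is the first place the quoted text "Dimension 1" appears (both are assumptions about this particular file):

```sql
-- Load the raw file text as a single varchar(max) value.
DECLARE @raw varchar(max) =
    (SELECT f.BulkColumn
     FROM OPENROWSET(BULK 'C:\Users\username\Folder\File.csv', SINGLE_CLOB) AS f);

-- Assumed marker for the start of the column-header line.
-- If it is not found, CHARINDEX returns 0 and LEFT below raises an error.
DECLARE @headerStart bigint = CHARINDEX('"Dimension 1"', @raw);

-- Everything before the header line is the report preamble.
DECLARE @preamble varchar(max) = LEFT(@raw, @headerStart - 1);

-- Count terminators by comparing lengths before and after removing them.
-- DATALENGTH is used instead of LEN because LEN ignores trailing spaces.
SELECT
    DATALENGTH(@preamble) - DATALENGTH(REPLACE(@preamble, ',', ''))      AS FieldTerminators,
    DATALENGTH(@preamble) - DATALENGTH(REPLACE(@preamble, CHAR(10), '')) AS RowTerminators;
```

Comparing FieldTerminators with the number of field terminators in one data row (column count minus one) shows how far the first skipped "row" actually reaches, which is what decides a workable FIRSTROW value.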
Next, the CSV file has more columns than the table Table: Table has no Dimension 17 column.
After adding that missing column, I managed to get this working, with what I believe is the result you want:
BULK INSERT [Table]
FROM '/tmp/YourFile2.txt'
WITH (FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2,
FORMAT = 'CSV',
FIELDQUOTE = '"');
This inserted 1 row into the table.
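Applied to the file as it arrives on Windows (LF-only line endings, same path as in the question), the statement above might look roughly like this. It is an untested sketch: it assumes [dbo].[Table] has one column per field in the file, and that FIRSTROW = 2 lines up the same way it did above, which depends entirely on the preamble (see the FIRSTROW caveat). ROWTERMINATOR is written as '0x0a' because '\n' is documented to mean CR+LF for BULK INSERT.

```sql
BULK INSERT [dbo].[Table]
FROM 'C:\Users\username\Folder\File.csv'
WITH (
    FORMAT = 'CSV',          -- CSV mode: quoted fields are parsed per RFC 4180
    FIELDQUOTE = '"',        -- quote character wrapped around each value
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0a',  -- LF only; '\n' would be read as CR+LF
    FIRSTROW = 2             -- skipped rows are counted by field terminators, not lines
);
```

If the row count or column alignment comes out wrong, the FIRSTROW value is the first thing to revisit, since the preamble lines do not map one-to-one onto skipped rows.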