Bulk insert csv file with semicolon as delimiter

I am trying to import data from a semicolon-separated csv file into a SQL Server database. Here is the table structure:

CREATE TABLE [dbo].[waste_facility] 
(
    [Id]           INT             IDENTITY (1, 1) NOT NULL,
    [postcode]     VARCHAR (50)    NULL,
    [name]         VARCHAR (50)    NULL,
    [type]         VARCHAR (255)   NULL,
    [street]       VARCHAR (255)   NULL,
    [suburb]       VARCHAR (255)   NULL,
    [municipality] VARCHAR (255)   NULL,
    [telephone]    VARCHAR (255)   NULL,
    [website]      VARCHAR (255)   NULL,
    [longtitude]   DECIMAL (18, 8) NULL,
    [latitude]     DECIMAL (18, 8) NULL,
    PRIMARY KEY CLUSTERED ([Id] ASC)
);

The csv file looks like this:

Location Coordinate;Feature Extent;Projection;Postcode;Name Of Facility;Type Of Facility;Street;Suburb;Municipality;Telephone Number;Website;Easting Coordinate;Northing Coordinate;Longitude Coordinate;Latitude Coordinate;Google Maps Direction
-37.9421182892,145.3193857967;"{""coordinates"": [145.3193857967, -37.9421182892], ""type"": ""Point""}";MGA zone 55;3156;Cleanaway Lysterfield Resource Recovery Centre;Recovery Centre;840 Wellington Road;LYSTERFIELD;Yarra Ranges;9753 5411;https://www.cleanaway.com.au/location/lysterfield/;352325;5799275;145.31938579674124;-37.94211828921733;https://www.google.com.au/maps/dir//-37.94211828921733,145.31938579674124/@your+location,17z/data=!4m2!4m1!3e0
-38.0529529215,145.2433557709;"{""coordinates"": [145.2433557709, -38.0529529215], ""type"": ""Point""}";MGA zone 55;3175;Smart Recycling (South Eastern Depot);Recycling Centre;185 Dandenong-Hastings Rd;LYNDHURST;Greater Dandenong;8787 3300;https://smartrecycling.com.au/;345876;5786853;145.24335577090602;-38.05295292152536;https://www.google.com.au/maps/dir//-38.05295292152536,145.24335577090602/@your+location,17z/data=!4m2!4m1!3e0
-38.0533129717,145.267610135;"{""coordinates"": [145.267610135, -38.0533129717], ""type"": ""Point""}";MGA zone 55;3976;Hampton Park Transfer Station (Outlook Environmental);Transfer Station;274 Hallam Road;HAMPTON PARK;Casey;9554 4502;https://www.suez.com.au/en-au/who-we-are/suez-in-australia-and-new-zealand/our-locations/waste-management-hampton-park-transfer-station;348005;5786853;145.2676101350274;-38.053312971691255;https://www.google.com.au/maps/dir//-38.053312971691255,145.2676101350274/@your+location,17z/data=!4m2!4m1!3e0
-38.1243050577,145.2183465487;"{""coordinates"": [145.2183465487, -38.1243050577], ""type"": ""Point""}";MGA zone 55;3977;Frankston Regional Recycling and Recovery Centre;Recycling Centre;20 Harold Road;SKYE;Frankston;1300 322 322;https://www.frankston.vic.gov.au/Environment-and-Waste/Waste-and-Recycling/Frankston-Regional-Recycling-and-Recovery-Centre-FRRRC/Accepted-Items-at-FRRRC;343833;5778893;145.21834654873447;-38.12430505770815;https://www.google.com.au/maps/dir//-38.12430505770815,145.21834654873447/@your+location,17z/data=!4m2!4m1!3e0
-38.0973208774,145.4920399066;"{""coordinates"": [145.4920399066, -38.0973208774], ""type"": ""Point""}";MGA zone 55;3810;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/@your+location,17z/data=!4m2!4m1!3e0

I don't need some of the columns, so I created a format file to import the data. The format file looks like this:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="12" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="13" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="2" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="3" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="4" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="5" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="6" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="7" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="8" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="9" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="14" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="15" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="10" xsi:type="CharFixed" LENGTH="41"/>
  <FIELD ID="11" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="41"/>
  <FIELD ID="16" xsi:type="CharFixed" LENGTH="50"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="2" NAME="postcode" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="3" NAME="name" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="4" NAME="type" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="5" NAME="street" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="6" NAME="suburb" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="7" NAME="municipality" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="8" NAME="telephone" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="9" NAME="website" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="10" NAME="longtitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/>
  <COLUMN SOURCE="11" NAME="latitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/>
 </ROW>
</BCPFORMAT>

Then I tried both BULK INSERT and bcp in, and neither of them works.

Here is the BULK INSERT command:

USE [waste-facility-locations];  

BULK INSERT [dbo].[waste_facility]   
FROM 'E:\onboardingIteration\waste-facility-locations.csv'   
WITH (FORMATFILE = 'E:\onboardingIteration\waste_facility_formatter.xml',
      FIRSTROW = 2,
      LASTROW = 6,
      FIELDTERMINATOR = ';',
      ROWTERMINATOR = '\n',
      ERRORFILE = 'E:\onboardingIteration\myRubbishData.log');  

Unfortunately, this generated some error files. Here is the content of the error file myRubbishData.log:

Row 2 File Offset 1993 ErrorFile Offset 0 - HRESULT 0x80004005

And the actual row stored in myRubbishData.txt:

;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/@your+location,17z/data=!4m2!4m1!3e0;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09

As you can see, the rows do not seem to be separated correctly. So I tried changing the row terminator to "\n", "\r", "\n\r" and "\r\n", none of which works.

I also tried bcp, which did not work either.

Here is the bcp command I used:

bcp [waste-facility-locations].[dbo].[waste_facility] in "E:\onboardingIteration\waste-facility-locations.csv" -f "E:\onboardingIteration\waste_facility_formatter.xml" -T -S "(LocalDB)\MSSQLLocalDB" -F 2 -t ";" -r "\n"

Then I got an error that says the same thing:

SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1

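One interesting thing is that if I create a new Excel workbook and use the "Get Data" option to import the csv file, the file is parsed correctly.

Basically, I can't find what I am doing wrong here. Can anyone help me with this?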

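The SQL Server import tools are not tolerant of bad data, or even of variations in format or options. In my career I have literally spent thousands of work-hours trying to develop and debug import procedures for clients. I can tell you right now that trying to solve this with SQL alone is both difficult and time-consuming.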
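Before picking an approach, it can help to measure how far a file is from what the bulk-load tools expect. The following is only a minimal sketch in Python, not part of the original answer: the function name `inspect_csv` and the UTF-8 encoding are assumptions for illustration. It counts which line endings occur in the raw bytes and how many fields each row has once quoted fields are parsed properly.

```python
import csv
import io
from collections import Counter


def inspect_csv(raw: bytes, delimiter: str = ";", encoding: str = "utf-8"):
    """Return (line_ending_counts, field_count_histogram) for raw CSV bytes.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    crlf = raw.count(b"\r\n")
    endings = {
        "CRLF": crlf,
        "LF": raw.count(b"\n") - crlf,  # bare \n, not part of a \r\n pair
        "CR": raw.count(b"\r") - crlf,  # bare \r, not part of a \r\n pair
    }
    # newline="" leaves line breaks untouched so the csv module can handle
    # them itself, including breaks inside quoted fields.
    text = io.StringIO(raw.decode(encoding), newline="")
    reader = csv.reader(text, delimiter=delimiter, quotechar='"')
    field_counts = Counter(len(row) for row in reader)
    return endings, dict(field_counts)


if __name__ == "__main__":
    # Tiny in-memory sample with a quoted field that contains the delimiter.
    sample = b'a;"{""k"": [1, 2]}";c\r\nd;"e;f";g\r\n'
    print(inspect_csv(sample))
```

If the histogram shows more than one field count, or the line endings are mixed, that is a strong hint the file needs cleaning before any bulk-load tool will accept it.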

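When you have this problem (bad data and/or inconsistent formatting), it is almost always easier to find or develop a more flexible tool to pre-process the data into the rigid standards that SQL expects. So I would say that if Excel can parse it, just use Excel automation to pre-process the files and then use SQL to import the Excel output. If that is not practical for you, then I would advise writing your own tool in some client language (C#, VB, Java, Python, etc.) to pre-process the files.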
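As a rough illustration of the "write your own pre-processor" option, here is a minimal Python sketch. It is not the answerer's tool: the function name `preprocess`, the tab output delimiter and the `\r\n` row ending are all assumptions. It parses the semicolon-delimited input with proper quote handling, keeps only the columns that the `waste_facility` table needs (matched by the header names shown in the question), and writes a plain file with a fixed number of fields per row.

```python
import csv
import io

# Source header names, listed in the order of the target table's columns
# (postcode ... latitude). Id is an IDENTITY column and is not supplied.
WANTED = [
    "Postcode", "Name Of Facility", "Type Of Facility", "Street", "Suburb",
    "Municipality", "Telephone Number", "Website",
    "Longitude Coordinate", "Latitude Coordinate",
]


def preprocess(src, dst, out_delimiter="\t"):
    """Copy only the WANTED columns from a semicolon CSV to a plain delimited file.

    Hypothetical helper; returns the number of data rows written.
    """
    reader = csv.DictReader(src, delimiter=";", quotechar='"')
    missing = [name for name in WANTED if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns in header: {missing}")
    rows = 0
    for row_no, record in enumerate(reader, start=1):
        # Short rows yield None for absent fields; treat those as empty.
        fields = [(record[name] or "").strip() for name in WANTED]
        for value in fields:
            # Fail loudly rather than emit a row the bulk loader would misread.
            if any(ch in value for ch in (out_delimiter, "\r", "\n")):
                raise ValueError(f"data row {row_no}: separator inside {value!r}")
        dst.write(out_delimiter.join(fields) + "\r\n")
        rows += 1
    return rows


if __name__ == "__main__":
    # In-memory demo: one unwanted quoted column followed by the wanted ones.
    header = "Feature Extent;" + ";".join(WANTED)
    row = ('"{""type"": ""Point""}";3156;Cleanaway Lysterfield Resource Recovery Centre;'
           'Recovery Centre;840 Wellington Road;LYSTERFIELD;Yarra Ranges;9753 5411;'
           'https://www.cleanaway.com.au/location/lysterfield/;'
           '145.31938579674124;-37.94211828921733')
    src = io.StringIO(header + "\r\n" + row + "\r\n", newline="")
    dst = io.StringIO(newline="")
    print(preprocess(src, dst), "row(s) written")
    print(dst.getvalue(), end="")
```

For real files, `src` and `dst` would be file objects opened with `newline=""` so that Python does not alter the line endings. The cleaned output still has ten fields while the table has eleven columns, so the load step would need to skip `Id` in some way, for example with a format file.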

You can do it in SQL (and I have done it many times), but I promise you that it is a long and complicated journey.

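SSIS has more flexible error-handling for problems like this, but if you are not already familiar with it and using it, it has a very steep learning curve, and your first SSIS project is likely to be very time-consuming as well.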