使用加载数据将 CSV 转换为 MySQL

CSV into MySQL using load data

我希望使用 MySQL 加载文件将 CSV 的内容插入到我的 table 中,但是,每次我使用以下命令这样做时,都会删除一些行

LOAD DATA INFILE 'new.csv' INTO TABLE Example
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

我希望插入的 CSV 数据是 Companies House (http://download.companieshouse.gov.uk/en_output.html) 提供的免费公司产品数据

如果有任何帮助,我将不胜感激。 谢谢!

一方面——LOAD DATA INFILE,快速方法,但现实生活中总是需要一些检查、转换和其他业务逻辑。

和常规做法 - 使用 ETL 工具而不是直接导入

或多阶段流程 - 清理数据(检查、记录错误、转换、在第一步添加计算列等),然后导入最终结果。

现在有许多优秀的开源工具可用于此:

改造你的import/export逻辑 你可以实现所有步骤,例如:

鉴于此 table 定义,根据提供的数据规范 here:

create table companieshouse
(
CompanyName varchar(160),
CompanyNumber varchar(8),
RegAddressCareOf varchar(100),
RegAddressPOBox varchar(10),
RegAddressAddressLine1 varchar(300),
RegAddressAddressLine2 varchar(300),
RegAddressPostTown varchar(50),
RegAddressCounty varchar(60),
RegAddressCountry varchar(50),
RegAddressPostCode varchar(20),
CompanyCategory varchar(100),
CompanyStatus varchar(70),
CountryOfOrigin varchar(50),
DissolutionDate date,
IncorporationDate date,
AccountsAccountRefDay integer,
AccountsAccountRefMonth integer,
AccountsNextDueDate date,
AccountsLastMadeUpDate date,
AccountsAccountCategory varchar(30),
ReturnsNextDueDate date,
ReturnsLastMadeUpDate date,
MortgagesNumMortCharges integer,
MortgagesNumMortOutstanding integer,
MortgagesNumMortPartSatisfied integer,
MortgagesNumMortSatisfied integer,
SICCodeSicText_1 varchar(170),
SICCodeSicText_2 varchar(170),
SICCodeSicText_3 varchar(170),
SICCodeSicText_4 varchar(170),
LimitedPartnershipsNumGenPartners integer,
LimitedPartnershipsNumLimPartners integer,
URI varchar(47),
PreviousName1CONDATE date,
PreviousName1CompanyName varchar(160),
PreviousName2CONDATE date,
PreviousName2CompanyName varchar(160),
PreviousName3CONDATE date,
PreviousName3CompanyName varchar(160),
PreviousName4CONDATE date,
PreviousName4CompanyName varchar(160),
PreviousName5CONDATE date,
PreviousName5CompanyName varchar(160),
PreviousName6CONDATE date, 
PreviousName6CompanyName varchar(160),
PreviousName7CONDATE date,
PreviousName7CompanyName varchar(160),
PreviousName8CONDATE date,
PreviousName8CompanyName varchar(160),
PreviousName9CONDATE date, 
PreviousName9CompanyName varchar(160),
PreviousName10CONDATE date, 
PreviousName10CompanyName varchar(160),
ConfStmtNextDueDate date, 
ConfStmtLastMadeUpDate date
);

这会将提供的 .csv 文件中的数据加载到 table;

LOAD DATA INFILE '/var/lib/mysql-files/BasicCompanyData-2017-03-06-part1_5.csv' 
INTO TABLE companieshouse 
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
ESCAPED BY ""
LINES TERMINATED BY '\n'
 IGNORE 1 LINES
(
CompanyName,
CompanyNumber,
RegAddressCareOf,
RegAddressPOBox,
RegAddressAddressLine1,
RegAddressAddressLine2,
RegAddressPostTown,
RegAddressCounty,
RegAddressCountry,
RegAddressPostCode,
CompanyCategory,
CompanyStatus,
CountryOfOrigin,
@DissolutionDate,
@IncorporationDate,
@AccountsAccountRefDay,
@AccountsAccountRefMonth,
@AccountsNextDueDate,
@AccountsLastMadeUpDate,
AccountsAccountCategory,
@ReturnsNextDueDate,
@ReturnsLastMadeUpDate,
@MortgagesNumMortCharges,
@MortgagesNumMortOutstanding,
@MortgagesNumMortPartSatisfied,
@MortgagesNumMortSatisfied,
SICCodeSicText_1,
SICCodeSicText_2,
SICCodeSicText_3,
SICCodeSicText_4,
@LimitedPartnershipsNumGenPartners,
@LimitedPartnershipsNumLimPartners,
URI,
@PreviousName1CONDATE,
PreviousName1CompanyName,
@PreviousName2CONDATE,
PreviousName2CompanyName,
@PreviousName3CONDATE,
PreviousName3CompanyName,
@PreviousName4CONDATE,
PreviousName4CompanyName,
@PreviousName5CONDATE,
PreviousName5CompanyName,
@PreviousName6CONDATE, 
PreviousName6CompanyName,
@PreviousName7CONDATE,
PreviousName7CompanyName,
@PreviousName8CONDATE,
PreviousName8CompanyName,
@PreviousName9CONDATE, 
PreviousName9CompanyName,
@PreviousName10CONDATE, 
PreviousName10CompanyName,
@ConfStmtNextDueDate, 
@ConfStmtLastMadeUpDate)
SET DissolutionDate = IF(@DissolutionDate = '', NULL, STR_TO_DATE(@DissolutionDate, '%d/%m/%Y')),
IncorporationDate = IF(@IncorporationDate = '', NULL, STR_TO_DATE(@IncorporationDate, '%d/%m/%Y')),
AccountsNextDueDate = IF(@AccountsNextDueDate = '', NULL, STR_TO_DATE(@AccountsNextDueDate, '%d/%m/%Y')),
AccountsLastMadeUpDate = IF(@AccountsLastMadeUpDate = '', NULL, STR_TO_DATE(@AccountsLastMadeUpDate, '%d/%m/%Y')),
ReturnsNextDueDate = IF(@ReturnsNextDueDate = '', NULL,  STR_TO_DATE(@ReturnsNextDueDate, '%d/%m/%Y')),
ReturnsLastMadeUpDate = IF(@ReturnsLastMadeUpDate = '', NULL, STR_TO_DATE(@ReturnsLastMadeUpDate, '%d/%m/%Y')),
PreviousName1CONDATE = IF(@PreviousName1CONDATE = '', NULL, STR_TO_DATE(@PreviousName1CONDATE, '%d/%m/%Y')),
PreviousName2CONDATE = IF(@PreviousName2CONDATE = '', NULL, STR_TO_DATE(@PreviousName2CONDATE, '%d/%m/%Y')),
PreviousName3CONDATE = IF(@PreviousName3CONDATE = '', NULL, STR_TO_DATE(@PreviousName3CONDATE, '%d/%m/%Y')),
PreviousName4CONDATE = IF(@PreviousName4CONDATE = '', NULL, STR_TO_DATE(@PreviousName4CONDATE, '%d/%m/%Y')),
PreviousName5CONDATE = IF(@PreviousName5CONDATE = '', NULL, STR_TO_DATE(@PreviousName5CONDATE, '%d/%m/%Y')),
PreviousName6CONDATE = IF(@PreviousName6CONDATE = '', NULL, STR_TO_DATE(@PreviousName6CONDATE, '%d/%m/%Y')),
PreviousName7CONDATE = IF(@PreviousName7CONDATE = '', NULL, STR_TO_DATE(@PreviousName7CONDATE, '%d/%m/%Y')),
PreviousName8CONDATE = IF(@PreviousName8CONDATE = '', NULL, STR_TO_DATE(@PreviousName8CONDATE, '%d/%m/%Y')),
PreviousName9CONDATE = IF(@PreviousName9CONDATE = '', NULL, STR_TO_DATE(@PreviousName9CONDATE, '%d/%m/%Y')),
PreviousName10CONDATE = IF(@PreviousName10CONDATE = '', NULL, STR_TO_DATE(@PreviousName10CONDATE, '%d/%m/%Y')),
AccountsAccountRefDay = NULLIF(@AccountsAccountRefDay, ''),
AccountsAccountRefMonth = NULLIF(@AccountsAccountRefMonth, '') ,
MortgagesNumMortCharges = NULLIF(@MortgagesNumMortCharges, ''),
MortgagesNumMortOutstanding = NULLIF(@MortgagesNumMortOutstanding, ''),
MortgagesNumMortPartSatisfied = NULLIF(@MortgagesNumMortPartSatisfied, ''),
MortgagesNumMortSatisfied = NULLIF(@MortgagesNumMortSatisfied, ''),
LimitedPartnershipsNumGenPartners = NULLIF(@LimitedPartnershipsNumGenPartners, ''),
LimitedPartnershipsNumLimPartners = NULLIF(@LimitedPartnershipsNumLimPartners, '')
;

...加载数据时没有任何错误或警告:

Query OK, 849999 rows affected (19.43 sec)
Records: 849999  Deleted: 0  Skipped: 0  Warnings: 0