使用 bash 将多个 .csv 文件导入到 mysql

Import multiple .csv files to mysql using bash

我想使用 bash 脚本首先将文件从 google 驱动器复制到本地驱动器并重命名(删除空格并添加前缀),然后导入这些.csv 文件到 mysql 数据库中。这些 .csv 有不同的列,所以我需要从每个 .csv 中获取列名并使用这些列名创建一个 table,然后导入数据。

如有任何帮助,我将不胜感激。

我在此处 https://ericlondon.com/2011/04/10/a-bash-shell-script-to-import-a-large-number-of-csv-files-into-mysql.html 找到了一些代码,我已经对其进行了修改,但无法使其正常工作。我承认我不明白它的作用。

这是我目前的脚本:

#!/bin/bash

# show commands being executed, per debug
set -x

# define database connectivity
_db="dbname"
_db_user="user"
_db_password="password"

# delete existing files
rm -rf /path/to/*.csv

# copy files to local disk
cp /old/path/to/*.csv /path/to/

# define directory containing CSV files
_csv_directory="/path/to"

# go into directory
cd $_csv_directory || exit

# edit file name
rename "s/ //g" *.csv
rename "s/^/tp/g" *.csv

# get a list of CSV files in directory
_csv_files=`ls -1 *.csv`

# loop through csv files
for _csv_file in ${_csv_files[@]}
do

  # remove file extension
  _csv_file_extensionless=`echo "$_csv_file" | sed 's/\(.*\)\..*//'`

  # define table name
  _table_name="${_csv_file_extensionless}"

  # get header columns from CSV file
  _header_columns=`head -1 $_csv_directory/$_csv_file | tr ',' '\n' | sed 's/"//' | sed 's/ /_/g'`
  _header_columns_string=`head -1 $_csv_directory/$_csv_file | sed 's/ /_/g' | sed 's/"//g' | sed 's/(//g' | sed 's/)//g'`

  # ensure table exists
  mysql -u $_db_user -p$_db_password $_db << eof
    CREATE TABLE IF NOT EXISTS \`$_table_name\` ENGINE=MyISAM DEFAULT CHARSET=utf8
eof

  # loop through header columns
  for _header in "${_header_columns[@]}"
  do

    # add column
    mysql -u $_db_user -p$_db_password $_db --execute="alter table \`$_table_name\` add column IF NOT EXISTS \`$_header\` text"

  done

  # import csv into mysql
  mysqlimport --fields-enclosed-by='"' --fields-terminated-by=',' --lines-terminated-by="\r\n" --columns=$_header_columns_string -u $_db_user -p$_db_password $_db $_csv_directory/$_csv_file

done
exit

这是我为每个 .csv 获得的更新输出:

++ for _csv_file in ${_csv_files[@]}
+++ echo tpTriplexBC.csv
+++ sed 's/\(.*\)\..*//'
++ _csv_file_extensionless=tpTriplexBC
++ _table_name=tpTriplexBC
+++ head -1 /path/to/tpTriplexBC.csv
+++ tr , '\n'
+++ sed 's/^"//'
+++ sed 's/"$//'
+++ sed 's/ /_/g'
+++ sed 's/(//g'
+++ sed 's/)//g'
++ _header_columns='ProductName
ProductID
Quantity
Tax
Paper_Type
PriceExpress
Production_Days_Express
PriceStandard
Production_Days_Standard
PriceSaver
Production_Days_Saver
PriceSameday
Production_Days_Sameday
Price_Just_Print
Price_File_Check
Price_File_Check_with_Proofing
Price_File_Assist
Artwork_Tax
Size
Flat_Width_mm
Flat_Height_mm
Finished_Width_mm
Finished_Height_mm
productionDataJSON
'pdatedAt"
+++ head -1 /path/to/tpTriplexBC.csv
+++ sed 's/ /_/g'
+++ sed 's/"//g'
+++ sed 's/(//g'
+++ sed 's/)//g'
++ _header_columns_string=$'ProductName,ProductID,Quantity,Tax,Paper_Type,PriceExpress,Production_Days_Express,PriceStandard,Production_Days_Standard,PriceSaver,Production_Days_Saver,PriceSameday,Production_Days_Sameday,Price_Just_Print,Price_File_Check,Price_File_Check_with_Proofing,Price_File_Assist,Artwork_Tax,Size,Flat_Width_mm,Flat_Height_mm,Finished_Width_mm,Finished_Height_mm,productionDataJSON,updatedAt\r'
++ mysql -u user -ppassword dbname
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `ProductName` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `ProductID` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Quantity` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Tax` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Paper_Type` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `PriceExpress` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Production_Days_Express` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `PriceStandard` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Production_Days_Standard` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `PriceSaver` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Production_Days_Saver` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `PriceSameday` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Production_Days_Sameday` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Price_Just_Print` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Price_File_Check` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Price_File_Check_with_Proofing` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Price_File_Assist` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Artwork_Tax` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Size` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Flat_Width_mm` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Flat_Height_mm` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Finished_Width_mm` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `Finished_Height_mm` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexBC` ADD COLUMN IF NOT EXISTS `productionDataJSON` text'
++ for _header in ${_header_columns[@]}
++ mysql -u user -ppassword dbname '--execute=ALTER TABLE `tpTriplexB` text'COLUMN IF NOT EXISTS `updatedAt"
'RROR 1166 (42000) at line 1: Incorrect column name 'updatedAt"
++ mysqlimport '--fields-enclosed-by="' --fields-terminated-by=, '--lines-terminated-by=\n' $'--columns=ProductName,ProductID,Quantity,Tax,Paper_Type,PriceExpress,Production_Days_Express,PriceStandard,Production_Days_Standard,PriceSaver,Production_Days_Saver,PriceSameday,Production_Days_Sameday,Price_Just_Print,Price_File_Check,Price_File_Check_with_Proofing,Price_File_Assist,Artwork_Tax,Size,Flat_Width_mm,Flat_Height_mm,Finished_Width_mm,Finished_Height_mm,productionDataJSON,updatedAt\r' -u user -ppassword dbname /path/to/tpTriplexBC.csv
mysqlimport: Error: 1054, Unknown column 'updatedAt' in 'field list', when using table: tpTriplexBC
++ exit

每个列都插入了 table 但没有数据。还创建了 tables 称为 -1 和 ls,我不想要。

csv 的第一行是: "ProductName","ProductID","Quantity","Tax","Special Finish","Laminate","Sides Printed","Material","PriceStandard","Production Days Standard","PriceSaver","Production Days Saver","PriceSameday","Production Days Sameday","Price Just Print","Price File Check","Price File Check with Proofing","Price File Assist","Artwork Tax","Size","Flat Width (mm)","Flat Height (mm)","Finished Width (mm)","Finished Height (mm)","productionDataJSON","updatedAt"

header 的八进制转储输出:

0000000   "   P   r   o   d   u   c   t   N   a   m   e   "   ,   "   P
0000020   r   o   d   u   c   t   I   D   "   ,   "   Q   u   a   n   t
0000040   i   t   y   "   ,   "   T   a   x   "   ,   "   P   a   p   e
0000060   r       T   y   p   e   "   ,   "   P   r   i   c   e   E   x
0000100   p   r   e   s   s   "   ,   "   P   r   o   d   u   c   t   i
0000120   o   n       D   a   y   s       E   x   p   r   e   s   s   "
0000140   ,   "   P   r   i   c   e   S   t   a   n   d   a   r   d   "
0000160   ,   "   P   r   o   d   u   c   t   i   o   n       D   a   y
0000200   s       S   t   a   n   d   a   r   d   "   ,   "   P   r   i
0000220   c   e   S   a   v   e   r   "   ,   "   P   r   o   d   u   c
0000240   t   i   o   n       D   a   y   s       S   a   v   e   r   "
0000260   ,   "   P   r   i   c   e   S   a   m   e   d   a   y   "   ,
0000300   "   P   r   o   d   u   c   t   i   o   n       D   a   y   s
0000320       S   a   m   e   d   a   y   "   ,   "   P   r   i   c   e
0000340       J   u   s   t       P   r   i   n   t   "   ,   "   P   r
0000360   i   c   e       F   i   l   e       C   h   e   c   k   "   ,
0000400   "   P   r   i   c   e       F   i   l   e       C   h   e   c
0000420   k       w   i   t   h       P   r   o   o   f   i   n   g   "
0000440   ,   "   P   r   i   c   e       F   i   l   e       A   s   s
0000460   i   s   t   "   ,   "   A   r   t   w   o   r   k       T   a
0000500   x   "   ,   "   S   i   z   e   "   ,   "   F   l   a   t    
0000520   W   i   d   t   h       (   m   m   )   "   ,   "   F   l   a
0000540   t       H   e   i   g   h   t       (   m   m   )   "   ,   "
0000560   F   i   n   i   s   h   e   d       W   i   d   t   h       (
0000600   m   m   )   "   ,   "   F   i   n   i   s   h   e   d       H
0000620   e   i   g   h   t       (   m   m   )   "   ,   "   p   r   o
0000640   d   u   c   t   i   o   n   D   a   t   a   J   S   O   N   "
0000660   ,   "   u   p   d   a   t   e   d   A   t   "  \r  \n

查看 CSV header 的 od 输出,似乎在 CSV 行的末尾有回车符 return 和换行符(\r\n 顺序)。 CSV 是否来自 Windows 下的处理? \r\n 是 Windows 下常见的文本行终止符,而 Linux 仅使用 \n\r 作为字符串的一部分包含在内,不被视为行终止的一部分。它还会产生看起来很奇怪的输出,因为 \r 本身在输出时会将光标位置重置为第一列,并且后续字符显示在先前可能已显示在同一行上的任何内容的顶部。

我建议先 运行 dos2unix 在您的 CSV 文件上,然后再进行处理。我认为这将清除奇怪的错误并帮助文件的其余部分的处理也正常工作。