Combine CSV files and concatenate some columns into a single column via AIX awk, sed, ksh

I have multiple files like these:

File_1.csv:
"Job Id", "Batch Id","Id","Success","Created","Error","Col1","Col2","Col3"
 aaabbb111,xxxyyy999,"false","false","Horrible_Error: Really Bad Error occured: yeah", "Val1", "Val2", "Val3"
 cccddd222,pppqqq888,"","false","Horrible_Error: Anoter Bad Error occured: ouch", "Val1", "Val2", "Val3"

File_2.csv:
"Job Id", "Batch Id","Id","Success","Created","Error","Col1","Col2","Col3","Col4", "Col5"
 aaabbb111,xxxyyy999,"false","false","Horrible_Error: Really Bad Error occured: oops","Val1","Val2","Val3","Val4","Val5"
 cccddd222,pppqqq888,"","false","Horrible_Error: Anoter Bad Error occured: oh-no", "Val1","Val1","Val2","Val3","Val4","Val5"

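The first 6 columns in each file always have the same names. The remaining columns vary in name and number, and I want to capture them as a single column, enclosed in double quotes, square brackets, curly braces, or anything else that shows they belong together.

I need to be able to combine these files into a single file that looks like this. The header is optional and shown only for reference: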

"File_Name"|"Job Id"|"Batch Id"|"Id"|"Success"|"Created"|"Error"|"Tran_Header"|"Tran_Record" 
File_1.csv|aaabbb111|xxxyyy999|"false"|"false"|"Horrible_Error: Really Bad Error occured: yeah"|["Col1","Col2","Col3"]|["Val1","Val2","Val3"]
File_1.csv|cccddd222|pppqqq888|""|"false"|"Horrible_Error: Anoter Bad Error occured: ouch"|["Col1","Col2","Col3"]|["Val1","Val2","Val3"]
File_2.csv|aaabbb111|xxxyyy999|"false"|"false"|"Horrible_Error: Really Bad Error occured: oops"|["Col1","Col2","Col3","Col4", "Col5"]|["Val1","Val2","Val3","Val4","Val5"]
File_2.csv|cccddd222|pppqqq888|""|"false"|"Horrible_Error: Anoter Bad Error occured: oh-no"|["Col1","Col2","Col3","Col4", "Col5"]|["Val1","Val1","Val2","Val3","Val4","Val5"]

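I tried the following to merge the files, but this code sometimes chokes when replacing the double quotes, and then my ETL tool chokes in turn when parsing the concatenated set of columns (and I don't know how to capture the header into a separate column):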

outdirectory=/some/directory
outfilename=some_file_name.csv
for i in *.csv
do
    filename=$(echo "${i}")

    tail +2 "${i}" | sed -e 's/,/#|#/1' -e 's/,/#|#/1' -e 's/,/#|#/1' -e 's/,/#|#/1' -e 's/,/#|#/1' -e 's/,/#|#/1' -e s/\"//g -e "s/^/#${filename}/" -e s/$/#/ | sed s/#/\"/g >> "${outdirectory}/${outfilename}" 

    mv $i $srcdir/
done

Any help or ideas are greatly appreciated. I'm new to UNIX shell scripting. Almost forgot, I'm on AIX v6.2.

A solution using awk (I'm using GNU awk):

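awk 'BEGIN{FS=",";OFS="|"}
{
  if(FNR==1){
    # Header line of each file; print the combined header only once.
    if(NR==1){
      print "\"File_Name\"",$1,$2,$3,$4,$5,$6,"\"Tran_Header\"","\"Tran_Record\"";
    }
    # Blank the six fixed columns; this rebuilds $0 with "|" separators.
    $1=$2=$3=$4=$5=$6="";
    gsub("[|]+",",",$0);
    gsub("^,","",$0);
    titleCol = $0;
  }else{
    # Data rows in the sample carry five fixed fields before the variable ones.
    temp = FILENAME OFS $1 OFS $2 OFS $3 OFS $4 OFS $5 OFS "["titleCol"]";
    $1=$2=$3=$4=$5="";
    gsub("[|]+",",",$0);
    gsub("^,","",$0);
    print temp OFS "["$0"]";
  }
}' *.csv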

You get:

"File_Name"|"Job Id"|"Batch Id"|"Id"|"Success"|"Created"|"Error"|"Tran_Header"|"Tran_Record"
File_1.csv|aaabbb111|xxxyyy999|"false"|"false"|"Horrible_Error: Really Bad Error occured: yeah"|["Col1","Col2","Col3"]|["Val1","Val2","Val3"]
File_1.csv|cccddd222|pppqqq888|""|"false"|"Horrible_Error: Anoter Bad Error occured: ouch"|["Col1","Col2","Col3"]|["Val1","Val2","Val3"]
File_2.csv|aaabbb111|xxxyyy999|"false"|"false"|"Horrible_Error: Really Bad Error occured: oops"|["Col1","Col2","Col3","Col4","Col5"]|["Val1","Val2","Val3","Val4","Val5"]
File_2.csv|cccddd222|pppqqq888|""|"false"|"Horrible_Error: Anoter Bad Error occured: oh-no"|["Col1","Col2","Col3","Col4","Col5"]|["Val1","Val1","Val2","Val3","Val4","Val5"]
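
As a variation on the same idea, here is a minimal sketch that builds the two bracketed columns with a field loop instead of `gsub` on `$0`, and trims blanks around each field so that `", "` and `","` are treated alike. The function name `combine_csv`, the awk variable `nfixed` and the environment variable `NFIXED` are hypothetical names introduced for illustration. It assumes that no field contains an embedded comma and that header and data rows have the same number of fixed columns (6 by default); the sample rows as posted show only five fixed values, so `NFIXED` may need adjusting for the real data. It sticks to features I believe are standard awk, but it has not been tested on AIX.

```shell
# Hypothetical helper: combine_csv FILE...
# Assumption: fields never contain an embedded comma.
# Assumption: the first nfixed columns are the fixed ones in every line.
combine_csv() {
  awk -v nfixed="${NFIXED:-6}" '
    BEGIN { FS = ","; OFS = "|" }

    # Ignore empty lines.
    NF == 0 { next }

    {
      # Strip blanks around each field so that ", " and "," are treated alike.
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)

      # Join the fixed columns with "|" and the remaining ones with ",".
      fixed = ""; rest = ""
      for (i = 1; i <= NF; i++) {
        if (i <= nfixed) fixed = fixed OFS $i
        else rest = rest (i > nfixed + 1 ? "," : "") $i
      }
    }

    FNR == 1 {
      # Header line: remember the variable column names of this file.
      names = rest
      # Print the combined header once, for the first file only.
      if (NR == 1) print "\"File_Name\"" fixed, "\"Tran_Header\"", "\"Tran_Record\""
      next
    }

    # Data line: file name, fixed columns, header names, values.
    { print FILENAME fixed, "[" names "]", "[" rest "]" }
  ' "$@"
}
```

It could be called as `combine_csv File_1.csv File_2.csv > /some/directory/some_file_name.csv`. Writing the result outside the directory being read avoids picking up the output file on a later `*.csv` run.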