How do I convert a key-value paired list into a table with columns using AWK?
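I need to convert a data set from a list of key-value pairs (Informix dbaccess output) into columnar CSV. I'm fairly sure this can be done easily with awk or sed.
UPDATE: The solution needs to be a one-liner. I am using NSH (which is based on ZSH), so some of the typical "bashy" commands will not work.
Here is a sample of my data: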
part_no 100000001
date_part 2010-10-13 12:12:12
history_code ABCD
user_id rsmith
other_information note: Monday, December 10
pool_no 101011777
part_no 100000002
date_part 2010-10-21 12:12:12
history_code GHIJ
user_id jsmith
other_information
pool_no 101011888
part_no 100000002
date_part 2010-10-27 12:12:12
history_code LMNO
user_id fevers
other_information [Mail]
pool_no 101011999
part_no 100000003
date_part 2010-11-13 12:12:12
history_code QXRT
user_id sjohnson
other_information note: Tuesday, August 31
pool_no 101011111
I need it to look like this:
part_no,date_part,history_code,user_id,other_information,pool_no
100000001,10/13/2010 12:12:12,ABCD,rsmith,note: Monday, December 10,101011777
100000002,10/21/2010 12:12:12,GHIJ,jsmith,,101011888
100000002,10/27/2010 12:12:12,LMNO,fevers,[Mail],101011999
100000003,11/13/2010 12:12:12,QXRT,sjohnson,note: Tuesday, August 31,101011111
Try this:
cat $file | cut -d ' ' -f 2- | sed 's/^[ \t]*//' | sed 's/$/,/' \
| xargs | sed 's/ , /\n/g' | sed 's/.$//' | sed 's/, /,/g' \
| sed '1ipart_no,date_part,history_code,user_id,other_information,pool_no'
Your question is unclear, but this may be what you are looking for:
$ cat tst.awk
# Records are separated by blank lines; each line of a record is one field.
BEGIN { RS=""; FS="\n"; OFS=","; ofmt="\"%s\"%s" }
{
    for (i=1; i<=NF; i++) {
        tag = val = $i
        # the tag is everything before the first whitespace
        sub(/[[:space:]].*/,"",tag)
        # the value is everything after the tag; it may be empty
        sub(/^[^[:space:]]+[[:space:]]*/,"",val)
        tags[i] = tag
        vals[i] = val
    }
}
NR==1 {
    # print the header once, from the tags of the first record
    for (i=1; i<=NF; i++) {
        printf ofmt, tags[i], (i<NF ? OFS : ORS)
    }
}
{
    for (i=1; i<=NF; i++) {
        printf ofmt, vals[i], (i<NF ? OFS : ORS)
    }
}
$ awk -f tst.awk file
"part_no","date_part","history_code","user_id","other_information","pool_no"
"100000001","2010-10-13 12:12:12","ABCD","rsmith","note: Monday, December 10","101011777"
"100000002","2010-10-21 12:12:12","GHIJ","jsmith","","101011888"
"100000002","2010-10-27 12:12:12","LMNO","fevers","[Mail]","101011999"
"100000003","2010-11-13 12:12:12","QXRT","sjohnson","note: Tuesday, August 31","101011111"
Could you please try the following and let me know if it helps.
awk -v s1="," '/part_no/ && value{if(header){print header;flag=1;header=""};print value;value=""} NF{if(!flag){header=(header?header s1 $1:$1)};sub(/^[^[:space:]]+[[:space:]]+/,"");value=value?value s1 $0:$0} END{if(value){print value}}' Input_file
The output is as follows.
part_no,date_part,history_code,user_id,other_information,pool_no
100000001,2010-10-13 12:12:12,ABCD,rsmith,note: Monday, December 10,101011777
100000002,2010-10-21 12:12:12,GHIJ,jsmith,,101011888
100000002,2010-10-27 12:12:12,LMNO,fevers,[Mail],101011999
100000003,2010-11-13 12:12:12,QXRT,sjohnson,note: Tuesday, August 31,101011111
Here is the same solution in a non-one-liner form as well.
awk -v s1="," '
/part_no/ && value{
  if(header){
    print header;
    flag=1;
    header=""}
  print value;
  value=""
}
NF{
  if(!flag){
    header=(header?header s1 $1:$1)}
  sub(/^[^[:space:]]+[[:space:]]+/,"")
  value=value?value s1 $0:$0
}
END{
  if(value){
    print value}
}' Input_file
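None of the awk answers above produce the MM/DD/YYYY date that the question asks for, and the comma inside a value such as "note: Monday, December 10" would split that field in a strict CSV reader. The following is a minimal sketch of one way to cover both points, not a definitive implementation. It assumes that every record starts with a part_no line, that date_part always begins with YYYY-MM-DD, and that quoting a field that contains a comma is acceptable. The function name kv2csv is hypothetical.

```shell
# kv2csv: hypothetical helper name; key/value lines on stdin, CSV on stdout.
kv2csv() {
  awk '
    function emit(   i, line) {            # print the collected record, if any
      if (n == 0) return
      if (!printed) {                      # header is taken from the first record
        line = keys[1]
        for (i = 2; i <= n; i++) line = line "," keys[i]
        print line
        printed = 1
      }
      line = vals[1]
      for (i = 2; i <= n; i++) line = line "," vals[i]
      print line
      n = 0
    }
    function csv(s) {                      # quote a field only when it needs it
      if (s ~ /[",]/) { gsub(/"/, "\"\"", s); s = "\"" s "\"" }
      return s
    }
    NF == 0 { next }                       # ignore blank lines
    $1 == "part_no" { emit() }             # assumption: part_no starts each record
    {
      key = $1
      val = $0
      sub(/^[[:space:]]*[^[:space:]]+[[:space:]]*/, "", val)  # drop the key
      sub(/[[:space:]]+$/, "", val)                            # drop trailing blanks
      # assumption: date_part looks like YYYY-MM-DD followed by the time
      if (key == "date_part" && val ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/)
        val = substr(val, 6, 2) "/" substr(val, 9, 2) "/" substr(val, 1, 4) substr(val, 11)
      n++
      keys[n] = key
      vals[n] = csv(val)
    }
    END { emit() }
  '
}
```

It would be used as `kv2csv < Input_file`. Because it only uses awk, the body can also be collapsed onto one line if a strict one-liner is required.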
I know the OP said awk, but bash is just sitting there.
#
# line to be printed
line=""
#
# first value on a line flag
first=""
#
# read the file (-r keeps backslashes in the data intact)
while read -r key val; do
    #
    # if key is empty then the input line is empty.
    if [ "$key" = "" ] ; then
        #
        # skip leading blank lines in the file
        if [ "$line" = "" ] ; then
            continue
        else
            #
            # print and reset the line
            echo "$line"
            line=""
            first=""
        fi
    else
        #
        # place the first comma after the first value
        if [ "$first" = "" ] ; then
            line="\"$val\""
            first="1"
        else
            line="$line,\"$val\""
        fi
    fi
done < file.txt
#
# print the last line, if there is one
if [ "$line" != "" ] ; then
    echo "$line"
fi
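I am treating this as an Informix question rather than an Awk question.
Using standard Informix SQL commands, you can also create an external table in CSV format, but you have to know that there is an undocumented format "DB2".
You can use: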
DROP TABLE IF EXISTS data_table;
CREATE TABLE data_table
(
part_no INTEGER,
date_part DATETIME YEAR TO SECOND,
history_code VARCHAR(4),
user_id VARCHAR(32),
other_information VARCHAR(64),
pool_no INTEGER
);
INSERT INTO data_table VALUES(100000001, "2010-10-13 12:12:12", "ABCD", "rsmith", "note: Monday, December 10", 101011777);
INSERT INTO data_table VALUES(100000002, "2010-10-21 12:12:12", "GHIJ", "jsmith", NULL, 101011888);
INSERT INTO data_table VALUES(100000002, "2010-10-27 12:12:12", "LMNO", "fevers", "[Mail]", 101011999);
INSERT INTO data_table VALUES(100000003, "2010-11-13 12:12:12", "QXRT", "sjohnson", "note: Tuesday, August 31", 101011111);
DROP TABLE IF EXISTS csv_data;
CREATE EXTERNAL TABLE csv_data
(
part_no INTEGER,
date_part DATETIME YEAR TO SECOND,
history_code VARCHAR(4),
user_id VARCHAR(32),
other_information VARCHAR(64),
pool_no INTEGER
)
USING (FORMAT "DB2", DELIMITER ",", DATAFILES("DISK:/tmp/data/csv_data.csv"));
INSERT INTO csv_data
SELECT part_no, date_part, history_code, user_id, other_information, pool_no
FROM data_table;
The content of /tmp/data/csv_data.csv looks like:
100000001,2010-10-13 12:12:12,"ABCD","rsmith","note: Monday, December 10",101011777
100000002,2010-10-21 12:12:12,"GHIJ","jsmith",,101011888
100000002,2010-10-27 12:12:12,"LMNO","fevers","[Mail]",101011999
100000003,2010-11-13 12:12:12,"QXRT","sjohnson","note: Tuesday, August 31",101011111
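Converting UNLOAD format to CSV
The default output from DB-Access is not easy to parse in practice.
It may be workable in some limited cases, such as the one you show, but you would do better to use the UNLOAD format instead of the command-line output, and then convert the UNLOAD data format to CSV.
I have a Perl script that does this. It uses the Perl Text::CSV module to handle the CSV format. It does not pretend to handle a first line with column names; those are not present in UNLOAD format files.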
#!/usr/bin/env perl
#
# @(#)$Id: unl2csv.pl,v 1.3 2018/06/29 20:36:58 jleffler Exp $
#
# Convert Informix UNLOAD format to CSV
use strict;
use warnings;
use Text::CSV;
use IO::Wrap;
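# eol is set so that each printed record is terminated with a newline
my $csv = Text::CSV->new({ binary => 1, eol => "\n" }) or die "Failed to create CSV handle ($!)";
my $dlm = defined $ENV{DBDELIMITER} ? $ENV{DBDELIMITER} : "|";
# quote the delimiter so that "|" is not treated as regex alternation
my $qdlm = quotemeta $dlm;
my $out = wraphandle(\*STDOUT);
# a field is a run of backslash-escaped characters, or characters that are
# neither a backslash nor the delimiter, followed by the delimiter
my $rgx = qr/((?:\\.|[^\\$qdlm])*)$qdlm/s;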
# $csv->eol("\r\n");
while (my $line = <>)
{
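# print "1: $line";    # debugging output
MultiLine:
# a line that ends with an odd number of backslashes continues on the next line
while ($line eq "\\\n" || $line =~ m/[^\\](?:\\\\)*\\$/)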
{
my $extra = <>;
last MultiLine unless defined $extra;
$line .= $extra;
}
my @fields = split_unload($line);
$csv->print($out, \@fields);
}
sub split_unload
{
my($line) = @_;
my @fields;
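# print "$line";    # debugging output
# /g moves past each matched field so that the loop terminates
while ($line =~ m/$rgx/g)
{
# printf "%d: %s\n", scalar(@fields), $1;    # debugging output
push @fields, $1;
}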
return @fields;
}
__END__
=head1 NAME
unl2csv - Convert Informix UNLOAD to CSV format
=head1 SYNOPSIS
unl2csv [file ...]
=head1 DESCRIPTION
The unl2csv program converts a file from Informix UNLOAD file format to
the corresponding CSV (comma separated values) format.
The input delimiter is determined by the environment variable
DBDELIMITER, and defaults to the pipe symbol "|".
It is not assumed that each input line is terminated with a delimiter
(there are two variants of the UNLOAD format, one with and one without
the final delimiter).
=head1 EXAMPLES
Input:
10|12|excessive|cost \|of, living|
20|40|bou\ncing tigger|grrrrrrrr|
Output:
10,12,"excessive","cost |of, living"
20,40,"bou\ncing tigger",grrrrrrrr
=head1 PRE-REQUISITES
Text::CSV_XS
=head1 AUTHOR
Jonathan Leffler <jonathan.leffler@hcl.com>
=cut
You would use a command like this (via DB-Access):
UNLOAD TO "datatable.unl" SELECT * FROM DataTable;
Then run:
perl unl2csv datatable.unl > datatable.csv
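Where Perl or Text::CSV is not available, the same conversion can be approximated with awk for simple files. This is only a sketch under stated assumptions, not a replacement for the script above: the delimiter is fixed as "|", a field never contains an embedded newline, and a backslash counts as an escape only when it precedes the delimiter or another backslash. The function name unl2csv_simple is hypothetical.

```shell
# unl2csv_simple: hypothetical helper name; UNLOAD text on stdin, CSV on stdout.
# Assumptions: "|" delimiter, no embedded newlines in fields.
unl2csv_simple() {
  awk '
    function csv(s) {                      # quote a field only when it needs it
      if (s ~ /[",]/) { gsub(/"/, "\"\"", s); s = "\"" s "\"" }
      return s
    }
    {
      n = 0; field = ""; len = length($0)
      for (i = 1; i <= len; i++) {
        c  = substr($0, i, 1)
        nx = substr($0, i + 1, 1)
        if (c == "\\" && (nx == "|" || nx == "\\")) {
          field = field nx; i++            # escaped delimiter or backslash
        } else if (c == "|") {
          f[++n] = field; field = ""       # unescaped delimiter ends the field
        } else {
          field = field c
        }
      }
      if (field != "") f[++n] = field      # variant without a trailing delimiter
      line = ""
      for (i = 1; i <= n; i++) line = line (i > 1 ? "," : "") csv(f[i])
      print line
    }
  '
}
```

It would be used as `unl2csv_simple < datatable.unl > datatable.csv`. The line is scanned one character at a time so that an escaped delimiter is never mistaken for a field boundary.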
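SQLCMD program
If you have my SQLCMD program (available from the software repository on the IIUG website, and completely unrelated to Microsoft's johnny-come-lately of the same name), you can unload directly to CSV format: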
sqlcmd -d database -F csv -e 'unload to "data_table.csv" select * from data_table'