How to create an external table from a CSV file with commas in quoted fields in Greenplum?
I'm trying to create an external table from a CSV file, as follows:
CREATE EXTERNAL TABLE hctest.ex_nkp
(
a text,
b text,
c text,
d text,
e text,
f text,
g text,
h text
)
LOCATION ('gpfdist://192.168.56.111:10000/performnkp.csv')
FORMAT 'CSV' (DELIMITER ',' HEADER);
The CSV is comma-delimited (,), like this:
"Subject Username","Form Title","Form Start Date","Form End Date","Competency Name","Competency Description","Core Competency","Competency Official Rating"
"90008765","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","1. Uncompromising Integrity","<p>High ethical standards, low tolerance of unethical conduct.</p>","Yes","3"
"90008766","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","2. Team Synergy","<p>Passionately work together, ensuring completeness, to achieve common goals.</p>","Yes","3"
"90008767","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","3. Simplicity","<p>We do our utmost to deliver the easy to use solutions, exceeding customers'","","
"90008768","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","4. Exceptional Performance","<p>Highest level of performance, with a heart for people.</p>","Yes","3"
I get this error:
ERROR: extra data after last expected column (seg0 slice1 192.168.56.111:6000 pid=14121)
DETAIL: External table ex_nkp, line 5 of file gpfdist://192.168.56.111:10000/performnkp.csv
How can I fix this?
Your CSV appears to be malformed on line 4. Notice that line 4 ends with a lone, unmatched double quote, which Greenplum interprets as the start of a quoted CSV field that runs across the newline into line 5. After adding the missing closing quote on line 4, I was able to read the file in Greenplum:
"Subject Username","Form Title","Form Start Date","Form End Date","Competency Name","Competency Description","Core Competency","Competency Official Rating"
"90008765","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","1. Uncompromising Integrity","<p>High ethical standards, low tolerance of unethical conduct.</p>","Yes","3"
"90008766","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","2. Team Synergy","<p>Passionately work together, ensuring completeness, to achieve common goals.</p>","Yes","3"
"90008767","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","3. Simplicity","<p>We do our utmost to deliver the easy to use solutions, exceeding customers'","",""
"90008768","Performance Review - 2nd Semester 2019 for Ely Eisley","01/01/2019","31/12/2019","4. Exceptional Performance","<p>Highest level of performance, with a heart for people.</p>","Yes","3"
Query result:
fguerrero=# select * from ex_nkp ;
NOTICE: HEADER means that each one of the data files has a header row
a | b | c | d | e | f | g | h
----------+-------------------------------------------------------+------------+------------+-----------------------------+------------------------------------------------------------------------------------+-----+---
90008765 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 1. Uncompromising Integrity | <p>High ethical standards, low tolerance of unethical conduct.</p> | Yes | 3
90008766 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 2. Team Synergy | <p>Passionately work together, ensuring completeness, to achieve common goals.</p> | Yes | 3
90008767 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 3. Simplicity | <p>We do our utmost to deliver the easy to use solutions, exceeding customers' | |
90008768 | Performance Review - 2nd Semester 2019 for Ely Eisley | 01/01/2019 | 31/12/2019 | 4. Exceptional Performance | <p>Highest level of performance, with a heart for people.</p> | Yes | 3
(4 rows)
Let me know if this helps.
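To see why the error points at line 5 rather than line 4, here is a minimal sketch that uses Python's csv module as a stand-in for Greenplum's CSV parser (an assumption for illustration: they are separate implementations, but both follow the usual double-quote rules). The data is an abbreviated copy of the question's file, and `bad_records` is a hypothetical helper that flags every record whose field count differs from the table's eight columns.

```python
import csv
import io

EXPECTED_COLUMNS = 8  # ex_nkp declares eight text columns, a..h

# Abbreviated copy of performnkp.csv: the "Form Title" values are shortened
# to "Review"; every quote and comma that affects parsing is kept as posted.
HEADER = ('"Subject Username","Form Title","Form Start Date","Form End Date",'
          '"Competency Name","Competency Description","Core Competency",'
          '"Competency Official Rating"')
ROW2 = ('"90008765","Review","01/01/2019","31/12/2019",'
        '"1. Uncompromising Integrity",'
        '"<p>High ethical standards, low tolerance of unethical conduct.</p>",'
        '"Yes","3"')
ROW3 = ('"90008766","Review","01/01/2019","31/12/2019","2. Team Synergy",'
        '"<p>Passionately work together, ensuring completeness, to achieve '
        'common goals.</p>","Yes","3"')
# Line 4 as posted: the last field opens with '"' but is never closed.
ROW4_BROKEN = ('"90008767","Review","01/01/2019","31/12/2019","3. Simplicity",'
               '"<p>We do our utmost to deliver the easy to use solutions, '
               'exceeding customers\'","","')
ROW4_FIXED = ROW4_BROKEN + '"'  # the fix from the answer: close the quote
ROW5 = ('"90008768","Review","01/01/2019","31/12/2019",'
        '"4. Exceptional Performance",'
        '"<p>Highest level of performance, with a heart for people.</p>",'
        '"Yes","3"')


def bad_records(text):
    """Return (line_num, field_count) for each record with the wrong width.

    reader.line_num counts the physical lines consumed so far, so a record
    that swallowed a newline is reported at the line where it finally ends.
    """
    reader = csv.reader(io.StringIO(text))
    return [(reader.line_num, len(fields))
            for fields in reader
            if len(fields) != EXPECTED_COLUMNS]


broken = "\n".join([HEADER, ROW2, ROW3, ROW4_BROKEN, ROW5]) + "\n"
fixed = "\n".join([HEADER, ROW2, ROW3, ROW4_FIXED, ROW5]) + "\n"

print(bad_records(broken))  # one over-wide record, ending on line 5
print(bad_records(fixed))   # []
```

With the unclosed quote, lines 4 and 5 collapse into a single record that is wider than eight columns and is only finished on line 5, which lines up with the "line 5" in Greenplum's error message. After closing the quote, every record has eight fields, and the rows with commas inside their quoted descriptions parse cleanly in both cases.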
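You can specify "LOG ERRORS SEGMENT REJECT LIMIT 10" in the external table definition. That way, the segments will skip (and log) rows with formatting errors, up to 10 per segment, instead of failing the query.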
Then you can go back and inspect the details with "select * from gp_read_error_log('external_table_name');"
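The reject-limit idea can be pictured with a small client-side sketch. This is not Greenplum code: `load_with_reject_limit`, `RejectLimitExceeded`, the tiny three-column sample, and the choice to fail only once the rejects go past the limit are all assumptions made for illustration. Malformed records are put into an error log instead of aborting the load, much as LOG ERRORS sends them to the log that gp_read_error_log reads.

```python
import csv
import io


class RejectLimitExceeded(Exception):
    """Raised when more records are rejected than the caller allows."""


def load_with_reject_limit(text, expected_columns, reject_limit, header=True):
    """Hypothetical client-side analogue of LOG ERRORS ... REJECT LIMIT.

    Well-formed records are returned; malformed ones are logged instead of
    aborting the load. The load fails once the number of rejects goes past
    reject_limit (Greenplum's exact boundary may differ).
    """
    reader = csv.reader(io.StringIO(text))
    if header:
        next(reader, None)  # like HEADER: the first line holds column names
    rows, error_log = [], []
    for fields in reader:
        if len(fields) == expected_columns:
            rows.append(fields)
            continue
        error_log.append({
            "line": reader.line_num,  # physical line where the record ended
            "reason": f"expected {expected_columns} fields, got {len(fields)}",
        })
        if len(error_log) > reject_limit:
            raise RejectLimitExceeded(f"reject limit {reject_limit} exceeded")
    return rows, error_log


# Hypothetical three-column file: row 2 opens a quote that is never closed.
SAMPLE = (
    '"id","title","note"\n'
    '"1","ok","has, a comma"\n'
    '"2","broken","never closed\n'
    '"3","after","fine"\n'
    '"4","last","fine"\n'
)

rows, error_log = load_with_reject_limit(SAMPLE, expected_columns=3, reject_limit=10)
print([r[0] for r in rows])  # ['1', '4']
print(error_log)

try:
    load_with_reject_limit(SAMPLE, expected_columns=3, reject_limit=0)
except RejectLimitExceeded as exc:
    print("load aborted:", exc)
```

In this parser the unclosed quote on row 2 also swallows row 3, so skipping errors silently drops a well-formed row as well. Greenplum's own error recovery may split the damage differently, which is another reason to check gp_read_error_log after such a load.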
From this example, it looks like you have extra commas in that field. Try specifying QUOTE '"' after HEADER.
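For contrast, commas inside a properly closed quoted field are not a problem on their own. In this small check (Python's csv module again standing in for Greenplum's parser, with the long title shortened), line 3 of the question's file, whose description contains two commas, splits into exactly eight fields whether or not the quote character is spelled out. Greenplum's CSV format likewise uses the double quote as its default QUOTE, so adding QUOTE '"' is not expected to change how this file is read.

```python
import csv
import io

# Line 3 of the question's file, with the long "Form Title" shortened.
LINE = ('"90008766","Review","01/01/2019","31/12/2019","2. Team Synergy",'
        '"<p>Passionately work together, ensuring completeness, to achieve '
        'common goals.</p>","Yes","3"')

# quotechar='"' is already Python's default; it is spelled out here only to
# mirror the QUOTE '"' option suggested above.
default_fields = next(csv.reader(io.StringIO(LINE)))
explicit_fields = next(csv.reader(io.StringIO(LINE), quotechar='"'))

print(len(default_fields))                # 8
print(default_fields[5])                  # the description, commas intact
print(default_fields == explicit_fields)  # True
```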