U-SQL: filtering out empty/null strings (Microsoft Academic Graph)
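I am new to Azure Data Lake Analytics and U-SQL.
I am trying to do what I thought was a simple operation, but I have run into trouble.
Basically: I want to create a query that ignores empty strings.
Using the column in the SELECT works, but using it in the WHERE clause does not.
Below are the statement I am running and the cryptic error I get.
The job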
@xsel_res_1 =
    EXTRACT
        x_paper_id long,
        x_Rank uint,
        x_doi string,
        x_doc_type string,
        x_paper_title string,
        x_original_title string,
        x_book_title string,
        x_paper_year int,
        x_paper_date DateTime?,
        x_publisher string,
        x_journal_id long?,
        x_conference_series_id long?,
        x_conference_instance_id long?,
        x_volume string,
        x_issue string,
        x_first_page string,
        x_last_page string,
        x_reference_count long,
        x_citation_count long?,
        x_estimated_citation int?
    FROM @"adl://xmag.azuredatalakestore.net/graph/2018-02-02/Papers.txt"
    USING Extractors.Tsv()
    ;
@xsel_res_2 =
    SELECT
        x_paper_id AS x_paper_id,
        x_doi.ToLower() AS x_doi,
        x_doi.Length AS x_doi_length
    FROM @xsel_res_1
    WHERE NOT string.IsNullOrEmpty(x_doi)
    ;
@xsel_res_3 =
    SELECT
        *
    FROM @xsel_res_2
    SAMPLE ANY (5)
    ;
OUTPUT @xsel_res_3
    TO @"/graph/2018-02-02/x_output/x_papers_x6.tsv"
    USING Outputters.Tsv();
The error
Vertex failed
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0][1] with error: Vertex user code error.
VertexFailedFast: Vertex failed with a fail-fast error
E_RUNTIME_USER_EXTRACT_ROW_ERROR: Error occurred while extracting row after processing 10 record(s) in the vertex' input split. Column index: 5, column name: 'x_original_title'.
E_RUNTIME_USER_EXTRACT_EXTRACT_INVALID_CHARACTER_AFTER_QUOTED_FIELD: Invalid character following the ending quote character in a quoted field.
Row selected
Component
RUNTIME
Message
Invalid character following the ending quote character in a quoted field.
Resolution
Column should be fully surrounded with double-quotes and double-quotes within the field escaped as two double-quotes.
Description
Invalid character is detected following the ending quote character in a quoted field. A column delimiter, row delimiter or EOF is expected. This error can occur if double-quotes within the field are not correctly escaped as two double-quotes.
Details
Row Delimiter: 0x0
Column Delimiter: 0x9
HEX: 61 76 6E 69 20 74 65 72 6D 69 6E 20 75 20 70 6F 76 61 6C 6A 73 6B 6F 6A 20 6C 69 73 74 69 6E 69 20 69 20 6E 61 74 70 69 73 75 20 67 20 31 31 38 35 09 22 50 6F 20 6B 6F 6E 63 75 22 ### 20 28 73 74 61 72 69 20 68 72
Update
By the way, the same operations work on other datasets, so as far as I can tell the problem is not the syntax.
// Define schema of file, must map all columns
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int,
            Urls string,
            ClickedUrls string
    FROM @"/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();
@searchlog_1 =
    SELECT * FROM @searchlog
    WHERE NOT string.IsNullOrEmpty(ClickedUrls);
OUTPUT @searchlog_1
    TO @"/Samples/Output/SearchLog_output_x1.tsv"
    USING Outputters.Tsv();
The error is displayed in an unfortunate way for this case.
Assuming the text is UTF-8, you can use a site like www.hexutf8.com to convert the hex to:
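avni termin u povaljskoj listini i natpisu g 1185<TAB>"Po koncu" (stari hr
Here <TAB> stands for the 0x09 byte, which is the column delimiter. The input row appears to contain at least one " character that is not escaped correctly. The field starts with a quote right after the delimiter, so the extractor reads "Po koncu" as a complete quoted field and then fails on the space that follows it. Going by the Resolution text in the error, the field should be fully surrounded by double quotes with the inner quotes doubled, so it should begin like this:
avni termin u povaljskoj listini i natpisu g 1185<TAB>"""Po koncu"" (stari hr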
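The hex dump can also be decoded locally rather than on a website. Below is a minimal Python sketch, assuming the bytes are UTF-8 and that the ### marker shows where the extractor stopped (that is my reading of the dump, the error text does not say so). The function name decode_hex_dump is made up for this example.

```python
from typing import Tuple

# Hex dump copied from the "HEX:" line of the error, including the "###" marker.
HEX_DUMP = (
    "61 76 6E 69 20 74 65 72 6D 69 6E 20 75 20 70 6F 76 61 6C 6A 73 6B 6F 6A "
    "20 6C 69 73 74 69 6E 69 20 69 20 6E 61 74 70 69 73 75 20 67 20 31 31 38 35 "
    "09 22 50 6F 20 6B 6F 6E 63 75 22 ### 20 28 73 74 61 72 69 20 68 72"
)


def decode_hex_dump(dump: str, encoding: str = "utf-8") -> Tuple[str, str]:
    """Return the decoded text before and after the '###' marker."""
    before, _, after = dump.partition("###")
    decoded = [
        bytes.fromhex(part.strip()).decode(encoding) for part in (before, after)
    ]
    return decoded[0], decoded[1]


if __name__ == "__main__":
    before, after = decode_hex_dump(HEX_DUMP)
    # repr() makes the tab (\t) and the quote characters visible.
    print(repr(before))
    print(repr(after))
```

Printing with repr() is deliberate: it shows the tab in front of the opening quote, which is easy to miss when the text is pasted as plain characters.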
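@Saveenr's answer assumes that the values in your file are all quoted. Alternatively, if they are not quoted (and none of the values contain your column delimiter), setting Extractors.Tsv(quoting:false) may also help.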
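If rewriting the file is an option instead of turning quoting off, the escaping described in the error's Resolution text could be applied in a preprocessing step. This is only a sketch in Python using the standard csv module: escape_tsv is a hypothetical helper, the sample row is made up from the decoded fragment, and I have not tested whether the U-SQL extractor accepts the output for every column type.

```python
import csv
import io


def escape_tsv(src, dst) -> None:
    """Copy tab-separated rows, quoting only the fields that need it.

    Input is read with quoting disabled, so a stray " is treated as data.
    On output, any field containing a quote or a tab is wrapped in double
    quotes and its inner quotes are doubled.
    """
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(
        dst, delimiter="\t", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    for row in reader:
        writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical two-column row built from the decoded error fragment.
    raw = io.StringIO('1185\t"Po koncu" (stari hr\n')
    fixed = io.StringIO()
    escape_tsv(raw, fixed)
    print(fixed.getvalue(), end="")
```

QUOTE_MINIMAL leaves ordinary fields untouched, so numeric and empty columns are written exactly as they were read.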