Extracting XML with sqlQuery in R - Querying a CLOB Column
I have an Oracle database table named CRS.CRS_FILES with a column named FILE_DATA. That CLOB column holds a large XML string.
| FILE_DATA | FILE_CREATION_DATE |
| :------------------------------------------------ | :----------------- |
| `<?xml version="1.0" encoding="utf-8"?><REPORT` | 1/1/2020 |
| `<?xml version="1.0" encoding="utf-8"?><REPORT` | 1/5/2020 |
| `<?xml version="1.0" encoding="utf-8"?><REPORT` | 1/6/2019 |
| `<?xml version="1.0" encoding="utf-8"?><REPORT` | 1/1/2020 |
| `<?xml version="1.0" encoding="utf-8"?><REPORT` | 1/5/2020 |
Here are the first few lines of one FILE_DATA value:
<?xml version="1.0" encoding="utf-8" ?>
<REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201">
  <CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>
  <AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>
  <AGENCYNAME>Milwaukee Police Department</AGENCYNAME>
I set up the XPath I want to query:
//REPORT/AGENCYIDENTIFIER
# Attempt 1: getClobVal()
query_string2 <- "SELECT
XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getClobVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch, query_string2, max = 10)

# Attempt 2: getStringVal()
query_string2 <- "SELECT
XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getStringVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch, query_string2, max = 10)
I'm not sure what I'm doing wrong. I know sqlQuery has some minor formatting quirks when passing SQL queries, but no matter what I try, my results look like this:
XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL()
1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
7 NA
8 NA
9 NA
10 NA
What am I doing wrong? I just want to extract the value Milwaukee Police Department, as below (I will of course rename the column to something like AGENCYNAME):
XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL()
1 Milwaukee Police Department
2 Milwaukee Police Department
3 Milwaukee Police Department
4 Milwaukee Police Department
5 Milwaukee Police Department
6 Milwaukee Police Department
7 Milwaukee Police Department
8 Milwaukee Police Department
9 Milwaukee Police Department
10 Milwaukee Police Department
The EXTRACT (XML) function is deprecated. Use XMLTABLE instead:
SELECT x.agencyname
FROM   CRS.CRS_FILES c
       CROSS JOIN XMLTABLE(
         XMLNAMESPACES(
           'http://www.w3.org/2001/XMLSchema-instance' AS "i",
           DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201'
         ),
         '/REPORT'
         PASSING XMLTYPE( c.file_data )
         COLUMNS
           crsreporttimestamp TIMESTAMP     PATH 'CRSREPORTTIMESTAMP',
           agencyidentifier   VARCHAR2(50)  PATH 'AGENCYIDENTIFIER',
           agencyname         VARCHAR2(100) PATH 'AGENCYNAME'
       ) x
In R, the query is identical, with the double quotes escaped:
query_string2 <- "SELECT x.agencyname
FROM   CRS.CRS_FILES c
       CROSS JOIN XMLTABLE(
         XMLNAMESPACES(
           'http://www.w3.org/2001/XMLSchema-instance' AS \"i\",
           DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201'
         ),
         '/REPORT'
         PASSING XMLTYPE( c.file_data )
         COLUMNS
           crsreporttimestamp TIMESTAMP     PATH 'CRSREPORTTIMESTAMP',
           agencyidentifier   VARCHAR2(50)  PATH 'AGENCYIDENTIFIER',
           agencyname         VARCHAR2(100) PATH 'AGENCYNAME'
       ) x"
idtable <- sqlQuery(ch, query_string2, max = 10)
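If you are on R 4.0.0 or later, a raw string literal avoids the backslash escaping altogether, because double quotes inside `r"[...]"` are taken literally. The sketch below assumes R >= 4.0.0 and reuses the same SQL; the `sqlQuery()` call is left commented out because it needs the open RODBC channel `ch` from the question.

```r
# Assumes R >= 4.0.0, which introduced raw string literals such as r"[...]".
# Inside a raw string the double quotes around "i" need no backslashes.
query_string2 <- r"[SELECT x.agencyname
FROM   CRS.CRS_FILES c
       CROSS JOIN XMLTABLE(
         XMLNAMESPACES(
           'http://www.w3.org/2001/XMLSchema-instance' AS "i",
           DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201'
         ),
         '/REPORT'
         PASSING XMLTYPE( c.file_data )
         COLUMNS
           crsreporttimestamp TIMESTAMP     PATH 'CRSREPORTTIMESTAMP',
           agencyidentifier   VARCHAR2(50)  PATH 'AGENCYIDENTIFIER',
           agencyname         VARCHAR2(100) PATH 'AGENCYNAME'
       ) x]"

# With the RODBC channel `ch` from the question:
# idtable <- RODBC::sqlQuery(ch, query_string2, max = 10)
```

The `[` `]` delimiters are used here because the SQL itself contains `)"`-free text but plenty of parentheses; any delimiter pair that does not occur in the SQL works.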
For your test data:
CREATE TABLE CRS.CRS_FILES ( FILE_DATA CLOB );
INSERT INTO CRS.crs_files VALUES (
'<?xml version="1.0" encoding="utf-8" ?>
<REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201">
  <CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>
  <AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>
  <AGENCYNAME>Milwaukee Police Department</AGENCYNAME>
</REPORT>'
);
Outputs:
| AGENCYNAME |
| :-------------------------- |
| Milwaukee Police Department |
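To pull more than AGENCYNAME with the XMLTABLE approach, the COLUMNS clause can be generated from R and selected with `x.*`. This is a sketch only: `xmltable_columns()` and `report_cols` are hypothetical names, and the `i` prefix declaration is dropped here on the assumption that none of your paths use it (only the default namespace is needed).

```r
# Hypothetical helper (not part of RODBC or Oracle): builds the COLUMNS
# clause of an XMLTABLE call from a data frame of name/type/path rows.
xmltable_columns <- function(cols) {
  defs <- sprintf("%s %s PATH '%s'", cols$name, cols$type, cols$path)
  paste0("COLUMNS\n    ", paste(defs, collapse = ",\n    "))
}

# One row per XML element to extract; add rows for further elements.
report_cols <- data.frame(
  name = c("crsreporttimestamp", "agencyidentifier", "agencyname"),
  type = c("TIMESTAMP", "VARCHAR2(50)", "VARCHAR2(100)"),
  path = c("CRSREPORTTIMESTAMP", "AGENCYIDENTIFIER", "AGENCYNAME"),
  stringsAsFactors = FALSE
)

ns_uri <- "http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"

# SELECT x.* returns every column listed in report_cols.
query_all <- sprintf("SELECT x.*
FROM   CRS.CRS_FILES c
       CROSS JOIN XMLTABLE(
         XMLNAMESPACES(DEFAULT '%s'),
         '/REPORT'
         PASSING XMLTYPE( c.file_data )
         %s
       ) x", ns_uri, xmltable_columns(report_cols))

# idtable <- RODBC::sqlQuery(ch, query_all, max = 10)
```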
If you really do want to use EXTRACT, then you need to specify the XML namespace:
SELECT XMLTYPE(t.FILE_DATA).EXTRACT(
'//REPORT/AGENCYNAME/text()',
'xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"'
).getStringVal() AS agencyname
FROM CRS.CRS_FILES t
Outputs:
| AGENCYNAME |
| :-------------------------- |
| Milwaukee Police Department |
db<>fiddle here
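The second argument to EXTRACT is just a string of `xmlns` declarations, so it can be built in R instead of hand-escaped. A minimal sketch with a hypothetical helper `ns_string()`: unnamed entries become the default namespace (as in the query above) and named entries become prefixes. Joining several mappings with spaces, as in an XML start tag, is an assumption worth checking against the Oracle documentation for your version.

```r
# Hypothetical helper: turns a vector of namespace URIs into the string
# passed as EXTRACT's namespace argument. Unnamed (or ""-named) entries
# become the default namespace (xmlns="..."); named entries become
# prefixes (xmlns:doc="..."). Multiple mappings are joined with spaces.
ns_string <- function(ns) {
  nm <- names(ns)
  if (is.null(nm)) nm <- rep("", length(ns))
  attrs <- ifelse(nm == "", "xmlns", paste0("xmlns:", nm))
  paste(sprintf('%s="%s"', attrs, ns), collapse = " ")
}

uri <- "http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"

# Default-namespace form, matching the EXTRACT query above.
query_default <- sprintf(
  "SELECT XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()', '%s').getStringVal() AS agencyname
FROM CRS.CRS_FILES t", ns_string(uri))

# idtable <- RODBC::sqlQuery(ch, query_default, max = 10)
```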
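The problem is the Oracle query itself, not the RODBC::sqlQuery method. Simply put, your XPath does not account for the default namespace declared on the root node. Because REPORT carries xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201", REPORT and every unprefixed element inside it belong to that namespace, whereas an unprefixed name in an XPath expression matches only elements in no namespace. //REPORT/AGENCYNAME/text() therefore matches nothing, extract() returns NULL, and R displays that as NA. However, the XMLType extract() function lets you define a temporary prefix to use in the XPath: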
extract(XMLType_instance IN XMLType,
        XPath_string     IN VARCHAR2,
        namespace_string IN VARCHAR2 := NULL) RETURN XMLType;
So, once you define a prefix such as doc, apply it to every element step in the XPath:
query_string2 <- "SELECT XMLTYPE(t.FILE_DATA).EXTRACT('//doc:REPORT/doc:AGENCYNAME/text()',
'xmlns:doc=\"http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201\"').getStringVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch,query_string2, max=10)
Online Demo (works with both getClobVal and getStringVal)
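If you have many XPaths, adding the prefix by hand to each step is error-prone. Below is a minimal sketch of a hypothetical helper, `prefix_xpath()`, that prefixes every plain element-name step and leaves `text()`, attributes, wildcards and already-prefixed steps alone. It assumes simple location paths; predicates such as `[1]` or `[@x='y']` are not handled.

```r
# Hypothetical helper: prefixes every plain element-name step of a simple
# XPath with `prefix:`, leaving steps such as text(), @attr, * or
# already-prefixed names untouched. Predicates like [1] are not handled.
prefix_xpath <- function(xpath, prefix = "doc") {
  steps <- strsplit(xpath, "/", fixed = TRUE)[[1]]
  is_name <- grepl("^[A-Za-z_][A-Za-z0-9_.-]*$", steps)
  steps[is_name] <- paste0(prefix, ":", steps[is_name])
  paste(steps, collapse = "/")
}

uri <- "http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"
xpath <- prefix_xpath("//REPORT/AGENCYNAME/text()")
# xpath is now "//doc:REPORT/doc:AGENCYNAME/text()"

# The alias makes the R column come back as AGENCYNAME
# (Oracle upper-cases unquoted aliases).
query_string2 <- sprintf(
  "SELECT XMLTYPE(t.FILE_DATA).EXTRACT('%s', 'xmlns:doc=\"%s\"').getStringVal() AS agencyname
FROM CRS.CRS_FILES t", xpath, uri)

# idtable <- RODBC::sqlQuery(ch, query_string2, max = 10)
```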