Trying to take an Atom feed and parse out a section written in XHTML using XSLT

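I am trying to take a NOAA RSS feed (the NOAA site says it uses ATOM and CAPS) and transform it with XSLT for use in SharePoint. I am new to this and have limited experience working with XSLT. Here is a sample of the feed.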

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" 
xmlns:georss="http://www.georss.org/georss">
<id>urn:uuid:9ae4ae29-830f-4870-bace-0f70984b76bd</id><title>        
TSUNAMI INFORMATION STATEMENT NUMBER   1        </title>
<updated>2022-01-29T03:00:32Z</updated>
<author>
  <name>NWS PACIFIC TSUNAMI WARNING CENTER HONOLULU HI</name>
 <uri>http://ntwc.arh.noaa.gov/</uri>
 <email>ntwc@noaa.gov</email>
 </author>
 <icon>http://ntwc.arh.noaa.gov/images/favicon.ico</icon>
 <link type="application/atom+xml" rel="self" title="self" 
 href="http://ntwc.arh.noaa.gov/events/xml/PAAQAtom.xml"/>
 <link rel="related" title="Energy Map"  
 <entry>
 <title>KERMADEC ISLANDS REGION</title><updated>2022-01-29T03:00:32Z</updated>
 <geo:lat>-29.751</geo:lat>
 <geo:long>-174.709</geo:long>
 <summary type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
    <strong>Category:</strong> Information<br/>
    <strong>Bulletin Issue Time: </strong> 2022.01.29 03:00:32 UTC 
    <br/><strong>Preliminary Magnitude: </strong>6.6(Mwp)<br/> 
    <strong>Lat/Lon: </strong>-29.751 / -174.709<br/>
    <strong>Affected Region: </strong>KERMADEC ISLANDS REGION<br/>
</div>
</summary>
</entry>
</feed>

My problem is converting the "summary type=xhtml" section into a readable format (as shown below) instead of one long run-on sentence.

CATEGORY: Information
BULLETIN ISSUE TIME: 
PRELIMINARY MAGNITUDE:

Can anyone give me some advice on how to parse this information in XSLT?

Thank you in advance.

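As far as I know, there is no standard format for the content of an Atom summary. If your data provider sticks to the format shown in your example, then, given a well-formed XML input such as: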

XML

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" 
xmlns:georss="http://www.georss.org/georss">
<id>urn:uuid:9ae4ae29-830f-4870-bace-0f70984b76bd</id><title>        
TSUNAMI INFORMATION STATEMENT NUMBER   1        </title>
<updated>2022-01-29T03:00:32Z</updated>
<author>
  <name>NWS PACIFIC TSUNAMI WARNING CENTER HONOLULU HI</name>
 <uri>http://ntwc.arh.noaa.gov/</uri>
 <email>ntwc@noaa.gov</email>
 </author>
 <icon>http://ntwc.arh.noaa.gov/images/favicon.ico</icon>
 <link type="application/atom+xml" rel="self" title="self" 
 href="http://ntwc.arh.noaa.gov/events/xml/PAAQAtom.xml"/>
 <entry>
 <title>KERMADEC ISLANDS REGION</title><updated>2022-01-29T03:00:32Z</updated>
 <geo:lat>-29.751</geo:lat>
 <geo:long>-174.709</geo:long>
 <summary type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
    <strong>Category:</strong> Information<br/>
    <strong>Bulletin Issue Time: </strong> 2022.01.29 03:00:32 UTC 
    <br/><strong>Preliminary Magnitude: </strong>6.6(Mwp)<br/> 
    <strong>Lat/Lon: </strong>-29.751 / -174.709<br/>
    <strong>Affected Region: </strong>KERMADEC ISLANDS REGION<br/>
</div>
</summary>
</entry>
</feed>

you can do:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:a="http://www.w3.org/2005/Atom"
xmlns:x="http://www.w3.org/1999/xhtml">
<xsl:output method="text" encoding="UTF-8" />

<xsl:template match="/a:feed">
    <xsl:for-each select="a:entry/a:summary/x:div/x:strong">
        <xsl:value-of select="." />
        <xsl:value-of select="normalize-space(following-sibling::text()[1])" />
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

to get:

Result

Category:Information
Bulletin Issue Time: 2022.01.29 03:00:32 UTC
Preliminary Magnitude: 6.6(Mwp)
Lat/Lon: -29.751 / -174.709
Affected Region: KERMADEC ISLANDS REGION
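
The desired output in the question shows upper-case labels followed by a space, which the stylesheet above does not produce (note "Category:Information"). A minimal sketch of one way to get closer to that, assuming the labels contain only ASCII letters: XSLT 1.0 has no upper-case() function, so translate() is used with two alphabet strings. The variable names lower and upper are my own choice, not anything required by the feed.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:a="http://www.w3.org/2005/Atom"
xmlns:x="http://www.w3.org/1999/xhtml">
<xsl:output method="text" encoding="UTF-8" />

<!-- Alphabets used by translate() to upper-case ASCII letters -->
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />

<xsl:template match="/a:feed">
    <xsl:for-each select="a:entry/a:summary/x:div/x:strong">
        <!-- Label: trim surrounding whitespace, then upper-case it -->
        <xsl:value-of select="translate(normalize-space(.), $lower, $upper)" />
        <!-- Exactly one space between label and value -->
        <xsl:text> </xsl:text>
        <!-- Value: the first text node following the label -->
        <xsl:value-of select="normalize-space(following-sibling::text()[1])" />
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

With the well-formed input above, this should give lines such as "CATEGORY: Information" and "PRELIMINARY MAGNITUDE: 6.6(Mwp)". Like the original stylesheet, it relies on every value being the text node that immediately follows its strong element, so it will need adjusting if the provider changes the summary layout.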