Issue retrieving metatags in Nutch 2.3
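I am using the Nutch 2.3 source release (Nutch2.3-src). I can crawl web pages, but only the description is retrieved, not other meta tags such as LastModified and Author.
I updated the index.metadata and metatags.names properties, but still no luck: those fields only come back as null.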
<property>
<name>metatags.names</name>
<value>*</value>
<description>Names of the metatags to extract, separated by ','.
Use '*' to extract all metatags. Prefixes the names with 'meta_' in
the parse-metadata. For instance, to index description and keywords,
you need to activate the plugins parse-metadata and index-metadata
and set the value of the properties 'metatags.names' and
'index.metadata' to 'description,keywords'.
</description>
</property>
<property>
<name>index.metadata</name>
<value>description,LastModified,Created,WCMCategories,WCMKeywords,Authors,SiteName,title,lastmodified,created,wcmcategories,wcmkeywords,authors,sitename,meta_description,meta_LastModified,meta_Created,meta_WCMCategories,meta_WCMKeywords,meta_Authors,meta_SiteName,meta_title,meta_lastmodified,meta_created,meta_wcmcategories,meta_wcmkeywords,meta_authors,meta_sitename</value>
<description>
Comma-separated list of keys to be taken from the metadata to generate fields.
Can be used e.g. for 'description' or 'keywords' provided that these values are generated
by a parser (see parse-metatags plugin), and property 'metatags.names'.
</description>
</property>
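The metatags.names description above also says the metatag plugins must be activated. A minimal sketch of that step in nutch-site.xml, assuming the plugin ids parse-metatags and index-metadata (the first description calls the parser plugin "parse-metadata", but the index.metadata description points to parse-metatags). Apart from those two additions, the value is illustrative only: merge the change into the plugin.includes you already use rather than copying it verbatim.

```xml
<property>
  <name>plugin.includes</name>
  <!-- Illustrative value only: keep the plugins you already use (protocol,
       indexer backend, etc.) and add parse-metatags and index-metadata. -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
</property>
```

Because plugin.includes is a regular expression, the new plugins can be folded into the existing parse-(...) and index-(...) groups instead of being listed separately.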
Solved. Meta tag names are case-sensitive: the names must match exactly, including case, in both the web page and nutch-site.xml.
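To illustrate what "matching" means here, a minimal sketch assuming a hypothetical page whose meta tags are written as shown in the comment (the tag names and placeholder values are assumptions, not taken from the original page). The names in metatags.names and index.metadata use exactly the same spelling and case as the name attributes in the HTML.

```xml
<!-- Hypothetical page markup (values are placeholders):
     <meta name="description" content="..."/>
     <meta name="LastModified" content="..."/>
     <meta name="Authors" content="..."/>
-->
<property>
  <name>metatags.names</name>
  <!-- Same spelling and case as the name attributes in the page -->
  <value>description,LastModified,Authors</value>
</property>
<property>
  <name>index.metadata</name>
  <!-- Names are case-sensitive, so these must match the page exactly too -->
  <value>description,LastModified,Authors</value>
</property>
```

If the page actually uses, say, name="Author" rather than name="Authors", the configured names have to follow the page, not the other way round.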