solr with TYPO3 indexes all kinds of records but does not index pages

Question

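pages records are indexed differently from other records. They represent single pages of the website, and those pages are built from other records. For that reason pages are indexed by calling the frontend. From time to time the frontend cannot be indexed: pages records can be added to the index queue, but every indexing call results in an error.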

What is needed to index pages?

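Of course you need a connection to the solr server and a basic configuration that activates the solr indexer, but that can be assumed if you can already index other records such as news.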

You need some TypoScript configuration, which should be present if you include the static templates from the extension:

plugin.tx_solr {
    index {
        queue {
            pages = 1
            pages {
                initialization = ApacheSolrForTypo3\Solr\IndexQueue\Initializer\Page

                // allowed page types (doktype) when indexing records from table "pages"
                allowedPageTypes = 1,7,4

                indexingPriority = 0

                indexer = ApacheSolrForTypo3\Solr\IndexQueue\PageIndexer
                indexer {
                    // add options for the indexer here
                }

                // Only index standard pages and mount points that are not overlayed.
                additionalWhereClause = (doktype = 1 OR doktype=4 OR (doktype=7 AND mount_pid_ol=0)) AND no_search = 0

                //exclude some html parts inside TYPO3SEARCH markers by classname (comma list)
                excludeContentByClass = typo3-search-exclude

                fields {
                    sortSubTitle_stringS = subtitle
                }
            }
        }
    }
}

But this alone does not get the page content into the index.

Answer 1

What else needs to be configured?

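The frontend must be available.
Some server configurations do not allow the server to access its own pages. Make sure the pages can be called.
If the original domain is not reachable, you can configure a helper domain under which solr can access the pages. Make sure the correct domain is still stored in the URL of the index entries.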
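
As a rough sketch of such a helper domain: as far as I know, the page indexer of EXT:solr reads a `frontendDataHelper` block from the indexer options. The host name below is a made-up placeholder, and the option names are an assumption that should be checked against the TypoScript reference of your EXT:solr version:

```typoscript
plugin.tx_solr.index.queue.pages.indexer {
    // assumption: the frontendDataHelper options exist in your EXT:solr version
    frontendDataHelper {
        scheme = https
        // hypothetical helper domain that the server itself can reach
        host = internal.example.org
    }
}
```

After changing this, re-index a page and check which URL ends up in the stored document.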

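The pages need proper markers around the relevant content, so that menus do not spam the index with irrelevant content:
<!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->
These markers may occur multiple times. Without them the whole document is taken into account.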
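
If the template does not contain the markers yet, one way to add them is to wrap the main content in TypoScript. This is only a sketch: it assumes the page object is called `page` and that `page.10` renders the main content, which may differ in your setup:

```typoscript
// assumption: page.10 is the content object that renders the main content
page.10.stdWrap.wrap = <!--TYPO3SEARCH_begin-->|<!--TYPO3SEARCH_end-->
```

Menus, headers and footers rendered outside of `page.10` then stay outside the markers.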

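But there are also some options that stop indexing:
As shown in the question, the doktype is taken into account as well.
pages has an option Include in Search [no_search], which is meant for external search engines but is also evaluated by solr.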

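Finally, there is an option that solr adopted from indexed_search, but only for page indexing: config.index_enable = 1
Without this option you can index records, but every page throws an error when it is processed in the index queue.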
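
A minimal sketch of where this setting goes, assuming it is placed in the TypoScript setup of the site's root template:

```typoscript
config {
    // allow page content to be indexed; the page indexer depends on this
    index_enable = 1
}
```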


indexing

solr

typo3