Trouble with "URLs must include" with Sphider Search Engine

Question

I am using Sphider.

I want the spider to be able to leave my domain http://www.example.com, but only crawl/visit URLs containing example. That is, only URLs like http://www.example.com or http://www.my-example.com or http://www.test.example.com should get visited/indexed, but NOT http://www.exa-mple.com.

After reading the manual I tried the following: [Screenshot of what I tried]

But I get this message when trying to index: [Image: what I'm getting when trying to index]

Can anyone help me? What exactly am I doing wrong? I have also tried *example*, but that didn't work either.

Answer 1

The documentation contains a misleading example:

Every string starting with a '*' in front is considered as a regular expression, so that '*/[a]+/' denotes a string with one or more a's in it.

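[...] is a character class: it matches any single character from the set/range defined inside it. So [a]+ only means "one or more a's", not a whole word.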

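You can use */example/ to define a regular expression that matches the string example. However, if you are not interested in checking the context, you can also just put the plain string example in the must-include list.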
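To illustrate the rule quoted above, here is a minimal Python sketch of how such a must-include entry could be interpreted. Sphider itself is written in PHP, so this is not its actual code; the function name rule_matches and the handling of the / delimiters are assumptions made for illustration only.

```python
import re


def rule_matches(rule: str, url: str) -> bool:
    """Hypothetical check of one must-include rule against a URL.

    Assumption: a rule starting with '*' is a regex wrapped in '/' delimiters
    (as in the documentation's '*/[a]+/' example); anything else is treated
    as a plain substring.
    """
    if rule.startswith("*"):
        pattern = rule[1:]
        # Strip the surrounding '/' delimiters if both are present.
        if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
            pattern = pattern[1:-1]
        return re.search(pattern, url) is not None
    return rule in url


urls = [
    "http://www.example.com",
    "http://www.my-example.com",
    "http://www.test.example.com",
    "http://www.exa-mple.com",
]

for url in urls:
    # Both forms from this answer: the regex rule and the plain string.
    print(url, rule_matches("*/example/", url), rule_matches("example", url))
```

Under these assumptions, both */example/ and the plain string example accept the first three URLs and reject http://www.exa-mple.com, whereas the documentation's */[a]+/ would accept any URL that contains the letter a.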

Answer 2

^(?=.*example)https?:\/\/\S+$

You can try this. See the demo here: https://regex101.com/r/LUkHsD/3
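As a quick local check of the pattern above, the following Python sketch runs it against the URLs from the question. It only tests the regular expression itself with Python's re module; whether Sphider accepts the pattern in exactly this form is not verified here.

```python
import re

# The pattern from this answer: the lookahead requires "example" somewhere
# in the string, and the rest requires an http:// or https:// URL without spaces.
PATTERN = re.compile(r"^(?=.*example)https?:\/\/\S+$")

urls = [
    "http://www.example.com",
    "http://www.my-example.com",
    "http://www.test.example.com",
    "http://www.exa-mple.com",
]

# Map each URL to whether the pattern accepts it.
results = {url: PATTERN.match(url) is not None for url in urls}

for url, accepted in results.items():
    print(f"{url}: {'match' if accepted else 'no match'}")
```

The first three URLs match and http://www.exa-mple.com does not, since it never contains the contiguous string example.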
