How does Hugo's Related Content algorithm work? What are the factors?
On their website, they say:
Hugo uses a set of factors to identify a page’s related content based on Front Matter parameters. This can be tuned to the desired set of indices and parameters or left to Hugo’s default Related Content configuration.
But how exactly does the algorithm work? What are the factors?
This is explained in the pull request description quoted below:
Several attempts have been started to fix #98 -- all of them have failed for some reason.
It is a hard problem to solve, and I think the main reason for failure has been the bottom-up-approach, i.e. we have started with the hardest problem: Solving Sherlock's last case.
The reason I'm picking up this ball again now is this Twitter thread:
Using intersect and keywords in page params work reasonably well, but it is quadratic and will be slow to unusable for larger sites.
So, instead of solving the hardest problem, I have started on this PR by outlining an interface:
    type PageSearcher interface {
        Search(args ...interface{}) (Pages, error)
        SearchIndex(index string, args ...interface{}) (Pages, error)
        Similar(p *Page) (Pages, error)
        SimilarIndex(index string, p *Page) (Pages, error)
    }
Naming suggestions welcomed.
The idea is that a user defines a set of indexes in config.toml:
    indexes:
      - param: keywords
        weight: 1
      - param: tags
        weight: 3
Then we lazily build some sort of index from that, and then you can do fast searches like:
    {{ .Site.RegularPages.Similar . }}
    {{ .Site.RegularPages.Search "hugo" }}
    {{ .Site.RegularPages.SearchIndex "keywords" "hugo" | limit 10 }}
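To make the idea in the quote more concrete, here is a minimal sketch in Go of a lazily built inverted index with weighted scoring. It is not Hugo's actual implementation: the `Page`, `IndexConfig`, and `Searcher` types are hypothetical stand-ins, and ranking by the summed weight of shared terms is an assumption based on the `param`/`weight` pairs in the config above.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Page is a hypothetical stand-in for a Hugo page: a name plus front matter
// parameters such as "keywords" and "tags".
type Page struct {
	Name   string
	Params map[string][]string
}

// IndexConfig mirrors one entry of the "indexes" configuration (assumed shape).
type IndexConfig struct {
	Param  string
	Weight int
}

// Searcher builds an inverted index (param -> term -> pages) on first use.
type Searcher struct {
	pages  []*Page
	config []IndexConfig
	once   sync.Once
	index  map[string]map[string][]*Page
}

func NewSearcher(pages []*Page, config []IndexConfig) *Searcher {
	return &Searcher{pages: pages, config: config}
}

// build runs exactly once, the first time a search is made.
func (s *Searcher) build() {
	s.index = make(map[string]map[string][]*Page)
	for _, c := range s.config {
		terms := make(map[string][]*Page)
		for _, p := range s.pages {
			for _, t := range p.Params[c.Param] {
				terms[t] = append(terms[t], p)
			}
		}
		s.index[c.Param] = terms
	}
}

// Similar returns the other pages sharing at least one indexed term with p,
// ordered by the summed weight of the shared terms (highest first).
func (s *Searcher) Similar(p *Page) []*Page {
	s.once.Do(s.build)
	scores := make(map[*Page]int)
	for _, c := range s.config {
		for _, t := range p.Params[c.Param] {
			for _, other := range s.index[c.Param][t] {
				if other != p {
					scores[other] += c.Weight
				}
			}
		}
	}
	result := make([]*Page, 0, len(scores))
	for other := range scores {
		result = append(result, other)
	}
	sort.Slice(result, func(i, j int) bool {
		if scores[result[i]] != scores[result[j]] {
			return scores[result[i]] > scores[result[j]]
		}
		return result[i].Name < result[j].Name // deterministic tie-break
	})
	return result
}

func main() {
	a := &Page{Name: "a", Params: map[string][]string{"keywords": {"hugo"}, "tags": {"go", "web"}}}
	b := &Page{Name: "b", Params: map[string][]string{"keywords": {"hugo"}}}
	c := &Page{Name: "c", Params: map[string][]string{"tags": {"go"}}}
	d := &Page{Name: "d", Params: map[string][]string{"tags": {"css"}}}
	s := NewSearcher([]*Page{a, b, c, d}, []IndexConfig{{"keywords", 1}, {"tags", 3}})
	for _, p := range s.Similar(a) {
		fmt.Println(p.Name) // prints c, then b
	}
}
```

Page `c` ranks above `b` because a shared tag (weight 3) counts for more than a shared keyword (weight 1), and `d` is left out because it shares nothing with `a`. Each lookup only touches pages that actually share a term, which is what avoids comparing every page against every other page.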