How does Hugo's Related Content algorithm work? What are the factors?

On their website they say:

Hugo uses a set of factors to identify a page’s related content based on Front Matter parameters. This can be tuned to the desired set of indices and parameters or left to Hugo’s default Related Content configuration.

Source

But how exactly does the algorithm work? What are the factors?

The original approach is explained in gohugoio/hugo PR 3815:

Several attempts have been started to fix #98 -- all of them have failed for some reason.
It is a hard problem to solve, and I think the main reason for failure has been the bottom-up-approach, i.e. we have started with the hardest problem: Solving Sherlock's last case.

The reason I'm picking up this ball again now is this Twitter thread:

Using intersect and keywords in page params work reasonably well, but it is quadratic and will be slow to unusable for larger sites.

So, instead of solving the hardest problem, I have started on this PR by outlining an interface:

type PageSearcher interface {
  Search(args ...interface{}) (Pages, error)
  SearchIndex(index string, args ...interface{}) (Pages, error)
  Similar(p *Page) (Pages, error)
  SimilarIndex(index string, p *Page) (Pages, error)
}

Naming suggestions welcomed.

The idea is that a user defines a set of indexes in config.toml:

indexes:
  - param: keywords
    weight: 1
  - param: tags
    weight: 3

Then we lazily build some sort of index from that, and then you can do fast searches like:

{{ .Site.RegularPages.Similar . }}
{{ .Site.RegularPages.Search "hugo" }}
{{ .Site.RegularPages.SearchIndex "keywords" "hugo" | limit 10 }}
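To make the "lazily build some sort of index" idea concrete, here is a minimal Go sketch of a weighted inverted index. It is not Hugo's actual implementation: the names IndexConfig, Doc, InvertedIndex, NewInvertedIndex and Similar are hypothetical and chosen only for illustration, and the scoring rule (add the index weight once per shared keyword) is an assumption based on the weighted indexes described above.

```go
package main

import (
	"sort"
	"sync"
)

// IndexConfig mirrors one entry of the "indexes" config above:
// a front matter parameter and its weight. Hypothetical type.
type IndexConfig struct {
	Param  string
	Weight int
}

// Doc is a stand-in for a page: a name plus its front matter keyword lists.
type Doc struct {
	Name   string
	Params map[string][]string // e.g. "tags" -> ["go", "hugo"]
}

// InvertedIndex maps param -> keyword -> positions of the docs that contain it.
type InvertedIndex struct {
	configs  []IndexConfig
	docs     []Doc
	once     sync.Once
	postings map[string]map[string][]int
}

// NewInvertedIndex stores the inputs only; the index is built on first use.
func NewInvertedIndex(configs []IndexConfig, docs []Doc) *InvertedIndex {
	return &InvertedIndex{configs: configs, docs: docs}
}

// build walks every doc once per configured param and records its keywords.
func (idx *InvertedIndex) build() {
	idx.postings = make(map[string]map[string][]int)
	for _, cfg := range idx.configs {
		byKeyword := make(map[string][]int)
		for i, d := range idx.docs {
			for _, kw := range d.Params[cfg.Param] {
				byKeyword[kw] = append(byKeyword[kw], i)
			}
		}
		idx.postings[cfg.Param] = byKeyword
	}
}

// Similar returns the names of docs sharing at least one keyword with
// docs[i], ordered by descending score, with ties broken by name.
func (idx *InvertedIndex) Similar(i int) []string {
	idx.once.Do(idx.build) // lazy: the index is built at most once

	// Assumed scoring: each shared keyword adds the weight of its index.
	scores := make(map[int]int)
	for _, cfg := range idx.configs {
		for _, kw := range idx.docs[i].Params[cfg.Param] {
			for _, j := range idx.postings[cfg.Param][kw] {
				if j != i {
					scores[j] += cfg.Weight
				}
			}
		}
	}

	ranked := make([]int, 0, len(scores))
	for j := range scores {
		ranked = append(ranked, j)
	}
	sort.Slice(ranked, func(a, b int) bool {
		if scores[ranked[a]] != scores[ranked[b]] {
			return scores[ranked[a]] > scores[ranked[b]]
		}
		return idx.docs[ranked[a]].Name < idx.docs[ranked[b]].Name
	})

	names := make([]string, len(ranked))
	for k, j := range ranked {
		names[k] = idx.docs[j].Name
	}
	return names
}
```

With keywords at weight 1 and tags at weight 3, a page that shares one tag and one keyword with the current page scores 4 and ranks ahead of a page that shares only a keyword (score 1). A lookup only touches the posting lists for the current page's own keywords, which is what avoids comparing every page against every other page.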

Initial implementation: gohugoio/hugo commit 3b4f17b