Suggestions on full-text search or existing search algorithms

Question

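Can someone suggest an easy way to solve the following search problem? That is, is there an existing algorithm for this, or would full-text search be enough?

The item data is classified as follows: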

ItemCategory	ItemCluster	ItemSubCluster	SubCluster	Items
Vegetable	Root vegetables	Root	WithOutSkin	potato, sweet potato, yam
Vegetable	Root vegetables	Root	WithSkin	onion, garlic, shallot
Vegetable	Greens	Leafy green	Leaf	lettuce, spinach, silverbeet
Vegetable	Greens	Cruciferous	Flower	cabbage, cauliflower, Brussels sprouts, broccoli
Vegetable	Greens	Edible plant stem	Stem	celery, asparagus

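The input looks like this:

sweet potato, yam
yam, potato
garlic, onion
lettuce, spinach, silverbeet
lettuce, silverbeet
lettuce, silverbeet, spinach

For each item in the input, I want to get a mapping to the ItemCategory, ItemCluster, ItemSubCluster, and SubCluster it belongs to.

Any help would be appreciated.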

Answer 1

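You are almost on the right track.

Full-text search is not needed here.

What you can build here is a kind of inverted index, as follows:

Taking potato as an example, create a map entry for potato that stores its ItemCategory, ItemCluster, ItemSubCluster, and SubCluster.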

For example:

"potato": {
    "ItemCategory": "Vegetable",
    "ItemCluster": "Root vegetables",
    "ItemSubCluster": "Root",
    "SubCluster": "WithOutSkin"
}
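
As a minimal sketch of the idea above, the index can be built directly from the rows of the question's table. The names `FIELDS`, `ROWS`, `build_index`, and `lookup` are made up for this illustration, and lowercasing the item names is an assumption about how inputs should be matched.

```python
# Column names and rows copied from the table in the question.
FIELDS = ("ItemCategory", "ItemCluster", "ItemSubCluster", "SubCluster")
ROWS = [
    ("Vegetable", "Root vegetables", "Root", "WithOutSkin", "potato, sweet potato, yam"),
    ("Vegetable", "Root vegetables", "Root", "WithSkin", "onion, garlic, shallot"),
    ("Vegetable", "Greens", "Leafy green", "Leaf", "lettuce, spinach, silverbeet"),
    ("Vegetable", "Greens", "Cruciferous", "Flower",
     "cabbage, cauliflower, Brussels sprouts, broccoli"),
    ("Vegetable", "Greens", "Edible plant stem", "Stem", "celery, asparagus"),
]


def build_index(rows):
    """Map each item name to a dict holding its four classification fields."""
    index = {}
    for *classification, items in rows:
        record = dict(zip(FIELDS, classification))
        for item in items.split(","):
            # Assumption: item names are matched case-insensitively.
            index[item.strip().lower()] = record
    return index


def lookup(index, query):
    """Resolve a comma-separated input line; unknown items map to None."""
    names = [name.strip() for name in query.split(",")]
    return {name: index.get(name.lower()) for name in names}


INDEX = build_index(ROWS)
print(lookup(INDEX, "sweet potato, yam"))
```

Each lookup is a single dictionary access per input item, which is why no full-text search is required for exact item names.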

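Now, storing this kind of data for every vegetable would be expensive.

You can optimize the storage with an encoding scheme:

For example:

Let ItemCategory be represented as 0, ItemCluster as 1, ItemSubCluster as 2, and SubCluster as 3.

The values are represented by a similar encoding scheme:

Let Vegetable be represented as 0, Root vegetables as 1, Root as 2, and WithOutSkin as 3.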

Now your mapping becomes:

"potato": {
    "0": "0",
    "1": "1",
    "2": "2",
    "3": "3"
}
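
The encoding step could be sketched roughly as follows. This assumes a single shared vocabulary that hands out value codes in first-seen order, which is one way to reproduce the potato example; the answer does not say whether codes are shared across fields or kept per field. The names `RECORDS` and `encode` are hypothetical, and the sketch uses integers rather than the string keys shown above.

```python
# Unencoded records for two items, taken from the question's table.
# Values are listed in field order:
# ItemCategory, ItemCluster, ItemSubCluster, SubCluster.
RECORDS = {
    "potato": ["Vegetable", "Root vegetables", "Root", "WithOutSkin"],
    "onion": ["Vegetable", "Root vegetables", "Root", "WithSkin"],
}


def encode(records):
    """Replace field names by their position and values by integer codes."""
    value_codes = {}  # value string -> integer code, in first-seen order
    encoded = {}
    for item, values in records.items():
        encoded[item] = {
            field_code: value_codes.setdefault(value, len(value_codes))
            for field_code, value in enumerate(values)
        }
    return encoded, value_codes


ENCODED, VALUE_CODES = encode(RECORDS)
print(ENCODED["potato"])
print(VALUE_CODES)
```

Repeated values such as Vegetable are stored once in the vocabulary, and every record only carries small integers.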

To optimize this further, you can also maintain an index of the vegetables themselves. For example, potato can be represented as 0.

So your final index becomes:

"0": {
    "0": "0",
    "1": "1",
    "2": "2",
    "3": "3"
}
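
To read a result back out of the fully encoded index, the codes have to be translated in the other direction. The following is a small sketch under the same assumptions as the example values above; `ITEM_IDS`, `VALUES`, `ENCODED_INDEX`, and `decode` are illustrative names, not part of any library.

```python
FIELDS = ["ItemCategory", "ItemCluster", "ItemSubCluster", "SubCluster"]

# Item name -> item code (potato is 0, as in the example above).
ITEM_IDS = {"potato": 0}

# Value code -> value string; the list position is the code.
VALUES = ["Vegetable", "Root vegetables", "Root", "WithOutSkin"]

# Fully encoded index: item code -> {field code: value code}.
ENCODED_INDEX = {0: {0: 0, 1: 1, 2: 2, 3: 3}}


def decode(item_name):
    """Return the readable classification for an item, or None if unknown."""
    item_id = ITEM_IDS.get(item_name.strip().lower())
    if item_id is None:
        return None
    codes = ENCODED_INDEX[item_id]
    return {FIELDS[field]: VALUES[value] for field, value in codes.items()}


print(decode("potato"))
```

Note that the name-to-code table still has to be stored somewhere, so the encoded form mainly helps when many items share the same field values.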
