How to efficiently store repetitive data in Solr without affecting performance

Question

The data I store in Solr is structured somewhat like this:

[{
    "Product": "Boomerang",
    "Price": 42,
    "Stores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
    "Product": "Juggling Chainsaws",
    "Price": 94,
    "Stores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
    "Product": "Chainsaw",
    "Price": 5,
    "Stores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]
}]

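There are thousands of different products that have the same values in the "Stores" field.

Is there a way to avoid storing these identical values over and over, without hurting search performance for queries such as 'Find a chainsaw from Labor Store'?

This is what I had in mind: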

[{
    "Product": "Boomerang",
    "Price": 42,
    "StoreGroup": "NoveltySportsStores"
},
{
    "Product": "Juggling Chainsaws",
    "Price": 94,
    "StoreGroup": "NoveltySportsStores"
},
{
    "Product": "Chainsaw",
    "Price": 5,
    "StoreGroup": "OutdoorsStores"
},
{
    "NoveltySportsStores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
    "OutdoorsStores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]
}]

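Edit: This example is completely made up. In my real use case the groups stay constant, each group is repeated about 5000 times, and there are about 50000 groups in total.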

Answer 1

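You are treating Solr/Lucene as if it were an RDBMS, which it is not. The first layout may look like a lot of repetition and wasted resources, but it is not. It is the natural and best way to index this data.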
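One way to see why the repetition is cheaper than it looks for searching: Lucene searches through an inverted index, in which, roughly speaking, each distinct value of a field is recorded once and points to the documents that contain it. The following is only a toy sketch of that idea in plain Python, using the documents from the question; the function name `build_inverted_index` is made up for illustration and this is not how Lucene is actually implemented.

```python
from collections import defaultdict

# Documents shaped like the first layout in the question.
docs = [
    {"Product": "Boomerang", "Price": 42,
     "Stores": ["Sport Shack", "Joe's Sport Supplies",
                "Sports and More", "The Outdoor Shop"]},
    {"Product": "Juggling Chainsaws", "Price": 94,
     "Stores": ["Sport Shack", "Joe's Sport Supplies",
                "Sports and More", "The Outdoor Shop"]},
    {"Product": "Chainsaw", "Price": 5,
     "Stores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]},
]


def build_inverted_index(documents, field):
    """Map each distinct value of a multi-valued field to the ids of the
    documents containing it (hypothetical helper, illustration only)."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(documents):
        for value in doc[field]:
            index[value].append(doc_id)
    return dict(index)


stores_index = build_inverted_index(docs, "Stores")

# Each distinct store name is a single key, however many products list it.
print(len(stores_index))                  # 6
print(stores_index["The Outdoor Shop"])   # [0, 1, 2]
print(stores_index["Labor Store"])        # [2]
```

In this picture, adding another product sold at "The Outdoor Shop" adds one more document id to an existing entry rather than another copy of the store name.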

You could also use the second layout, but the first is better and simpler.
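To illustrate the "simpler" part: with the first layout, the example query from the question ('Find a chainsaw from Labor Store') can be expressed as a single request, because every product document carries its own store names. The sketch below only builds the request URL using Solr's standard `q` and `fq` parameters. The host, the core name `products`, and the helper `build_select_url` are assumptions for illustration, and whether `Product:chainsaw` matches depends on how the `Product` field is analyzed in your schema.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, product_term, store_name):
    """Build a Solr /select URL for the first layout (hypothetical helper)."""
    params = {
        # Main query: match the product name.
        "q": f"Product:{product_term}",
        # Filter query: restrict to documents listing this exact store.
        "fq": f'Stores:"{store_name}"',
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr instance with a core named "products".
url = build_select_url("http://localhost:8983/solr/products",
                       "chainsaw", "Labor Store")
print(url)

# Decoding the query string shows the parameters Solr would receive.
print(parse_qs(urlsplit(url).query))
```

With the second layout, the store names live in separate group documents, so the same search would first have to resolve which groups contain "Labor Store" and only then look up products by `StoreGroup`.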

Tags: lucene, solr, full-text-search, search-engine