How to efficiently store repetitive data in Solr without affecting performance
The data I store in Solr is structured roughly like this:
[{
"Product": "Boomerang",
"Price": 42,
"Stores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
"Product": "Juggling Chainsaws",
"Price": 94,
"Stores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
"Product": "Chainsaw",
"Price": 5,
"Stores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]
}]
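Thousands of different products have identical values in the "Stores" field.
Is there a way to avoid storing these identical values over and over without hurting search performance for queries such as 'Find a chainsaw from Labor Store'?
This is what I had in mind: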
[{
"Product": "Boomerang",
"Price": 42,
"StoreGroup": "NoveltySportsStores"
},
{
"Product": "Juggling Chainsaws",
"Price": 94,
"StoreGroup": "NoveltySportsStores"
},
{
"Product": "Chainsaw",
"Price": 5,
"StoreGroup": "OutdoorsStores"
},
{
"NoveltySportsStores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
"OutdoorsStores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]
}]
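Edit:
This example is entirely made up. In my real use case the groups stay constant, with each group repeated about 5,000 times and about 50,000 groups in total.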
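You are treating Solr/Lucene as an RDBMS, which it is not. The repetition may look excessive and wasteful, but it isn't. The first approach is the natural and best way to index this data.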
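To make this concrete, here is a toy inverted index in Python. It is only a rough sketch of the idea behind indexed fields, not Lucene's actual data structures, and the helper names (`build_postings`, `search`) are made up for illustration. It models only the indexed side, not stored field values. Each distinct store name appears once as a key, and each additional product contributes only its document number to that key's posting list.

```python
from collections import defaultdict

# Sample documents copied from the question (denormalized layout).
NOVELTY = ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
DOCS = [
    {"Product": "Boomerang", "Price": 42, "Stores": list(NOVELTY)},
    {"Product": "Juggling Chainsaws", "Price": 94, "Stores": list(NOVELTY)},
    {"Product": "Chainsaw", "Price": 5,
     "Stores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]},
]


def whole_values(values):
    """Treat each store name as a single untokenized term."""
    return values


def lowercase_words(text):
    """Very naive analyzer: lowercase and split on whitespace (no stemming)."""
    return text.lower().split()


def build_postings(docs, field, tokenize):
    """Map each distinct term to the list of document numbers containing it."""
    postings = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for term in tokenize(doc[field]):
            postings[term].append(doc_id)
    return dict(postings)


STORE_POSTINGS = build_postings(DOCS, "Stores", whole_values)
PRODUCT_POSTINGS = build_postings(DOCS, "Product", lowercase_words)


def search(product_term, store):
    """Return document numbers matching both the product term and the store."""
    product_hits = set(PRODUCT_POSTINGS.get(product_term.lower(), []))
    store_hits = set(STORE_POSTINGS.get(store, []))
    return sorted(product_hits & store_hits)


if __name__ == "__main__":
    for doc_id in search("chainsaw", "Labor Store"):
        print(DOCS[doc_id]["Product"])
```

In this sketch, answering 'Find a chainsaw from Labor Store' is one lookup per term plus an intersection of two posting lists, with no second lookup to resolve a group.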
You could also use the second approach, but the first is better and simpler.
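As a rough comparison, the sketch below builds the request URL for 'Find a chainsaw from Labor Store' under each layout. Several things here are assumptions rather than part of the question: the host, the core name `products`, and, for the second layout, group documents reshaped as `{"GroupName": "OutdoorsStores", "Stores": [...]}` so that Solr's join query parser (`{!join from=... to=...}`) has a field to join on. Field types and analysis are not specified in the question, so treat the query strings as illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Solr endpoint; host and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/products/select"


def denormalized_query(product, store):
    """First layout: every product document carries its own Stores values."""
    params = {
        "q": f"Product:{product}",
        "fq": f'Stores:"{store}"',
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def join_query(product, store):
    """Second layout: find matching group documents, then join to products.

    Assumes group documents have a GroupName field (hypothetical) whose value
    matches the StoreGroup field on product documents.
    """
    params = {
        "q": f"Product:{product}",
        "fq": f'{{!join from=GroupName to=StoreGroup}}Stores:"{store}"',
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def decoded_params(url):
    """Decode the query string to show the parameters Solr would receive."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    print(decoded_params(denormalized_query("chainsaw", "Labor Store")))
    print(decoded_params(join_query("chainsaw", "Labor Store")))
```

The first layout needs only a plain filter on `Stores`, while the second adds a join step to every such query, which is one way to read the advice that the first approach is simpler.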