Convert flat list of products and categories to tree structure
I currently have items in the following structure:
[{
  "category" => ["Alcoholic Beverages", "Wine", "Red Wine"],
  "name" => "Robertson Merlot",
  "barcode" => '123456789-000',
  "wine_farm" => "Robertson Wineries",
  "price" => 60.00
}]
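I made this data up, but my real data has the same structure, and I cannot change the incoming data.
I have over 100,000 of these items.
Each product is nested in 1 to n categories (there is no upper limit).
Because the data is tabular, the category names are repeated for every product. I want to use a tree structure to prevent this repetition and reduce the file size by 25 to 30%.
I'm aiming for a tree structure like this: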
{
  "type" => "category",
  "properties" => {
    "name" => "Alcoholic Beverages"
  },
  "children" => [{
    "type" => "category",
    "properties" => {
      "name" => "Wine"
    },
    "children" => [{
      "type" => "category",
      "properties" => {
        "name" => "Red Wine"
      },
      "children" => [{
        "type" => "product",
        "properties" => {
          "name" => "Robertson Merlot",
          "barcode" => '123456789-000',
          "wine_farm" => "Robertson Wineries",
          "price" => 60.00
        }
      }]
    }]
  }]
}
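I can't seem to come up with an efficient algorithm for this. I would appreciate any help in the right direction.
Should I generate an ID for each node and add a parent ID? I'm worried that IDs would add to the length of the text, which I'm trying to shorten.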
I've simplified the structure you asked for a bit, but you can use the logic to see how it can be done:
require 'pp'
x = [{
  "category" => ["Alcoholic Beverages", "Wine", "Red Wine"],
  "name" => "Robertson Merlot",
  "barcode" => '123456789-000',
  "wine_farm" => "Robertson Wineries",
  "price" => 60.00
}]
result = {}
x.each do |entry|
  # Save current level in a variable
  current_level = result
  # We want some special logic for the last item, so let's store that.
  # Note: the last category becomes the product's key, so two products that
  # share the same category path would overwrite each other.
  item = entry['category'].pop
  # For each category, check if it exists, else add a category hash.
  entry['category'].each do |category|
    unless current_level.has_key?(category)
      current_level[category] = {'type' => 'category', 'children' => {}, 'name' => category}
    end
    current_level = current_level[category]['children'] # Set the new current level of the hash.
  end
  # Finally add the item:
  entry.delete('category')
  entry['type'] = 'product'
  current_level[item] = entry
end
pp result
Which gives us:
{"Alcoholic Beverages"=>
  {"type"=>"category",
   "children"=>
    {"Wine"=>
      {"type"=>"category",
       "children"=>
        {"Red Wine"=>
          {"name"=>"Robertson Merlot",
           "barcode"=>"123456789-000",
           "wine_farm"=>"Robertson Wineries",
           "price"=>60.0,
           "type"=>"product"}},
       "name"=>"Wine"}},
   "name"=>"Alcoholic Beverages"}}
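If the array-based shape from the question is needed, one option is to keep a hash-keyed tree like the one above as an intermediate step and convert it afterwards. The following is only a rough sketch of that idea, not the code above: `build_tree` and `to_nodes` are hypothetical helper names, and collecting products in a separate `'products'` array per category is my own assumption, made so that several products in the same category are all kept.

```ruby
require 'json'

# Hypothetical helper: build an intermediate tree keyed by category name.
# Each node is { 'children' => { name => node }, 'products' => [ ... ] }.
def build_tree(items)
  root = { 'children' => {}, 'products' => [] }
  items.each do |item|
    node = root
    item['category'].each do |name|
      # Reuse the category if it exists, otherwise create an empty one.
      node = (node['children'][name] ||= { 'children' => {}, 'products' => [] })
    end
    # Copy every field except the category path; the input is not modified.
    node['products'] << item.reject { |key, _| key == 'category' }
  end
  root
end

# Hypothetical helper: turn the intermediate tree into the array-based
# node format shown in the question (categories first, then products).
def to_nodes(node)
  categories = node['children'].map do |name, child|
    {
      'type' => 'category',
      'properties' => { 'name' => name },
      'children' => to_nodes(child)
    }
  end
  products = node['products'].map do |props|
    { 'type' => 'product', 'properties' => props }
  end
  categories + products
end

items = [{
  'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
  'name' => 'Robertson Merlot',
  'barcode' => '123456789-000',
  'wine_farm' => 'Robertson Wineries',
  'price' => 60.00
}]

puts JSON.pretty_generate(to_nodes(build_tree(items)))
```

The result is an array of top-level category nodes, so data with more than one root category should also fit. No IDs or parent IDs are involved; the nesting itself records the parent of each node.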
There may be simpler ways to do this, but it's the best I can think of right now, and it should match your structure.
require 'json'
# Initial set up, it seems the root keys are always the same looking at your structure
products = {
  'type' => 'category',
  'properties' => {
    'name' => 'Alcoholic Beverages'
  },
  'children' => []
}
data = [{
  "category" => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
  "name" => 'Robertson Merlot',
  "barcode" => '123456789-000',
  "wine_farm" => 'Robertson Wineries',
  "price" => 60.00
}]
data.each do |item|
  # Make sure we set the current to the top-level again
  curr = products['children']
  # Remove first entry as it's always 'Alcoholic Beverages'
  item['category'].shift
  item['category'].each do |category|
    # Get the index for the category if it exists
    index = curr.index {|x| x['type'] == 'category' && x['properties']['name'] == category}
    # If it exists then change current hash level to the child of that category
    if index
      curr = curr[index]['children']
    # Else add it in
    else
      curr << {
        'type' => 'category',
        'properties' => {
          'name' => category
        },
        'children' => []
      }
      # We can use last as we know it'll be the last index.
      curr = curr.last['children']
    end
  end
  # Delete category from the item itself
  item.delete('category')
  # Add the item as product type to the last level of the hash
  curr << {
    'type' => 'product',
    'properties' => item
  }
end
puts JSON.pretty_generate(products)
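The loop above looks up an existing category with `curr.index`, which scans the sibling array at every level, and it assumes a single fixed root. With over 100,000 items it may be worth keeping a side lookup table instead. Here is a minimal sketch of that variation, under my own assumptions: `build_nodes` and the `index` hash are hypothetical names, and whether this is actually faster depends on how many siblings each category has.

```ruby
require 'json'

# Hypothetical helper: build the array-based tree in a single pass.
def build_nodes(items)
  roots = []   # top-level nodes; more than one root category is allowed
  index = {}   # category path (array of names) => that category's 'children' array
  items.each do |item|
    children = roots
    path = []
    item['category'].each do |name|
      path += [name] # creates a new array, so keys already stored are never mutated
      unless index.key?(path)
        node = { 'type' => 'category', 'properties' => { 'name' => name }, 'children' => [] }
        children << node
        index[path] = node['children']
      end
      children = index[path]
    end
    # Copy every field except the category path; the input is not modified.
    children << { 'type' => 'product', 'properties' => item.reject { |key, _| key == 'category' } }
  end
  roots
end

data = [{
  'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
  'name' => 'Robertson Merlot',
  'barcode' => '123456789-000',
  'wine_farm' => 'Robertson Wineries',
  'price' => 60.00
}]

puts JSON.pretty_generate(build_nodes(data))
```

Keying the lookup on the full path rather than on the name alone keeps two categories with the same name under different parents apart.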