Convert flat list of products and categories to tree structure

I currently have items with the following structure:

[{
    "category" => ["Alcoholic Beverages", "Wine", "Red Wine"],
    "name" => "Robertson Merlot",
    "barcode" => '123456789-000',
    "wine_farm" => "Robertson Wineries",
    "price" => 60.00
}]

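This data is made up, but the data I'm working with has the same structure, and I can't change the incoming data.

I have more than 100,000 of these records.

Each product is nested under anywhere from 1 to n (unlimited) categories.

Because the data is tabular, the categories are repeated for every product. I want to use a tree structure to eliminate this repetition and reduce the file size by 25 to 30%.

I'm aiming for a tree structure like this: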

{
    "type" => "category",
    "properties" => {
        "name" => "Alcoholic Beverages"
    },
    "children" => [{
        "type" => "category",
        "properties" => {
            "name" => "Wine"
        },
        "children" => [{
            "type" => "category",
            "properties" => {
                "name" => "Red Wine"
            },
            "children" => [{
                "type" => "product",
                "properties" => {
                    "name" => "Robertson Merlot",
                    "barcode" => '123456789-000',
                    "wine_farm" => "Robertson Wineries",
                    "price" => 60.00
                }
            }]
        }]
    }]
}
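To confirm that the nested form loses no information, here is a minimal sketch that walks a tree in the shape above and rebuilds the original flat records, collecting the category path on the way down. The method name `flatten_tree` is hypothetical, and it assumes every node has a `"type"` of either `"category"` or `"product"`, as in the example.

```ruby
# Hypothetical helper: rebuilds flat product records from a tree in the
# target shape, accumulating the category path while descending.
def flatten_tree(node, path = [], out = [])
  if node["type"] == "category"
    name = node["properties"]["name"]
    (node["children"] || []).each { |child| flatten_tree(child, path + [name], out) }
  else
    # Product leaf: put the accumulated category path back in front of its properties
    out << { "category" => path }.merge(node["properties"])
  end
  out
end

tree = {
  "type" => "category",
  "properties" => { "name" => "Alcoholic Beverages" },
  "children" => [{
    "type" => "category",
    "properties" => { "name" => "Wine" },
    "children" => [{
      "type" => "category",
      "properties" => { "name" => "Red Wine" },
      "children" => [{
        "type" => "product",
        "properties" => {
          "name" => "Robertson Merlot",
          "barcode" => "123456789-000",
          "wine_farm" => "Robertson Wineries",
          "price" => 60.00
        }
      }]
    }]
  }]
}

p flatten_tree(tree)
```

Because each category name appears only once in the tree, the path is stored once per branch rather than once per product, which is where the size saving comes from.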
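  1. I can't seem to come up with an efficient algorithm for this. I'd appreciate any pointers in the right direction.

  2. Should I generate an ID for each node and add a parent ID? I'm worried that IDs would increase the length of the text, which I'm trying to shorten.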
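For question 2, one way to judge the trade-off is to build the ID-based alternative and measure it. The sketch below assumes a particular layout that is not from the question: a `categories` table of `[id, parent_id, name]` rows, with each product carrying a `category_id`. The method name `to_id_table` and the second product record are hypothetical and for illustration only. Each distinct category path is stored once and given a short integer ID, and the script prints both JSON sizes for comparison instead of assuming which is smaller.

```ruby
require 'json'

# Hypothetical ID-based layout: every distinct category path is stored once as
# an [id, parent_id, name] row, and each product points at its deepest category.
def to_id_table(records)
  ids = {}          # category path (Array of names) => integer id
  categories = []   # rows of [id, parent_id, name]
  products = []

  records.each do |rec|
    parent = nil
    path = []
    rec['category'].each do |name|
      path += [name] # new array each time, so existing hash keys are never mutated
      ids[path] ||= begin
        categories << [categories.size, parent, name]
        categories.size - 1
      end
      parent = ids[path]
    end
    products << rec.reject { |k, _| k == 'category' }.merge('category_id' => parent)
  end

  { 'categories' => categories, 'products' => products }
end

records = [
  { 'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'], 'name' => 'Robertson Merlot',
    'barcode' => '123456789-000', 'wine_farm' => 'Robertson Wineries', 'price' => 60.00 },
  # Hypothetical second product in the same categories, for illustration only
  { 'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'], 'name' => 'Example Red',
    'barcode' => '000000000-000', 'wine_farm' => 'Example Farm', 'price' => 50.00 }
]

table = to_id_table(records)
puts JSON.pretty_generate(table)
puts "flat JSON:     #{JSON.generate(records).bytesize} bytes"
puts "ID-based JSON: #{JSON.generate(table).bytesize} bytes"
```

Small integer IDs add only a few bytes per product, so the real question is how they compare with the nested layout on your actual data; running both against a sample of the 100,000 records would answer that directly.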

I've simplified it a bit compared with the structure you asked for, but you can use the logic to see how it can be done:

require 'pp'
x = [{
    "category" => ["Alcoholic Beverages", "Wine", "Red Wine"],
    "name" => "Robertson Merlot",
    "barcode" => '123456789-000',
    "wine_farm" => "Robertson Wineries",
    "price" => 60.00
}]

result = {}

x.each do |entry|

  # Save current level in a variable, starting at the root
  current_level = result

  # For each category, add a category hash if it doesn't exist yet,
  # then descend into that category's children.
  entry['category'].each do |category|
    current_level[category] ||= {'type' => 'category', 'children' => {}, 'name' => category}
    current_level = current_level[category]['children'] # Set the new current level of the hash.
  end

  # Finally add the item under the deepest category. Key it by barcode (not by
  # the last category name) so products in the same category don't overwrite
  # each other, and copy it instead of deleting 'category' from the input.
  product = entry.reject { |key, _| key == 'category' }
  product['type'] = 'product'
  current_level[product['barcode']] = product

end

pp result

It gives us:

{"Alcoholic Beverages"=>
  {"type"=>"category",
   "children"=>
    {"Wine"=>
      {"type"=>"category",
       "children"=>
        {"Red Wine"=>
          {"type"=>"category",
           "children"=>
            {"123456789-000"=>
              {"name"=>"Robertson Merlot",
               "barcode"=>"123456789-000",
               "wine_farm"=>"Robertson Wineries",
               "price"=>60.0,
               "type"=>"product"}},
           "name"=>"Red Wine"}},
       "name"=>"Wine"}},
   "name"=>"Alcoholic Beverages"}}
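Storing children in a hash keyed by name makes each lookup fast, but it is not quite the requested shape. A minimal sketch of a post-processing step that converts such a hash-keyed tree into the target `type`/`properties`/`children` layout follows. The name `to_target_shape` is hypothetical, and it assumes category nodes look like `{'type' => 'category', 'children' => {...}, 'name' => ...}` and that every other node is a product.

```ruby
require 'json'

# Hypothetical converter: turns a tree whose children are stored in a Hash
# keyed by name into the array-based type/properties/children layout.
def to_target_shape(node)
  if node['type'] == 'category'
    {
      'type' => 'category',
      'properties' => { 'name' => node['name'] },
      'children' => node['children'].values.map { |child| to_target_shape(child) }
    }
  else
    # Product: every field except the 'type' marker becomes a property
    { 'type' => 'product', 'properties' => node.reject { |k, _| k == 'type' } }
  end
end

# Hand-written sample in the hash-keyed form, for illustration
result = {
  'Alcoholic Beverages' => {
    'type' => 'category', 'name' => 'Alcoholic Beverages',
    'children' => {
      'Wine' => {
        'type' => 'category', 'name' => 'Wine',
        'children' => {
          'Red Wine' => {
            'type' => 'category', 'name' => 'Red Wine',
            'children' => {
              '123456789-000' => {
                'name' => 'Robertson Merlot', 'barcode' => '123456789-000',
                'wine_farm' => 'Robertson Wineries', 'price' => 60.0, 'type' => 'product'
              }
            }
          }
        }
      }
    }
  }
}

# The top level is itself a hash keyed by name, so convert each root
roots = result.values.map { |node| to_target_shape(node) }
puts JSON.pretty_generate(roots.first)
```

Ruby hashes preserve insertion order, so `values` keeps children in the order they were first seen in the input.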

There may be a simpler way to do this, but this is what I could come up with, and it should match your structure.

require 'json'

# Initial set up: judging by your structure, the root category always seems
# to be the same, so it is created up front.
products = {
  'type' => 'category',
  'properties' => {
    'name' => 'Alcoholic Beverages'
  },
  'children' => []
}

data = [{
    "category" => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
    "name" => 'Robertson Merlot',
    "barcode" => '123456789-000',
    "wine_farm" => 'Robertson Wineries',
    "price" => 60.00
}]

data.each do |item|
  # Make sure we reset the current level to the top of the tree
  curr = products['children']

  # Skip the first entry, as it's always 'Alcoholic Beverages'
  item['category'].drop(1).each do |category|
    # Look for an existing child category with this name
    node = curr.find { |x| x['type'] == 'category' && x['properties']['name'] == category }

    # If it doesn't exist yet, add it in
    unless node
      node = {
        'type' => 'category',
        'properties' => {
          'name' => category
        },
        'children' => []
      }
      curr << node
    end

    # Descend into that category's children
    curr = node['children']
  end

  # Add the item as a product at the deepest level, without its category list
  curr << {
    'type' => 'product',
    'properties' => item.reject { |key, _| key == 'category' }
  }
end

puts JSON.pretty_generate(products)
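For 100,000 products, the `curr.index`/`find` scan above costs time proportional to the number of siblings at every level. A sketch of a one-pass variant that instead keeps a side hash keyed by the full category path, so finding an existing category node is a hash lookup, is shown below. This swaps in a different lookup technique from the answers above; `build_tree` is a hypothetical name, and it returns an array of root nodes on the assumption that real data may have more than one top-level category.

```ruby
require 'json'

# Builds the target tree in one pass. Category nodes are found through a side
# index keyed by the full category path, so each lookup is a hash hit instead
# of a linear scan of the children array. Returns an array of root nodes.
def build_tree(records)
  roots = []
  index = {} # category path (Array of names) => category node

  records.each do |record|
    siblings = roots
    path = []
    record['category'].each do |name|
      path += [name] # new array each time, so existing hash keys are never mutated
      node = index[path]
      unless node
        node = { 'type' => 'category', 'properties' => { 'name' => name }, 'children' => [] }
        index[path] = node
        siblings << node
      end
      siblings = node['children']
    end
    # The product goes under its deepest category; the input record is left untouched
    siblings << { 'type' => 'product', 'properties' => record.reject { |k, _| k == 'category' } }
  end

  roots
end

data = [{
  'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
  'name' => 'Robertson Merlot',
  'barcode' => '123456789-000',
  'wine_farm' => 'Robertson Wineries',
  'price' => 60.00
}]

tree = build_tree(data)
puts JSON.pretty_generate(tree.size == 1 ? tree.first : tree)
```

The path index holds one entry per distinct category path rather than one per product, and it is only needed while building; it is not part of the generated JSON. When every product shares a single root, `tree.first` has the same shape as the object in the question.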