Rails normalize csv file data

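I'm trying to import a tsv (tab-separated values) file into my database, but the file is not formatted correctly. The price and count columns are separated only by a space (except in the header row), so both values end up under the price key, which shifts all the following data into the wrong key/value pairs.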

The tsv file:

purchaser name  item description    price   count   merchant address    merchant name
Alice Bob    off  of food 10.0 2   987 Fake St     Bob's Pizza
Example Name     of awesome for   10.0 5   456 Unreal Rd   Tom's Awesome Shop
Name Three   Sneakers for  5.0    1     123 Fake St     Sneaker Store Emporium
John Williams    Sneakers for  5.0    4     123 Fake St     Sneaker Store Emporium 

In /models/purchase.rb:

class Purchase < ActiveRecord::Base
  # validates :item_price, :numericality => { :greater_than_or_equal_to => 0 }

  def self.import(file)
    CSV.foreach(file.path,
                :headers => true,
                :header_converters => lambda { |h| h.downcase.gsub(' ', '_') },
                :col_sep => "\t") do |row|
      # debugger
      purchase_hash = row.to_hash
      Purchase.create!(purchase_hash)
    end
  end
end

If I import the file with the debugger line in the model uncommented and then type row, it returns:

#<CSV::Row "purchaser_name":"Alice Bob" "item_description":" off of food" "price":"10.0 2" "count":" 987 Fake St" "merchant_address":" Bob's Pizza" "merchant_name":nil>

row.inspect returns:

"#<CSV::Row \"purchaser_name\":\"Alice Bob\" \"item_description\":\" off of food\" \"price\":\"10.0 2\" \"count\":\" 987 Fake St\" \"merchant_address\":\" Bob's Pizza\" \"merchant_name\":nil>"

As you can see, price (10.0) and count (2) have been squashed into the same value because they are not separated by a tab in the file.
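The shift can be reproduced outside Rails with a minimal sketch. The sample string and the helper names `sample_tsv` and `first_row_as_hash` are assumptions made for illustration; the sample mirrors the first data row, with tabs between every column except price and count:

```ruby
require "csv"

# Hypothetical sample based on the first data row: every column is
# tab-separated except price and count, which share one field ("10.0 2").
def sample_tsv
  "purchaser name\titem description\tprice\tcount\tmerchant address\tmerchant name\n" \
    "Alice Bob\toff of food\t10.0 2\t987 Fake St\tBob's Pizza\n"
end

# Parse with the same options as the import method and return the first
# row as a hash, so the shifted keys are easy to inspect.
def first_row_as_hash(tsv)
  table = CSV.parse(tsv,
                    headers: true,
                    col_sep: "\t",
                    header_converters: ->(h) { h.downcase.tr(" ", "_") })
  table.first.to_hash
end

p first_row_as_hash(sample_tsv)
```

Because the data row has only five tab-separated fields for six headers, every value after price lands one key too early and merchant_name is left as nil.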

db/schema.rb:

ActiveRecord::Schema.define(version: 20160601205154) do

  create_table "purchases", force: :cascade do |t|
    t.string   "purchaser_name"
    t.string   "item_description"
    t.string   "price"
    t.string   "count"
    t.string   "merchant_address"
    t.string   "merchant_name"
    t.datetime "created_at",       null: false
    t.datetime "updated_at",       null: false
  end

end

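I originally had price as a Decimal data type and count as an Integer data type, but I switched them back to String while trying to find a solution. I can change them back if that helps (and I would prefer to, if possible).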

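You could shift the merchant_address and merchant_name values over by one column, then split the squashed price and count field on the space and assign the two parts to price and count: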

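purchase_hash = row.to_hash
# The header converter produces string keys, so index with strings, not symbols.
# Shift the two misplaced values one column to the right.
purchase_hash["merchant_name"] = purchase_hash["merchant_address"]
purchase_hash["merchant_address"] = purchase_hash["count"]
# "10.0 2" => ["10.0", "2"]
splitted_price_count = purchase_hash["price"].split(" ")
purchase_hash["price"] = splitted_price_count.first
purchase_hash["count"] = splitted_price_count.last
Purchase.create!(purchase_hash)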
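The same idea can be tried without a database. This is only a sketch: the helper name `fix_shifted_purchase` is hypothetical, it indexes with string keys because `row.to_hash` uses the converted header strings as keys, and it assumes price and count are separated by whitespace:

```ruby
# Hypothetical helper: takes the shifted hash (string keys) and returns a
# corrected copy without modifying the original.
def fix_shifted_purchase(shifted)
  # "10.0 2" => ["10.0", "2"]
  price, count = shifted["price"].to_s.split
  shifted.merge(
    "price"            => price,
    "count"            => count,
    # The address was parsed into the count column, the name into the address column.
    "merchant_address" => shifted["count"].to_s.strip,
    "merchant_name"    => shifted["merchant_address"].to_s.strip
  )
end

shifted = {
  "purchaser_name"   => "Alice Bob",
  "item_description" => "off of food",
  "price"            => "10.0 2",
  "count"            => " 987 Fake St",
  "merchant_address" => " Bob's Pizza",
  "merchant_name"    => nil
}

p fix_shifted_purchase(shifted)
```

The values stay strings here; with Decimal and Integer columns, ActiveRecord would typecast "10.0" and "2" on assignment.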

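There are two parts to solving this. First, define a converter that splits the field into two parts at parse time (and converts them to numbers while it is at it):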

CONVERTER_SPLIT_PRICE_COUNT = lambda do |value, info|
  next value unless info.header == "price"
  price, count = value.split
  [ price.to_f, count.to_i ]
end

This turns the price field into an array, e.g. "10.0 2" becomes [10.0, 2].
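To see the converter on its own, here is a small sketch. The method names `split_price_count_converter` and `parse_with_converter` and the three-column sample are assumptions for illustration, and the extra `is_a?(String)` guard is an addition in case a converter is handed a nil field:

```ruby
require "csv"

# Returns a field converter equivalent to the one above, wrapped in a method
# so it can be reused. It only touches String values in the "price" column.
def split_price_count_converter
  lambda do |value, info|
    next value unless info.header == "price" && value.is_a?(String)
    price, count = value.split
    [price.to_f, count.to_i]
  end
end

# Hypothetical helper: parse tab-separated text with the converter applied.
def parse_with_converter(tsv)
  CSV.parse(tsv,
            headers: true,
            col_sep: "\t",
            converters: split_price_count_converter)
end

row = parse_with_converter("name\tprice\tcount\nAlice\t10.0 2\t987 Fake St\n").first
p row["price"]
p row.fields
```

Only the field under the price header is replaced by an array; the misplaced address under count passes through unchanged.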

Second, define a method that, after parsing, fixes the misaligned values and returns a correct hash:

def row_to_hash_fixing_price_count(row)
  row.headers.zip(row.fields.flatten).to_h
end

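The above flattens the price/count array into its parent array (the rest of the row), then zips the result with the headers array. Since there is now one more field than there are headers, the extra nil at the end is dropped.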
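The flatten-then-zip step can be checked with plain arrays. In this sketch the method name `realign` and the literal arrays are assumptions standing in for `row.headers` and `row.fields`:

```ruby
# Same expression as in the method above, taking the two arrays directly.
def realign(headers, fields)
  headers.zip(fields.flatten).to_h
end

headers = %w[purchaser_name item_description price count merchant_address merchant_name]
# Six parsed fields: price holds a nested [price, count] pair and the last
# field is the nil padding for the short row.
fields = ["Alice Bob", "off of food", [10.0, 2], "987 Fake St", "Bob's Pizza", nil]

p fields.flatten.length
p realign(headers, fields)
```

Note that `flatten` would also unpack any other array-valued field, so this relies on price being the only field a converter turns into an array.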

You would use them like this:

csv_opts = {
  headers: true,
  col_sep: "\t",
  header_converters: ->(h) { h.downcase.tr(" ", "_") },
  converters: CONVERTER_SPLIT_PRICE_COUNT
}

data_out = CSV.new(data, **csv_opts).map do |row|
  row_to_hash_fixing_price_count(row)
end
# => [ { "purchaser_name" => "Alice Bob",
#        "item_description" => " off  of food",
#        "price" => 10.0,
#        "count" => 2,
#        "merchant_address" => "987 Fake St",
#        "merchant_name" => "Bob's Pizza"
#      },
#      # ...
#    ]

You can see it in action here: http://ideone.com/08wTPT

P.S. Consider creating the records in bulk rather than one at a time. Given the above, you can call Purchase.create!(data_out), because create! accepts an array of hashes.