Rails: normalize CSV file data
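I am trying to import a TSV (tab-separated values) file into my database, but the file is not formatted correctly. The price and count columns are separated only by a space (except in the header row), so both values end up under the price key and every value after them shifts into the wrong key/value pair.
The TSV file: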
purchaser name item description price count merchant address merchant name
Alice Bob off of food 10.0 2 987 Fake St Bob's Pizza
Example Name of awesome for 10.0 5 456 Unreal Rd Tom's Awesome Shop
Name Three Sneakers for 5.0 1 123 Fake St Sneaker Store Emporium
John Williams Sneakers for 5.0 4 123 Fake St Sneaker Store Emporium
In /models/purchase.rb:
class Purchase < ActiveRecord::Base
  # validates :item_price, :numericality => { :greater_than_or_equal_to => 0 }
  def self.import(file)
    CSV.foreach(file.path, :headers => true,
                :header_converters => lambda { |h| h.downcase.gsub(' ', '_')},
                :col_sep => "\t"
    ) do |row|
      # debugger
      purchase_hash = row.to_hash
      Purchase.create!(purchase_hash)
    end
  end
end
If I import the file with the debugger line in the model uncommented and then type row, it returns:
#<CSV::Row "purchaser_name":"Alice Bob" "item_description":" off of food" "price":"10.0 2" "count":" 987 Fake St" "merchant_address":" Bob's Pizza" "merchant_name":nil>
row.inspect returns:
"#<CSV::Row \"purchaser_name\":\"Alice Bob\" \"item_description\":\" off of food\" \"price\":\"10.0 2\" \"count\":\" 987 Fake St\" \"merchant_address\":\" Bob's Pizza\" \"merchant_name\":nil>"
As you can see, price (10.0) and count (2) have been squashed into a single value, because they are not separated by a tab in the file.
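The behaviour described above can be reproduced outside Rails with a short script. This is a minimal sketch: the inline `data` string is an assumed reconstruction of the first row of the file (tabs everywhere except between price and count), not the actual file.

```ruby
require "csv"

# Assumed sample: the header row is fully tab-separated, but in the data row
# price and count are separated by a single space instead of a tab.
data = "purchaser name\titem description\tprice\tcount\tmerchant address\tmerchant name\n" \
       "Alice Bob\toff of food\t10.0 2\t987 Fake St\tBob's Pizza\n"

rows = CSV.parse(
  data,
  headers: true,
  col_sep: "\t",
  header_converters: ->(h) { h.downcase.tr(" ", "_") }
)

row = rows.first.to_h
p row["price"]          # the merged price/count value
p row["count"]          # the address, shifted one column to the left
p row["merchant_name"]  # nil, because the data row is one field short
```

The data row has only five tab-separated fields against six headers, which is why the last key comes back as nil.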
db/schema.rb:
ActiveRecord::Schema.define(version: 20160601205154) do
  create_table "purchases", force: :cascade do |t|
    t.string   "purchaser_name"
    t.string   "item_description"
    t.string   "price"
    t.string   "count"
    t.string   "merchant_address"
    t.string   "merchant_name"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end
end
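I originally had price as a Decimal column and count as an Integer column, but switched them back to String while trying to find a solution. I can change them back if that helps (and I would prefer to, if possible).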
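You could shift the merchant_address and merchant_name values into place, then split the squashed price/count field on the space and assign the two parts to price and count. Because the header converter returns strings, the hash has to be indexed with string keys rather than symbols:
purchase_hash = row.to_hash
# Move the two misplaced values one column to the right
purchase_hash["merchant_name"] = purchase_hash["merchant_address"]
purchase_hash["merchant_address"] = purchase_hash["count"]
# Split "10.0 2" into its price and count parts
price_and_count = purchase_hash["price"].split(" ")
purchase_hash["price"] = price_and_count.first
purchase_hash["count"] = price_and_count.last
Purchase.create!(purchase_hash)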
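The same shifting logic can be wrapped in a small method and tried without a database. This is only a sketch: `fix_shifted_row` is a hypothetical helper name, the inline `data` string is an assumed sample, and the `strip` calls are an extra precaution against stray whitespace rather than part of the answer above.

```ruby
require "csv"

# Hypothetical helper: repairs one parsed row whose price and count were
# merged into the "price" field, returning a new hash with string keys.
def fix_shifted_row(row)
  hash = row.to_h
  price, count = hash["price"].to_s.split
  hash.merge(
    "price"            => price,
    "count"            => count,
    "merchant_address" => hash["count"]&.strip,
    "merchant_name"    => hash["merchant_address"]&.strip
  )
end

# Assumed sample row: price and count separated by a space, not a tab.
data = "purchaser name\titem description\tprice\tcount\tmerchant address\tmerchant name\n" \
       "Alice Bob\toff of food\t10.0 2\t987 Fake St\tBob's Pizza\n"

row = CSV.parse(
  data,
  headers: true,
  col_sep: "\t",
  header_converters: ->(h) { h.downcase.tr(" ", "_") }
).first

fixed = fix_shifted_row(row)
p fixed
```

The values for price and count are still strings here; converting them is left to the column types or to an explicit conversion step.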
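An alternative solution has two parts. First, define a converter that splits the field in two at parse time (and converts the parts to numbers while it is at it):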
CONVERTER_SPLIT_PRICE_COUNT = lambda do |value, info|
  next value unless info.header == "price"
  price, count = value.split
  [ price.to_f, count.to_i ]
end
This turns the price field into an array, e.g. "10.0 2" becomes [10.0, 2].
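Since the question mentions wanting a Decimal column for price, a stricter variant of the converter might be worth considering. The sketch below is an assumption, not part of the answer: `SPLIT_PRICE_COUNT_STRICT` is a hypothetical name, and it swaps `to_f`/`to_i` for `BigDecimal()` and `Integer()`, which raise on malformed input instead of silently returning 0.

```ruby
require "csv"
require "bigdecimal"

# Hypothetical stricter converter: yields an exact decimal for the price and
# raises ArgumentError if either part is not a valid number.
SPLIT_PRICE_COUNT_STRICT = lambda do |value, info|
  next value unless info.header == "price" && value
  price, count = value.split
  [BigDecimal(price), Integer(count, 10)]
end

# Assumed, shortened sample: only the columns needed to show the conversion.
data = "price\tcount\tmerchant address\n10.0 2\t987 Fake St\n"

row = CSV.parse(
  data,
  headers: true,
  col_sep: "\t",
  converters: SPLIT_PRICE_COUNT_STRICT
).first

p row["price"]
```

Whether failing loudly on bad rows is preferable depends on how much the input file can be trusted.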
Second, define a method that, after parsing, fixes the misaligned values and returns a correct hash:
def row_to_hash_fixing_price_count(row)
  row.headers.zip(row.fields.flatten).to_h
end
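This flattens the price/count array into its parent array (the rest of the row's fields) and then zips the result with the headers array. Since there are now more fields than headers, the extra nil at the end is dropped.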
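The flatten-then-zip step can be seen in isolation with plain arrays. This is a minimal sketch; the `headers` and `fields` values are written out by hand to mimic what a converted row would contain.

```ruby
headers = %w[purchaser_name item_description price count merchant_address merchant_name]

# What the row's fields look like after the converter has run: the price slot
# holds a two-element array and the last slot is nil.
fields = ["Alice Bob", "off of food", [10.0, 2], "987 Fake St", "Bob's Pizza", nil]

flat = fields.flatten        # seven elements: the nested array is spliced in
pairs = headers.zip(flat)    # zip takes its length from headers, so six pairs
result = pairs.to_h

p flat.size
p result
```

`Array#zip` produces one pair per element of the receiver, so the seventh flattened element (the trailing nil) never makes it into the hash.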
You would use them like this:
csv_opts = {
  headers: true,
  col_sep: "\t",
  header_converters: ->(h) { h.downcase.tr(" ", "_") },
  converters: CONVERTER_SPLIT_PRICE_COUNT
}
# data holds the TSV contents (a String or IO); ** passes the hash as keyword arguments
data_out = CSV.new(data, **csv_opts).map do |row|
  row_to_hash_fixing_price_count(row)
end
# => [ { "purchaser_name" => "Alice Bob",
# "item_description" => " off of food",
# "price" => 10.0,
# "count" => 2,
# "merchant_address" => "987 Fake St",
# "merchant_name" => "Bob's Pizza"
# },
# # ...
# ]
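You can see it in action here: http://ideone.com/08wTPT
P.S. Consider creating the records in one call rather than one at a time. Given the above, you can call Purchase.create!(data_out), because create! accepts an array of hashes (each record is still validated and saved individually).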