Writing data from two different CSV files into one CSV file

Question

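I'm learning Ruby and have been stuck on this for a long time, so I need some help.

I need to write one CSV file from two different CSV files. I have code that does this, but it is split across two separate functions, and I need the two files combined into one.

This is the code: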

require 'CSV'

class Plantas <
   Struct.new( :code)
end

class Especies <
   Struct.new(:id, :type, :code, :name_es, :name_ca, :name_en, :latin_name, :customer_id )
end

def ecode

   f_inECODE = File.open("pflname.csv", "r")                  #get EPPOCODE
   f_out=CSV.open("plantas.csv", "w+", :headers => true) #outputfile

   f_inECODE.each_line do |line|

   fields = line.split(',')

   newPlant = Plantas.new

   newPlant.code = fields[2].tr_s('"', '').strip #eppocode

       plant = [newPlant.code] # lines to print
       f_out <<  plant

   end
end

def data

   f_dataspices=File.open("spices.csv", "r")
   f_out=CSV.open("plantas.csv", "w+", :headers => true) #outputfile

   f_dataspices.each_line do |line|

       fields = line.split(',')
       newEspecies = Especies.new
       
       newEspecies.id = fields[0].tr_s('"', '').strip 
       newEspecies.type = fields[1].tr_s('"', '').strip 
       newEspecies.code = fields[2].tr_s('"', '').strip 
       newEspecies.name_es = fields[3].tr_s('"', '').strip 
       newEspecies.name_ca = fields[4].tr_s('"', '').strip 
       newEspecies.name_en = fields[5].tr_s('"', '').strip 
       newEspecies.latin_name = fields[6].tr_s('"', '').strip
       newEspecies.customer_id = fields[7].tr_s('"', '').strip 
       
           especia = [newEspecies.id,newEspecies.type,newEspecies.code,newEspecies.name_es,newEspecies.name_ca,newEspecies.name_en,newEspecies.latin_name,newEspecies.customer_id] 
           f_out <<  especia
   end
end

data 
ecode

The desired output is this (species.csv + ecode.csv):

"id","type","code","name_es","name_ca","name_en","latin_name","customer_id","ecode"
7205,"DunSpecies",NULL,"0","0","0","",11630,LEECO
7437,"DunSpecies",NULL,"0","Xicoira","0","",5273,LEE3O
7204,"DunSpecies",NULL,"0","0","0","",11630,L4ECO

What I actually get is this:

"id","type","code","name_es","name_ca","name_en","latin_name","customer_id"
7205,"DunSpecies",NULL,"0","0","0","",11630
7437,"DunSpecies",NULL,"0","Xicoira","0","",5273
7204,"DunSpecies",NULL,"0","0","0","",11630

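(no ecode)

On the one hand I have the ecode, and on the other I have the rest of the data; I just need to put it all together.

I want everything in the same file (plantas.csv). I wrote two different functions because I don't know how to combine everything in a single foreach. I would like to have it all in one function, but I don't know how. If someone can help me put this code into a single function that writes the result to one file, I would be grateful.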

An input example of the file ecode.csv (I only want the ecode field) is this:

"""identifier"",""datatype"",""code"",""lang"",""langno"",""preferred"",""status"",""creation"",""modification"",""country"",""fullname"",""authority"",""shortname"""
"""N1952"",""PFL"",""LEECO"",""la"",""1"",""0"",""N"",""06/06/2000"",""09/03/2010"","""",""Leea coccinea non"",""Planchon"",""Leea coccinea non"""
"""N2974"",""PFL"",""LEECO"",""en"",""1"",""0"",""N"",""06/06/2000"",""21/02/2011"","""",""west Indian holly"","""",""West Indian holly"""

An input example of the file data.csv (I want all of its fields) is this:

"id","type","code","name_es","name_ca","name_en","latin_name","customer_id"
7205,"DunSpecies",NULL,"0","0","0","",11630
7437,"DunSpecies",NULL,"0","Xicoira","0","",5273

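My way of linking the two files is to create a third file and write everything into it. At least that is my idea; I don't know whether there is a simpler way.

Thanks!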

Answer 1

Cleaning up ecode.csv makes this more challenging, but here is what I came up with:

If data.csv and ecode.csv are matched by line number:

require 'csv'

data = CSV.read('data.csv', headers: true).to_a
headers = data.shift << 'eppocode'

double_quoted_ecode = CSV.read('ecode.csv')
ecodeIO = StringIO.new
ecodeIO.puts double_quoted_ecode.to_a
ecodeIO.rewind
ecode = CSV.parse(ecodeIO, headers: true)

CSV.open('plantas.csv', 'w+') do |plantas|
  plantas << headers
  data.each.with_index do |row, idx|
    planta = row + [ecode['code'][idx]]
    plantas << planta
  end
end

With your example files, this gives you the following plantas.csv:

id,type,code,name_es,name_ca,name_en,latin_name,customer_id,eppocode
7205,DunSpecies,NULL,0,0,0,"",11630,LEECO
7437,DunSpecies,NULL,0,Xicoira,0,"",5273,LEECO
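
As a side note on why ecode.csv is read twice in the code above: each physical line of that file is one quoted field whose content is itself a CSV line. The following is a minimal sketch of that idea, using a shortened three-column line that is assumed for illustration rather than taken from the real file.

```ruby
require 'csv'

# One physical line shaped like those in ecode.csv (shortened to three columns).
raw_line = '"""identifier"",""datatype"",""code"""'

# First pass: the whole line is a single quoted field with escaped inner quotes.
outer = CSV.parse_line(raw_line)

# Second pass: parse that field's text as an ordinary CSV line.
inner = CSV.parse_line(outer.first)

puts outer.length    # number of fields seen by the first pass
puts inner.inspect   # column names recovered by the second pass
```

The first pass yields one field and the second pass splits it into the individual columns, which is what writing the rows to a StringIO and parsing them again achieves for the whole file.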

If the entries are matched by the id of data.csv and the identifier of ecode.csv:

require 'csv'

data = CSV.read('data.csv', headers: true)
headers = data.headers << 'eppocode'

double_quoted_ecode = CSV.read('ecode.csv')
ecodeIO = StringIO.new
ecodeIO.puts double_quoted_ecode.to_a
ecodeIO.rewind
ecode = CSV.parse(ecodeIO, headers: true)

CSV.open('plantas.csv', 'w+') do |plantas|
  plantas << headers
  data.each do |row|
    id = row['id']
    ecode_row = ecode.find { |entry| entry['identifier'] == id } || {}
    planta = row << ecode_row['code']
    plantas << planta
  end
end
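
The find call above scans ecode for every row of data. For larger files it may be worth building a lookup hash once instead. The sketch below is only an illustration of that variation: the in-memory sample data and the name code_by_identifier are hypothetical, and the ids are chosen so that some of them actually match.

```ruby
require 'csv'

# Hypothetical sample data in which some ids match an identifier.
data_text  = "id,name\nN1,first\nN2,second\nN3,third\n"
ecode_text = "identifier,code\nN1,LEECO\nN3,LEEC1\n"

data  = CSV.parse(data_text, headers: true)
ecode = CSV.parse(ecode_text, headers: true)

# Build the lookup once: identifier => code.
# ||= keeps the first code seen for an identifier, as find would.
code_by_identifier = ecode.each_with_object({}) do |entry, lookup|
  lookup[entry['identifier']] ||= entry['code']
end

merged = CSV.generate do |out|
  out << data.headers + ['eppocode']
  data.each do |row|
    # Rows without a matching identifier get an empty eppocode field.
    out << row.fields + [code_by_identifier[row['id']]]
  end
end

puts merged
```

CSV.generate is used here only to keep the sketch free of files; replacing it with CSV.open('plantas.csv', 'w') should give the same rows on disk.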

Hope that helps.

Answer 2

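Data

Let's begin by creating the two CSV files. To make the results easier to follow, I have arbitrarily removed some fields from each file and changed one field value.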

ecode.csv

ecode = <<~END_ECODE.chomp
  """identifier"",""datatype"",""code"",""lang"",""langno"",""preferred"",""status"",""creation"",""modification"",""country"",""fullname"",""authority"",""shortname"""
  """N1952"",""PFL"",""LEECO"",""la"",""1"",""0"",""N"",""06/06/2000"",""09/03/2010"","""",""Leea coccinea non"",""Planchon"",""Leea coccinea non"""
  """N2974"",""PFL"",""LEEC1"",""en"",""1"",""0"",""N"",""06/06/2000"",""21/02/2011"","""",""west Indian holly"","""",""West Indian holly"""
END_ECODE

File.write('ecode.csv', ecode)
  #=> 452

data.csv

data = <<~END_DATA.chomp
  "id","type","code","customer_id"
  7205,"DunSpecies",NULL,11630
  7437,"DunSpecies",NULL,,5273
END_DATA

File.write('data.csv', data)
  #=> 90

Code

require 'csv'

CSV.open('plantas.csv', 'w') do |csv_out|
  converter = ->(s) { s.delete('"') }

  # quote_char: nil treats the doubled quotes in ecode.csv as plain characters,
  # so each line splits on commas and the converter can strip the quotes.
  epposcode = CSV.foreach('ecode.csv',
    headers: true,
    quote_char: nil,
    header_converters: [converter],
    converters: [converter]
  ).map { |csv| csv["code"] }

  headers = CSV.open('data.csv', &:readline) << 'epposcode'
  csv_out << headers

  CSV.foreach('data.csv', headers: true) do |row|
    csv_out << (row << epposcode.shift)
  end
end
  #=> 90

Result

Let's see what was written.

puts File.read('plantas.csv')

id,type,code,customer_id,epposcode
7205,DunSpecies,NULL,11630,LEECO
7437,DunSpecies,NULL,,5273,LEEC1

Explanation

The structure we want is as follows.

CSV.open('plantas.csv', 'w') do |csv_out|
  epposcode = <array of 'code' field values from 'ecode.csv'>
  headers = <headers from 'data.csv' to which 'epposcode' is appended>
  csv_out << headers
  CSV.foreach('data.csv', headers:true) do |row|
    csv_out << <row of 'data.csv' to which an element of epposcode is appended>
  end
end

CSV::open is the main CSV method for writing files, and CSV::foreach is generally my method of choice for reading CSV files. I could have written the following instead.

csv_out = CSV.open('plantas.csv', 'w')

epposcode = <array of 'code' field values from 'ecode.csv'>
headers = <headers from 'data.csv' to which 'epposcode' is appended>
csv_out << headers
CSV.foreach('data.csv', headers:true) do |row|
  csv_out << <row of 'data.csv' to which an element of epposcode is appended>
end

csv_out.close

Using a block is convenient, however, because the file is closed before CSV::open returns from the block.
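
To see the closing behaviour for yourself, here is a small sketch. It assumes a scratch file in a temporary directory (demo.csv is a hypothetical name) and keeps a reference to the CSV object only so that it can be inspected after the block has finished.

```ruby
require 'csv'
require 'tmpdir'

writer  = nil
written = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.csv')   # hypothetical scratch file

  CSV.open(path, 'w') do |csv|
    writer = csv                      # keep a reference to inspect afterwards
    csv << %w[id code]
    csv << [1, 'LEECO']
  end

  # The block has returned, so the rows have been flushed to disk.
  written = File.read(path)
end

puts writer.closed?   # whether the underlying file was closed by CSV.open
puts written
```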

It is convenient to use a converter for both the header fields and the row fields:

converter = ->(s) { s.delete('"') }

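This is a proc (I've defined a lambda) that removes double quotes from a string. It is passed in two of foreach's optional arguments, header_converters and converters: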

  epposcode = CSV.foreach('ecode.csv',
    headers: true,
    quote_char: nil,   # treat the doubled quotes as plain characters
    header_converters: [converter],
    converters: [converter]
  )

Search for "Data Converters" in the CSV documentation.
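
As a rough illustration of how such a converter acts on headers and fields, the sketch below parses a two-column string shaped like ecode.csv. The sample text and the name strip_quotes are assumptions made for this example, and quote_char: nil is used so that the doubled quotes reach the converter as ordinary characters.

```ruby
require 'csv'

# Same idea as the converter above: remove every double quote from a string.
strip_quotes = ->(s) { s.delete('"') }

# Two lines shaped like ecode.csv, shortened to two columns (assumed sample).
text = %Q("""identifier"",""code"""\n"""N1952"",""LEECO"""\n)

table = CSV.parse(text,
  headers: true,
  quote_char: nil,                    # do not interpret quotes while splitting
  header_converters: [strip_quotes],  # applied to each header
  converters: [strip_quotes]          # applied to each data field
)

puts table.headers.inspect
puts table['code'].inspect
```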

We call foreach without a block so that it returns an enumerator, which can then be chained to map:

epposcode = CSV.foreach('ecode.csv',
  headers: true,
  quote_char: nil,   # treat the doubled quotes as plain characters
  header_converters: [converter],
  converters: [converter]
).map { |csv| csv["code"] }

For example,

epposcode
  #=> ["LEECO", "LEEC1"]
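
The enumerator behaviour can be checked in isolation. This is a minimal sketch that assumes a plain, singly quoted scratch file (codes.csv is a hypothetical name), so no converters are needed.

```ruby
require 'csv'
require 'tmpdir'

rows  = nil
codes = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'codes.csv')          # hypothetical scratch file
  File.write(path, "identifier,code\nN1952,LEECO\nN2974,LEEC1\n")

  rows  = CSV.foreach(path, headers: true)    # no block given: an Enumerator
  codes = rows.map { |row| row['code'] }      # the file is read here
end

puts rows.class
puts codes.inspect
```

Because the enumerator is lazy about opening the file, the map call has to run while the file still exists, which is why it sits inside the mktmpdir block.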
