Selecting a single value field from a CSV file

Question

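I want to extract a specific value from a single field in my CSV file, but everything I have found points to using hashes to extract a whole column of data rather than one value.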

Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!

For example, I would like to extract the values 2548 and 4352, add them together, and put the sum in a total on a new row.

I have used:

CSV.foreach("file.csv") { |row| col_data_new << row[5] }

to extract a column's values into an array, but this time I only want a single value.

Answer 1

Yes, hashes are the way to go:

require 'csv'

data = 'Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!
'

CSV.parse(data, headers: :first_row).map{ |row| row["Total $ spent"] }
# => ["2548", "5054", "4352", "6542"]

Pretend that

CSV.parse(data, headers: :first_row)

is really

CSV.foreach('some/file.csv', headers: :first_row)

and that the data really is in a file.

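The reason to use headers: :first_row is that it tells CSV to gobble up the first line. CSV then returns each record as a hash-like CSV::Row, using the associated header fields as keys, which makes it easier to retrieve specific fields by name.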
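As a small sketch of what that looks like in practice (the shortened sample and the variable names are only for illustration): with headers enabled, each record can be indexed by header name or converted to a plain Hash, and a single record can be found by one of its fields.

```ruby
require 'csv'

# Shortened copy of the sample data, for illustration only.
sample = <<~EOS
  Name,Times arrived,Total $ spent,Food feedback
  Dan,34,2548,Lovin it!
  Maria,55,5054,"Good, delicious food"
EOS

table = CSV.parse(sample, headers: :first_row)
first_row = table.first

row_class = first_row.class             # CSV::Row
dan_spent = first_row['Total $ spent']  # => "2548"
row_hash  = first_row.to_h              # plain Hash keyed by header name

# Look up one record by name, then read a single field from it.
maria = table.find { |row| row['Name'] == 'Maria' }
maria_spent = maria['Total $ spent']    # => "5054"
```

Note that the values come back as strings, so they still need converting before any arithmetic.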

From the documentation:

:headers

If set to :first_row or true, the initial row of the CSV file will be treated as a row of headers.

An alternative way of doing this is:

spent = CSV.parse(data).map{ |row| row[2] }
spent.shift

spent
# => ["2548", "5054", "4352", "6542"]

spent.shift drops the first element of the array, which was the header field for that column, leaving the array containing only values.

Or:

spent = []
skip_headers = true
CSV.parse(data).each do |row| 

  if skip_headers
    skip_headers = false
    next
  end

  spent << row[2]
end

spent
# => ["2548", "5054", "4352", "6542"]

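Much like the shift statement above, next tells Ruby to skip to the next iteration of the loop without processing the rest of the instructions in the block, so the header record is left out of the final output.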
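Another way to get the same effect, offered only as a sketch with a shortened sample, is to drop the header row before mapping. This relies on CSV.parse returning a plain array of arrays when no headers option is given.

```ruby
require 'csv'

# Shortened copy of the sample data, for illustration only.
sample = <<~EOS
  Name,Times arrived,Total $ spent,Food feedback
  Dan,34,2548,Lovin it!
  Carlos,22,4352,"I am ""pleased"", but could be better"
EOS

# drop(1) discards the header row, then the third column is read by index.
spent_values = CSV.parse(sample).drop(1).map { |row| row[2] }
spent_values # => ["2548", "4352"]
```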

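Once you have the values from the field you want, you can selectively extract specific ones. If you want the values "2548" and "4352", you need a way of determining which rows they are in. Using arrays (the non-header method) makes that harder, so I would do it using hashes again: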

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  case row['Name']
  when 'Dan', 'Carlos'
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

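Notice that the code makes it clear what is happening and which parts matter. Using case and when lets me easily add other names to include. It acts like a chained "or" conditional test on an if statement, but without the additional noise.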
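For comparison, here is a sketch of the same selection written with if and a chained ||, using a shortened sample. case/when compares with ===, which for strings behaves like ==, so the two forms select the same rows.

```ruby
require 'csv'

# Shortened copy of the sample data, for illustration only.
sample = <<~EOS
  Name,Times arrived,Total $ spent,Food feedback
  Dan,34,2548,Lovin it!
  Maria,55,5054,"Good, delicious food"
  Carlos,22,4352,"I am ""pleased"", but could be better"
EOS

spent_with_if = CSV.parse(sample, headers: :first_row).each_with_object([]) do |row, ary|
  name = row['Name']
  # Every extra name needs another "|| name == ..." clause here.
  if name == 'Dan' || name == 'Carlos'
    ary << row['Total $ spent']
  end
end

spent_with_if # => ["2548", "4352"]
```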

each_with_object is similar to inject, except it is cleaner when we need to aggregate values into an Array, Hash or some other object.
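A minimal side-by-side sketch of the difference, using a made-up list of names: with inject the block's return value becomes the next accumulator, so the collection has to be returned explicitly, while each_with_object passes the same object to every iteration. Note that the block arguments are in the opposite order.

```ruby
names = %w[Dan Maria Carlos]

# inject: the hash must be the block's last expression,
# otherwise the next iteration receives the wrong accumulator.
lengths_inject = names.inject({}) do |hash, name|
  hash[name] = name.length
  hash
end

# each_with_object: the block's return value is ignored.
lengths_ewo = names.each_with_object({}) do |name, hash|
  hash[name] = name.length
end

lengths_inject # => {"Dan"=>3, "Maria"=>5, "Carlos"=>6}
lengths_ewo    # => {"Dan"=>3, "Maria"=>5, "Carlos"=>6}
```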

Summing the array is easy and there are many different ways to do it, but I would use:

spent.map(&:to_i).inject(:+) # => 6900

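Basically, that converts the individual elements to integers and adds them together. (There is more to it, but that is not important until you are further up the learning curve.)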
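As one possible way to finish the job described in the question, the sketch below sums the values with Array#sum (available from Ruby 2.4) and formats a total row with CSV.generate_line. The "Total" label and the row layout are assumptions made for illustration, not something the question specifies.

```ruby
require 'csv'

spent = %w[2548 4352]

# sum with a block converts each string to an integer while adding.
total = spent.sum(&:to_i)
total # => 6900

# Hypothetical total row: a label in the first column, the sum in the
# "Total $ spent" column, and nil for the columns left empty.
total_line = CSV.generate_line(['Total', nil, total, nil])
total_line # => "Total,,6900,\n"
```

The generated line could then be appended to the file by opening it in append mode.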

I am just wondering if it is possible to replace the contents of the 'when' condition with an array of strings to iterate over rather than hard coded strings?

Here is a solution using an array:

NAMES = %w[Dan Carlos]

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  case row['Name']
  when *NAMES
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

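If the list of names is large, I think this solution will run slower than necessary. Arrays are great for storing data you are going to access in order, as a queue or a stack, but they are a poor fit when you have to walk through them just to find something. Even a sorted array with a binary search is likely to be slower than a hash because of the extra steps involved. Here is another way of doing this, but using a Hash: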

NAMES = %w[Dan Carlos].map{ |n| [n, true] }.to_h

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  case
  when NAMES[row['Name']]
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

But that can be refactored to be more readable:

NAMES = %w[Dan Carlos].each_with_object({}) { |a, h| h[a] = true }
# => {"Dan"=>true, "Carlos"=>true}

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  ary << row['Total $ spent'] if NAMES[row['Name']]
end

spent
# => ["2548", "4352"]
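Ruby's standard library also has Set, which offers the same kind of hash-based membership test without building the true values by hand. This is a swapped-in alternative rather than part of the approach above; the sample is shortened and wanted_names is a name chosen for illustration.

```ruby
require 'csv'
require 'set'

# Shortened copy of the sample data, for illustration only.
sample = <<~EOS
  Name,Times arrived,Total $ spent,Food feedback
  Dan,34,2548,Lovin it!
  Maria,55,5054,"Good, delicious food"
  Carlos,22,4352,"I am ""pleased"", but could be better"
EOS

wanted_names = Set['Dan', 'Carlos']

spent_from_set = CSV.parse(sample, headers: :first_row).each_with_object([]) do |row, ary|
  # Set#include? does a hash lookup rather than scanning a list.
  ary << row['Total $ spent'] if wanted_names.include?(row['Name'])
end

spent_from_set # => ["2548", "4352"]
```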

ruby

csv