Carrierwave: clean CSV before upload

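I have a Carrierwave uploader, InventoryUploader, that I use to store files of type text/csv and application/vnd.ms-excel. I ran into some invalid byte sequence errors while processing the files, so I want to clean them out before uploading.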

I've created a process :clean_file callback on the Carrierwave uploader, but I'm struggling to find a good way to strip all BOM/funky characters before the upload.

Is there a good way to do this? I'm trying something like:

class InventoryUploader < CarrierWave::Uploader::Base
  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{model.id}"
  end  

  def content_type_whitelist
    %w(text/csv application/vnd.ms-excel)
  end

  process :clean_file

  private

  def clean_file
    # Perhaps something like this?
    # but how can I ensure it removes all weird
    # character sequences?
    CSV.foreach(file.path, headers: true, encoding: 'UTF-8') do |row|
      row.each do |field|
        field[1].gsub!("\xEF\xBB\xBF".force_encoding("UTF-8"), '')
      end
    end
  end
end

Thanks in advance!

I was able to accomplish this with the following code:

require 'fileutils'
require 'tempfile'

class InventoryUploader < CarrierWave::Uploader::Base
  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{model.id}"
  end

  def content_type_whitelist
    %w(text/csv application/vnd.ms-excel)
  end

  process :clean_file

  private

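  # Re-encode the upload line by line as UTF-8, replacing invalid or
  # undefined byte sequences, then swap the cleaned copy into place.
  def clean_file
    # The block form closes the tempfile on exit; returning f keeps a
    # handle so its path is still available for the move below.
    tempfile = Tempfile.open('temp_csv', encoding: 'utf-8') do |f|
      File.foreach(current_path) do |line|
        f.puts line.encode('UTF-8', invalid: :replace, undef: :replace)
      end
      f
    end
    FileUtils.mv(tempfile.path, current_path)
  end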
end
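
Note that the code above replaces invalid byte sequences but does not remove a leading BOM, which the question also asked about. A minimal sketch of one way to handle both, assuming the file is small enough to read into memory: clean_csv_bytes and UTF8_BOM are hypothetical names of my own, not part of CarrierWave, and the sketch drops invalid bytes instead of replacing them.

```ruby
# Hypothetical helper (not a CarrierWave API): strips a leading UTF-8 BOM
# and removes byte sequences that are not valid UTF-8.
UTF8_BOM = "\xEF\xBB\xBF".b.freeze

def clean_csv_bytes(raw)
  # Work on a binary copy so the BOM check compares raw bytes.
  bytes = raw.b
  bytes = bytes.byteslice(3..-1) if bytes.start_with?(UTF8_BOM)
  # Reinterpret as UTF-8, then drop anything that is still invalid.
  # Pass a different argument to scrub to mark bad bytes instead.
  bytes.force_encoding(Encoding::UTF_8).scrub('')
end
```

Inside clean_file this could be applied with File.binwrite(current_path, clean_csv_bytes(File.binread(current_path))). Only a BOM at the very start of the file is handled, and for very large uploads the line-by-line approach above is probably the better fit.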