如何解析包含引号的制表符分隔行?
How do I parse a tab-delimited line that contains a quote?
我正在使用 Ruby 2.4。如何解析包含引号字符的制表符分隔行?这就是我现在正在发生的事情......
2.4.0 :003 > line = "11\tDave\tO\"malley"
=> "11\tDave\tO\"malley"
2.4.0 :004 > CSV.parse(line, col_sep: "\t")
CSV::MalformedCSVError: Illegal quoting in line 1.
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1912:in `block (2 levels) in shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1868:in `each'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1868:in `block in shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1828:in `loop'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1828:in `shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1770:in `each'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1784:in `to_a'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1784:in `read'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1324:in `parse'
from (irb):4
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console.rb:65:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console_helper.rb:9:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:78:in `console'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands.rb:18:in `<top (required)>'
from bin/rails:4:in `require'
from bin/rails:4:in `<main>'
虽然这个例子说明了我的观点,但我无法轻易控制输入。因此,尽管答案可能是< "Remove all quotes from teh string before parsing,"我想尽可能地保留数据。
如果您试图遵守 CSV 标准,那么这是一份格式错误的文档。 Instad 你可能只是 brute-force 它并祈祷数据本身没有标签:
line.split(/\t/)
当您处理这样的数据时,CSV 解析库会派上用场:
"1\t2\t\"3a\t3b\"\t4"
更新: 如果您准备稍微滥用 CSV 库,那么您可以这样做:
CSV.parse("11\tDave\tO\"malley", col_sep: "\t", quote_char: "[=12=]")
这基本上会扼杀引号检测,因此如果有其他数据依赖于被正确处理的数据,这可能无法解决。
"11\tDave\tO\"malley" 不是有效的 CSV 数据。奇怪的是,答案是使用两个 double-quotes,并对每个元素加双引号
2.3.1 :001 > require 'csv'
=> true
2.3.1 :002 > line = "\"11\"\t\"Dave\"\t\"O\"\"malley\""
=> "\"11\"\t\"Dave\"\t\"O\"\"malley\""
2.3.1 :003 > puts line # for clarity
"11" "Dave" "O""malley"
=> nil
2.3.1 :004 > CSV.parse(line, col_sep: "\t")
=> [["11", "Dave", "O\"malley"]]
我正在使用 Ruby 2.4。如何解析包含引号字符的制表符分隔行?这就是我现在正在发生的事情......
2.4.0 :003 > line = "11\tDave\tO\"malley"
=> "11\tDave\tO\"malley"
2.4.0 :004 > CSV.parse(line, col_sep: "\t")
CSV::MalformedCSVError: Illegal quoting in line 1.
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1912:in `block (2 levels) in shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1868:in `each'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1868:in `block in shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1828:in `loop'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1828:in `shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1770:in `each'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1784:in `to_a'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1784:in `read'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1324:in `parse'
from (irb):4
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console.rb:65:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console_helper.rb:9:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:78:in `console'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands.rb:18:in `<top (required)>'
from bin/rails:4:in `require'
from bin/rails:4:in `<main>'
虽然这个例子说明了我的观点,但我无法轻易控制输入。因此,尽管答案可能是< "Remove all quotes from teh string before parsing,"我想尽可能地保留数据。
如果您试图遵守 CSV 标准,那么这是一份格式错误的文档。 Instad 你可能只是 brute-force 它并祈祷数据本身没有标签:
line.split(/\t/)
当您处理这样的数据时,CSV 解析库会派上用场:
"1\t2\t\"3a\t3b\"\t4"
更新: 如果您准备稍微滥用 CSV 库,那么您可以这样做:
CSV.parse("11\tDave\tO\"malley", col_sep: "\t", quote_char: "[=12=]")
这基本上会扼杀引号检测,因此如果有其他数据依赖于被正确处理的数据,这可能无法解决。
"11\tDave\tO\"malley" 不是有效的 CSV 数据。奇怪的是,答案是使用两个 double-quotes,并对每个元素加双引号
2.3.1 :001 > require 'csv'
=> true
2.3.1 :002 > line = "\"11\"\t\"Dave\"\t\"O\"\"malley\""
=> "\"11\"\t\"Dave\"\t\"O\"\"malley\""
2.3.1 :003 > puts line # for clarity
"11" "Dave" "O""malley"
=> nil
2.3.1 :004 > CSV.parse(line, col_sep: "\t")
=> [["11", "Dave", "O\"malley"]]