Matching all lines between two lines recursively in Ruby

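I want to match all the lines between two lines that start with 'SLX-' (including the first of those lines), convert them into a single comma-separated line, and then append that line to a text file.

A truncated version of the original text file looks like this: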

SLX-9397._TC038IV_L_FLD0214.Read1.fq.gz
Sequences: 1406295
With index: 1300537
Sufficient length: 1300501
Min index: 0
Max index: 115
0       1299240
1       71
2       1
4       1
Unique: 86490
# reads processed: 86490
# reads with at least one reported alignment: 27433 (31.72%)
# reads that failed to align: 58544 (67.69%)
# reads with alignments suppressed due to -m: 513 (0.59%)
Reported 27433 alignments to 1 output stream(s)
SLX-9397._TC044II_D_FLD0197.Read1.fq.gz
Sequences: 308905
With index: 284599
Sufficient length: 284589
Min index: 0
Max index: 114
0       284290
1       16
Unique: 32715
# reads processed: 32715
# reads with at least one reported alignment: 13114 (40.09%)
# reads that failed to align: 19327 (59.08%)
# reads with alignments suppressed due to -m: 274 (0.84%)
Reported 13114 alignments to 1 output stream(s)
SLX-9397._TC047II_D_FLD0220.Read1.fq.gz

I would like the Ruby script to:

  1. Convert every \n between two lines containing SLX- into a comma
  2. Save the converted text as a new text file (preferably a CSV file).

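I suppose my specific question is how to find and replace between two particular lines.

I guess I could do this without Ruby, but seeing as I am trying to get into Ruby...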

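I don't know Ruby well, but this should work. Read the whole file into a String. Use this regex, (\RSLX-), to match every SLX- except the first one, and replace each match with ,SLX-. For an explanation of the regex, see https://regex101.com/r/pP3pP3/1

This question, Ruby replace string with captured regex pattern, may help you see how to do the replacement in Ruby.

Assuming your string is in str: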
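For reference, here is a minimal sketch of a substitution-based approach in Ruby. It is a variation rather than the exact regex above: it uses the inverse pattern, \R(?!SLX-|\z), so that the line breaks inside a record become commas while the break before each SLX- line is kept. The sample string is a shortened, hypothetical stand-in for the file contents, and the sketch assumes no value contains a comma itself.

```ruby
# Hypothetical sample standing in for the contents of the log file.
str = <<~LOG
  SLX-9397._TC038IV_L_FLD0214.Read1.fq.gz
  Sequences: 1406295
  With index: 1300537
  SLX-9397._TC044II_D_FLD0197.Read1.fq.gz
  Sequences: 308905
  With index: 284599
LOG

# \R matches any line break. The negative lookahead skips breaks that are
# followed by a new SLX- record or by the end of the string, so only the
# breaks inside a record are replaced with commas.
joined = str.gsub(/\R(?!SLX-|\z)/, ",")

puts joined
```

In a real script the string would come from something like File.read(path), and the result could be written back out with File.write.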

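require 'csv'
CSV.open("/tmp/file.csv", "wb") do |csv|
  # Each match is one record: an SLX- line plus everything up to the next
  # SLX- line. The \R*\z alternative also captures the final record.
  str.scan(/^(SLX-.*?)(?=\R+SLX-|\R*\z)/m).map do |s|
    s.first.split(/\R/).map do |el|             # split the record into lines
      "'#{el}'"                                 # wrap each value in single quotes
    end
  end.each do |line|                            # iterate over the records
    csv << line                                 # append one CSV row per record
  end
end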
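
If you would rather not load the whole file into one string, a line-based alternative is possible. The sketch below swaps the regex for Enumerable#slice_before, which starts a new group at every line beginning with SLX-. The method names slx_records and convert, and the paths passed to convert, are hypothetical and only for illustration. It leaves quoting to the CSV library instead of adding single quotes by hand.

```ruby
require 'csv'

# Group lines into records, starting a new record at every line that
# begins with "SLX-". Accepts any enumerable of already-chomped lines.
# Lines that appear before the first SLX- line end up in a group of their own.
def slx_records(lines)
  lines.slice_before { |line| line.start_with?("SLX-") }.to_a
end

# Read input_path line by line and write one CSV row per record.
# Both paths are placeholders supplied by the caller.
def convert(input_path, output_path)
  CSV.open(output_path, "wb") do |csv|
    slx_records(File.foreach(input_path, chomp: true)).each do |record|
      csv << record # CSV adds quoting only where a field needs it
    end
  end
end
```

A call such as convert("stats.txt", "/tmp/file.csv") would then produce one row per SLX- block, with the file name in the first column.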