Multiline file grep

Question

I have a file containing a section like this:

flags...id, description, used, color
AB, "Abandoned", 0, 13168840
DM, "Demolished", 0, 15780518
OP, "Operational", 0, 15780518...

where ... stands for a run of control characters such as ETX and STX. I am trying to extract multiple lines from the file.

I am using the following code:

f = File.open(somePath)
r = f.grep(/flags.+id, description, used, color(?<data>(?:.|\s)*?)[\x00-\x08]/)

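This code does not work, and I don't understand why. The documentation for grep seems to imply that the file is parsed line by line. I suspect that is why the regex returns no results.

Am I correct that grep uses line-by-line parsing? Is this why my regex isn't working as intended?
Would it be better to use file.each_line to capture the data?
Is there a better/cleaner alternative to all of the above?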

Answer 1

You need to enable multiline mode. By default, . does not match a newline.

From the documentation at https://ruby-doc.org/core-2.1.1/Regexp.html:

/./ - Any character except a newline.
/./m - Any character (the m modifier enables multiline mode)
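A minimal sketch of the difference the m modifier makes, run against a made-up two-line string (the sample text is an assumption, not the asker's real file):

```ruby
# Hypothetical sample: a header line followed by one data line.
text = "flags: id, description\nAB, \"Abandoned\"\n"

# Without /m the dot stops at the newline, so the match cannot reach "AB".
without_m = text[/flags.+AB/]

# With /m the dot also matches "\n", so the match spans both lines.
with_m = text[/flags.+AB/m]

p without_m  # => nil
p with_m     # => "flags: id, description\nAB"
```

Note that /m only changes what the dot matches. It has no effect on how the input is split up before the regex sees it, so it helps only when the regex is applied to a string that actually contains several lines.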

Answer 2

Am I correct that grep uses line-by-line parsing?

Yes. Trying it on your file:

r = File.open(somePath) do |f|
  f.grep(/[A-Z]{2},/)
end

puts r
# => AB, "Abandoned", 0, 13168840
#    DM, "Demolished", 0, 15780518
#    OP, "Operational", 0, 15780518

puts r.inspect
# => ["AB, \"Abandoned\", 0, 13168840\n", "DM, \"Demolished\", 0, 15780518\n", "OP, \"Operational\", 0, 15780518\n"]
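To see the line-by-line behaviour in isolation, here is a small sketch that uses StringIO as a stand-in for the file (the sample contents are an assumption modelled on the question). Each line is tested on its own, so a pattern that needs text from two different lines finds nothing, even with the m modifier:

```ruby
require "stringio"

# Hypothetical stand-in for the file: a header line and two data lines,
# with STX/ETX control characters roughly where the question shows "...".
sample = "flags\x02id, description, used, color\n" \
         "AB, \"Abandoned\", 0, 13168840\n" \
         "DM, \"Demolished\", 0, 15780518\x03\n"

io = StringIO.new(sample)

# "color" and "AB" sit on different lines, so no single line matches.
multi = io.grep(/color.*AB/m)

io.rewind

# A pattern that fits inside one line is matched once per line.
single = io.grep(/\A[A-Z]{2},/)

p multi          # => []
p single.length  # => 2
```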

Is this why my regex isn't working as intended?

Not only that. What are you searching for with [\x00-\x08]? ASCII characters or hex characters?

Would it be better to use file.each_line to capture the data?

File#grep sounds fine.
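For comparison, the each_line route from the question could look roughly like the sketch below. It assumes the section starts on the line containing the header text and ends at the first later line containing a control character in the \x00-\x08 range; the sample data and the variable names are made up for illustration:

```ruby
require "stringio"

# Hypothetical stand-in for the file described in the question.
sample = "flags\x02\x03id, description, used, color\n" \
         "AB, \"Abandoned\", 0, 13168840\n" \
         "DM, \"Demolished\", 0, 15780518\n" \
         "OP, \"Operational\", 0, 15780518\x03\x02trailer\n"

rows = []
in_section = false

StringIO.new(sample).each_line do |line|
  # The header line opens the section; it is not itself a data row.
  if line.include?("id, description, used, color")
    in_section = true
    next
  end
  next unless in_section

  if line =~ /[\x00-\x08]/
    # Keep only the text before the first control character, then stop.
    head = line[/\A[^\x00-\x08]*/]
    rows << head.chomp unless head.strip.empty?
    break
  end

  rows << line.chomp
end

p rows.length  # => 3
p rows.last    # => "OP, \"Operational\", 0, 15780518"
```

With a real file, File.foreach(somePath) could replace the StringIO, which avoids loading the whole file into memory.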

Answer 3

String#scan to the rescue:

File.read('/path/to/file').scan(
  /flags.+id, description, used, color(?<data>(?:.|\s)*?)[\x00-\x08]/m
)
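As a rough illustration of what that call returns, here is the same pattern applied to an in-memory string instead of File.read (the sample contents are an assumption based on the question). Because the pattern contains a capture group, scan returns an array of arrays, one inner array per match:

```ruby
# Hypothetical stand-in for the result of File.read('/path/to/file').
sample = "flags\x02\x03id, description, used, color\n" \
         "AB, \"Abandoned\", 0, 13168840\n" \
         "DM, \"Demolished\", 0, 15780518\n" \
         "OP, \"Operational\", 0, 15780518\x03\x02trailer\n"

pattern = /flags.+id, description, used, color(?<data>(?:.|\s)*?)[\x00-\x08]/m

# One inner array per match, holding the captured "data" group.
matches = sample.scan(pattern)

# Take the first match's capture and split it into individual rows.
data = matches.first.first
rows = data.strip.split("\n")

p matches.length  # => 1
p rows.length     # => 3
p rows.first      # => "AB, \"Abandoned\", 0, 13168840"
```

Two caveats: File.read loads the whole file into memory, and with /m the greedy .+ can run across lines, so if the header text appears more than once the match starts its capture after the last occurrence it can reach.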
