How to extract a path from a string in Ruby (between the first and last forward slash, inclusive)

Question

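I have been writing a Ruby script that iterates over a text file, finds every line that begins with "output_path", and stores that line in a string (linefromtextfile). Typically it finds lines such as: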

"output_path":"/data/server/output/1/test_file.txt","text":
"output_path":"/data/server/output/2/test_file.txt","text":

I want to extract only the directory part of the path (pathtokeep) from each line and write it to a file, i.e.:

/data/server/output/1/
/data/server/output/2/

I tried this regular expression, but it does not work:

pathtokeep=linefromtextfile.split(?:$/.*?/)([^/]*?\.\S*)

Could someone advise me on my regex? Is split the right approach, or is there a simpler way?

Answer 1

Try this regular expression:

(?<="output_path":")(.*?)(?=")


How it works:

(?<="output_path":")     # Lookbehind for "output_path":"
(.*?)                    # Data inside "" (Lazy)
(?=")                    # Lookahead for closing "
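Note that this pattern captures the full path, file name included, so one more step is needed to get the directory the question asks for. A minimal sketch of how it might be applied in Ruby follows; the constant and method names are hypothetical, and combining the pattern with File.dirname is an assumption rather than part of this answer:

```ruby
# The regex from this answer; the constant name is illustrative only.
OUTPUT_PATH_RE = /(?<="output_path":")(.*?)(?=")/

# Hypothetical helper: returns the directory with a trailing slash,
# or nil when the line has no "output_path" field.
def directory_of_output_path(line)
  full_path = line[OUTPUT_PATH_RE]  # whole match; lookarounds are not part of it
  return nil unless full_path

  File.dirname(full_path) + "/"
end

line = '"output_path":"/data/server/output/1/test_file.txt","text":'
puts directory_of_output_path(line)
# prints /data/server/output/1/
```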

Answer 2

If your file always has the same structure, you can also do this without a path-matching regex, using split and Ruby's File methods.

line = '"output_path":"/data/server/output/1/test_file.txt","text":'

path = line.split(/:"|",/)[1]
# => "/data/server/output/1/test_file.txt"

basename = File.basename(path)
# => "test_file.txt"

File.dirname(path) + '/'
# => "/data/server/output/1/"

Answer 3

I suggest using Ruby's own methods wherever possible, and using a regular expression only to extract the path from the string.

str = '"output_path":"/data/server/output/1/test_file.txt","text":'

r = /
    :"      # match a colon and double quote
    (.+?)   # match one or more of any character, lazily, in capture group 1 
    "       # match a double quote
    /x      # free-spacing regex definition mode

File.dirname(str[r,1])
  #=> "/data/server/output/1"

If you really want the trailing forward slash:

File.dirname(str[r,1]) << "/"
  #=> "/data/server/output/1/"

If you also need the file name:

File.basename(str[r,1])
  #=> "test_file.txt"

I will leave reading and writing the files to the OP.
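For completeness, the file handling might look roughly like the sketch below. It assumes one record per line and that relevant lines start with "output_path" as described in the question; the method names and the file names in the usage comment are hypothetical.

```ruby
# Same pattern as r above, written on one line.
PATH_RE = /:"(.+?)"/

# Returns the directory (with trailing slash) for every line that
# starts with "output_path"; other lines are skipped.
def extract_directories(lines)
  lines.map do |line|
    next unless line.lstrip.start_with?('"output_path"')

    path = line[PATH_RE, 1]           # capture group 1, or nil if no match
    File.dirname(path) + "/" if path
  end.compact
end

# Reads input_file line by line and writes one directory per line.
def write_directories(input_file, output_file)
  dirs = extract_directories(File.foreach(input_file))
  File.write(output_file, dirs.map { |d| d + "\n" }.join)
end

# Usage (hypothetical file names):
#   write_directories("input.txt", "paths.txt")
```

File.foreach reads the input lazily, so the whole file does not need to fit in memory at once.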

If you insist on using a single regular expression, you could write:

r = /
    (?<=:") # match a colon followed by a double-quote in a positive lookbehind
    .+      # match one or more characters, greedily
    \/      # match a forward slash
    /x

str[r]
  #=> "/data/server/output/1/"

Note that .+ is greedy: it consumes as many characters as it can, so the match extends to the last forward slash in the string.
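To see why the greediness matters, compare the same pattern with a lazy quantifier. This is only an illustrative sketch; the constant names are made up, and the second input string is an invented example, not taken from the question.

```ruby
GREEDY = /(?<=:").+\//   # the single-regex pattern, written on one line
LAZY   = /(?<=:").+?\//  # same pattern with a lazy quantifier

str = '"output_path":"/data/server/output/1/test_file.txt","text":'

puts str[GREEDY]  # prints /data/server/output/1/
puts str[LAZY]    # prints /data/  (stops at the first slash it can)

# Caveat: "last forward slash" means last on the whole line, so a slash
# in a later field would be swallowed too (invented input):
tricky = '"output_path":"/a/b.txt","text":"x/y"'
puts tricky[GREEDY]  # prints /a/b.txt","text":"x/
```

If later fields can contain slashes, extracting the quoted path first and then calling File.dirname, as shown earlier in this answer, avoids the problem.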
