How to understand gsub(/^.*\//, '') or the regex

Question

I'm breaking down the following code to check my understanding of the regex and of gsub:

str = "abc/def/ghi.rb"
str = str.gsub(/^.*\//, '')
#str = ghi.rb

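^ : the beginning of the string

\/ : the escape character for /

^.*\/ : everything from the beginning of the string up to the last occurrence of / in the string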

Is my understanding correct?

How does .* work?

Answer 1

Yes. In short, it matches any number of any characters (.*) ending with a literal / (\/).

gsub replaces the match with the second argument (the empty string '').

Answer 2

Your understanding is correct, but you should also note that your last statement holds because:

Repetition is greedy by default: as many occurrences as possible 
are matched while still allowing the overall match to succeed.

Quoted from the Regexp documentation.
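To make the greedy behaviour concrete, here is a minimal sketch that compares the greedy .* from the question with its lazy counterpart .*?. The method names strip_greedy and strip_lazy are made up for this illustration and are not part of Ruby.

```ruby
# Greedy: .* takes as much as it can, so the match runs to the LAST slash.
def strip_greedy(path)
  path.gsub(/^.*\//, '')
end

# Lazy: .*? takes as little as it can, so the match stops at the FIRST slash.
# Only one replacement happens, because ^ cannot match again mid-line.
def strip_lazy(path)
  path.gsub(/^.*?\//, '')
end

puts strip_greedy("abc/def/ghi.rb") # prints ghi.rb
puts strip_lazy("abc/def/ghi.rb")   # prints def/ghi.rb
```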

Answer 3

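No, not quite.

^: the beginning of a line
\/: an escaped slash (the escape character is the \ on its own)
^.*\/ : everything from the beginning of a line up to the last occurrence of / in the string

.* depends on the mode of the regex. In single-line mode (that is, without the m option), it means the longest possible sequence of zero or more non-newline characters. In multiline mode (that is, with the m option), it means the longest possible sequence of zero or more characters.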
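The difference only shows up when the string contains a newline. The following is a small sketch under that assumption; the constant TWO_LINES and the three method names are invented for the example, while ^, \A and the m option are standard Ruby regex features.

```ruby
TWO_LINES = "a/b\nc/d"

# ^ matches at the start of every line, so each line is stripped separately.
def strip_per_line(text)
  text.gsub(/^.*\//, '')
end

# \A matches only at the start of the whole string.
def strip_from_string_start(text)
  text.gsub(/\A.*\//, '')
end

# With the m option, . also matches "\n", so .* can run across lines.
def strip_dot_matches_newline(text)
  text.gsub(/^.*\//m, '')
end

p strip_per_line(TWO_LINES)            # => "b\nd"
p strip_from_string_start(TWO_LINES)   # => "b\nc/d"
p strip_dot_matches_newline(TWO_LINES) # => "d"
```

If the intent is "the start of the string" rather than "the start of a line", \A is the anchor that expresses it.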

Answer 4

Your general understanding is correct. The whole regex will match abc/def/ and String#gsub will replace it with the empty string.

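However, note that String#gsub doesn't change the string in place. This means that calling str.gsub(...) on its own leaves str holding the original value ("abc/def/ghi.rb") after the substitution; the code in the question works only because it assigns the result back to str. To change the string in place, you can use String#gsub!.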
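A short sketch of the difference between the two methods. The demo_* method names are hypothetical, and .dup is used only so the example also runs where string literals are frozen.

```ruby
# gsub returns a new string and leaves the receiver untouched.
def demo_non_destructive
  str = "abc/def/ghi.rb".dup
  result = str.gsub(/^.*\//, '')
  [str, result]
end

# gsub! modifies the receiver itself.
def demo_destructive
  str = "abc/def/ghi.rb".dup
  str.gsub!(/^.*\//, '')
  str
end

p demo_non_destructive # => ["abc/def/ghi.rb", "ghi.rb"]
p demo_destructive     # => "ghi.rb"

# gsub! returns nil when nothing was replaced, so avoid chaining on its result.
p "ghi.rb".dup.gsub!(/^.*\//, '') # => nil
```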

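As for how .* works: the algorithm the regex engine uses is called backtracking. Since .* is greedy (it tries to match as many characters as possible), you can think of it as something like this happening: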

Step 1: .* matches the entire string abc/def/ghi.rb. Afterwards \/ tries to match a forward slash, but fails (nothing is left to match). .* has to backtrack.
Step 2: .* matches the entire string except the last character - abc/def/ghi.r. Afterwards \/ tries to match a forward slash, but fails (/ != b). .* has to backtrack.
Step 3: .* matches the entire string except the last two characters - abc/def/ghi.. Afterwards \/ tries to match a forward slash, but fails (/ != r). .* has to backtrack.
...
Step n: .* matches abc/def. Afterwards \/ tries to match a forward slash and succeeds. The matching ends here.
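The steps above can be imitated by hand. This is only a rough sketch of the idea, not how Ruby's regex engine is actually implemented: the helper backtrack_to_last_slash is invented for illustration, it assumes a single-line string, and it only tries a match starting at the beginning.

```ruby
# Imitates greedy .* followed by \/ : start with .* holding the whole
# string, then give back one character at a time until the next
# character is a slash.
def backtrack_to_last_slash(str)
  str.length.downto(0) do |len|
    # .* currently holds the first `len` characters; \/ must match str[len].
    return str[0, len + 1] if str[len] == "/"
  end
  nil # no slash anywhere, so the pattern fails
end

p backtrack_to_last_slash("abc/def/ghi.rb") # => "abc/def/"
p "abc/def/ghi.rb"[/^.*\//]                 # => "abc/def/" (the real regex agrees)
```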

Answer 5

Nothing wrong with your regex, but File.basename(str) might be more appropriate.

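To elaborate on what @Stefen said: it looks like you're really dealing with file paths, which makes your question an XY problem, where you ask about Y when you should be asking about X. Rather than how to use and understand a regex, the question should be what tool to use to manage paths.

Instead of rolling your own code, use the pre-written code that comes with the language: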

str = "abc/def/ghi.rb"
File.basename(str) # => "ghi.rb"
File.dirname(str) # => "abc/def"
File.split(str) # => ["abc/def", "ghi.rb"]

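The reason you want to take advantage of File's built-in code is that it takes into account the differences in directory separators between *nix-style OSes and Windows. Ruby exposes the separator it uses as the File::SEPARATOR constant, which is "/"; on Windows, the backslash is additionally accepted and is available as File::ALT_SEPARATOR: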

File::SEPARATOR # => "/"

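If your code moves from one system to another, it will keep working if you use the built-in methods, whereas a regex will break as soon as the separator is not the one it expects.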
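As a rough comparison, the sketch below puts the regex from the question next to File.basename. The helper name basename_via_regex is made up for this example. The two agree on the path from the question but differ on a path with a trailing slash.

```ruby
# The approach from the question, wrapped in a helper for comparison.
def basename_via_regex(path)
  path.gsub(/^.*\//, '')
end

p basename_via_regex("abc/def/ghi.rb") # => "ghi.rb"
p File.basename("abc/def/ghi.rb")      # => "ghi.rb"

# A trailing slash: the regex strips everything, File.basename does not.
p basename_via_regex("abc/def/")       # => ""
p File.basename("abc/def/")            # => "def"

# File.basename can also drop a known extension.
p File.basename("abc/def/ghi.rb", ".rb") # => "ghi"
```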

ruby

regex

gsub