How to write a regexp to match the longest candidate in elisp?

I'm trying to write a function that removes a suffix from a string. The suffixes are:

agent_pkg
agent
pkg
driver
abs_if
abs_if_pkg
if_pkg
if

Test strings:

test_blah_agent_pkg
test_blah_agent
test_blah_pkg
test_blah_driver
test_blah_abs_if
test_blah_abs_if_pkg
test_blah_if_pkg
test_blah_if

From each of the test strings above, I expect to get test_blah.

I wrote a function like this:

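(defun get-base-name (name)
  "Get the base name from string."
  ;; Bind `s' locally instead of setting a global with `setq'.
  (let ((s (substring-no-properties name)))
    ;; Backslashes must be doubled inside an Emacs Lisp string.
    (string-match "\\(.*\\)_\\(agent_pkg\\|agent\\|driver\\|abs_if\\|if\\|pkg\\)" s)
    (match-string 1 s)))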

But it always matches only the short candidate. From (get-base-name "test_blah_abs_if") I get test_blah_abs.

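.* is greedy¹, which means it tries to cover as much of the string as possible while still letting the whole regexp match. You want it to be non-greedy, so that it stops as soon as a match is found. Adding ? after * or + makes the operator non-greedy. Compare: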

(let ((s "abcabcabc"))
  (string-match ".*c" s)
  (match-string 0 s)) ; => "abcabcabc"
(let ((s "abcabcabc"))
  (string-match ".*?c" s)
  (match-string 0 s)) ; => "abc"

.*? is the non-greedy version of .*, so simply adding ? is enough:

(let ((s "test_blah_agent_pkg
test_blah_agent
test_blah_pkg
test_blah_driver
test_blah_abs_if
test_blah_abs_if_pkg
test_blah_if_pkg
test_blah_if"))
  (string-match "\\(.*?\\)_\\(agent_pkg\\|agent\\|driver\\|abs_if\\|if\\|pkg\\)" s)
  (match-string 1 s)) ; => "test_blah"
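The non-greedy pattern above is not anchored, so it stops at the first place any suffix word appears, even in the middle of a name (for example, test_agent_blah_if would yield test). As a rough sketch of one alternative, assuming the suffix must sit at the very end of the string, the pattern can be anchored with \\' and the prefix left unmatched: the leftmost successful match is then automatically the longest suffix. The names my-base-name-suffixes and my-get-base-name are made up for this illustration.

```elisp
;; Hypothetical names, for illustration only.
(defvar my-base-name-suffixes
  '("agent_pkg" "agent" "pkg" "driver" "abs_if" "abs_if_pkg" "if_pkg" "if")
  "Suffixes that `my-get-base-name' strips from the end of a name.")

(defun my-get-base-name (name)
  "Return NAME without its longest known suffix, or NAME unchanged."
  (let ((s (substring-no-properties name))
        ;; "_" + any listed suffix + end of string.
        (re (concat "_" (regexp-opt my-base-name-suffixes t) "\\'"))
        ;; Match suffixes case-sensitively.
        (case-fold-search nil))
    (if (string-match re s)
        ;; Keep everything before the underscore that starts the suffix.
        (substring s 0 (match-beginning 0))
      s)))

(mapcar #'my-get-base-name
        '("test_blah_abs_if_pkg" "test_blah_if" "test_blah"))
;; => ("test_blah" "test_blah" "test_blah")
```

Because string-match scans for the leftmost position where the whole pattern succeeds, test_blah_abs_if_pkg matches at _abs_if_pkg before _if_pkg or _pkg is ever considered. A name with no listed suffix is returned as is, rather than relying on leftover match data.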

FYI, the third-party string manipulation library s has plenty of string functions that you might find useful instead of relying on regular expressions all the time. E.g. s-shared-start finds the common prefix of 2 strings:

(s-shared-start "test_blah_agent" "test_blah_pkg") ; => "test_blah_"

Combined with s-lines, which breaks a string into a list of strings at newline characters, and the -reduce function from the amazing third-party list manipulation library dash, you can find the prefix common to every string:

(let ((s "test_blah_agent_pkg
test_blah_agent
test_blah_pkg
test_blah_driver
test_blah_abs_if
test_blah_abs_if_pkg
test_blah_if_pkg
test_blah_if"))
  (-reduce 's-shared-start (s-lines s))) ; => "test_blah_"
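If installing s and dash is not an option, roughly the same result can be sketched with built-ins only. This assumes try-completion is given an empty input string, in which case it returns the longest prefix shared by all candidates. The function name my-common-base-name is made up for this illustration, and the trailing underscore is trimmed so the result matches the expected test_blah.

```elisp
(require 'subr-x) ; provides `string-remove-suffix' on older Emacs versions

;; Hypothetical helper, for illustration only.
(defun my-common-base-name (text)
  "Return the prefix shared by every line of TEXT, minus a trailing underscore."
  (let* ((names (split-string text "\n" t))     ; one name per line, skip blanks
         (prefix (try-completion "" names)))    ; longest common prefix
    (if (stringp prefix)
        (string-remove-suffix "_" prefix)
      "")))

(my-common-base-name "test_blah_agent_pkg
test_blah_driver
test_blah_abs_if
test_blah_if")
;; => "test_blah"
```

Like the s-shared-start version, this only works when all names are processed together and share the same base name. It cannot strip the suffix from a single string on its own.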

¹ See the section on Greediness for an explanation of this concept.