How to write a regexp to match the longest candidate in elisp?
I am trying to write a function that removes a suffix from a string. The suffixes are:
agent_pkg
agent
pkg
driver
abs_if
abs_if_pkg
if_pkg
if
Test strings:
test_blah_agent_pkg
test_blah_agent
test_blah_pkg
test_blah_driver
test_blah_abs_if
test_blah_abs_if_pkg
test_blah_if_pkg
test_blah_if
From each of the test strings above, I expect to get test_blah.
I wrote this function:
(defun get-base-name (name)
  "Get the base name from string."
  ;; Bind S locally instead of setting a global variable with `setq'.
  (let ((s (substring-no-properties name)))
    (string-match "\\(.*\\)_\\(agent_pkg\\|agent\\|driver\\|abs_if\\|if\\|pkg\\)" s)
    (match-string 1 s)))
But it always matches only the short candidate: from (get-base-name "test_blah_abs_if") I get test_blah_abs.
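.* is greedy¹, meaning it tries to cover as much of the string as it can while still letting the whole regexp match. You want it to be non-greedy, stopping as soon as a match is found. Adding ? after * or + makes the operator non-greedy. Compare: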
(let ((s "abcabcabc"))
  (string-match ".*c" s)
  (match-string 0 s)) ; => "abcabcabc"
(let ((s "abcabcabc"))
  (string-match ".*?c" s)
  (match-string 0 s)) ; => "abc"
.*? is the non-greedy version of .*, so simply adding ? does the job:
(let ((s "test_blah_agent_pkg
test_blah_agent
test_blah_pkg
test_blah_driver
test_blah_abs_if
test_blah_abs_if_pkg
test_blah_if_pkg
test_blah_if"))
  ;; Backslashes must be doubled inside an Emacs Lisp string literal.
  (string-match "\\(.*?\\)_\\(agent_pkg\\|agent\\|driver\\|abs_if\\|if\\|pkg\\)" s)
  (match-string 1 s)) ; => "test_blah"
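One caveat with the unanchored pattern: it stops at the first place where any listed suffix can start, so a name with a suffix word in the middle would be cut too early. Below is a minimal sketch of a stricter variant, assuming the suffix must sit at the very end of the name. The names my-suffixes and my-get-base-name are hypothetical, and the list includes abs_if_pkg and if_pkg from the question, which the pattern above leaves out.

```elisp
;; Hypothetical names for illustration; the suffix list is taken from the question.
(defvar my-suffixes
  '("agent_pkg" "agent" "pkg" "driver" "abs_if" "abs_if_pkg" "if_pkg" "if")
  "Suffixes to strip from a name.")

(defun my-get-base-name (name)
  "Return NAME without its longest trailing suffix from `my-suffixes'.
Return NAME unchanged when no suffix matches."
  ;; `regexp-opt' with a non-nil second argument wraps the alternatives
  ;; in a capturing group, so the result is safe to concatenate.
  (let ((re (concat "\\`\\(.*?\\)_" (regexp-opt my-suffixes t) "\\'")))
    ;; The non-greedy prefix keeps the base as short as possible, and the
    ;; end-of-string anchor forces the suffix to reach the end of NAME,
    ;; so the longest listed suffix wins.
    (if (string-match re name)
        (match-string 1 name)
      name)))

(my-get-base-name "test_blah_abs_if_pkg") ; => "test_blah"
(my-get-base-name "test_blah_if")         ; => "test_blah"
(my-get-base-name "test_blah")            ; => "test_blah"
```

Building the pattern with regexp-opt keeps the suffix list in one place, so adding a suffix does not require editing the regexp by hand.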
FYI, the third-party string manipulation library s has plenty of string functions that you might find useful instead of relying on regular expressions all the time. E.g. s-shared-start finds the common prefix of two strings:
(s-shared-start "test_blah_agent" "test_blah_pkg") ; "test_blah_"
Combined with s-lines, which breaks a string into a list of strings at newline characters, and the -reduce function from the amazing third-party list manipulation library dash, you can find the prefix common to all of the strings:
(let ((s "test_blah_agent_pkg
test_blah_agent
test_blah_pkg
test_blah_driver
test_blah_abs_if
test_blah_abs_if_pkg
test_blah_if_pkg
test_blah_if"))
  (-reduce 's-shared-start (s-lines s))) ; => "test_blah_"
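If pulling in s and dash is not an option, the same common prefix can be sketched with built-ins only. This is an alternative under that assumption, not a description of how the libraries work internally. The name my-common-prefix is hypothetical, and the approach relies on try-completion returning the longest common prefix of all candidates when given an empty input string.

```elisp
(defun my-common-prefix (strings)
  "Return the longest prefix shared by every string in STRINGS.
STRINGS is assumed to hold at least two distinct strings."
  ;; Bind `completion-ignore-case' to nil so the comparison is case-sensitive.
  (let ((completion-ignore-case nil))
    (try-completion "" strings)))

(my-common-prefix
 (split-string "test_blah_agent_pkg\ntest_blah_agent\ntest_blah_if" "\n"))
;; => "test_blah_"
```

Note that try-completion returns t rather than a string when the list holds a single candidate, which is why the sketch assumes at least two distinct strings.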
¹ See the section "Greediness" to understand this concept.