Why is this negative lookbehind wrong?

Question

def get_hashtags(post)
    tags = []
    post.scan(/(?<![0-9a-zA-Z])(#+)([a-zA-Z]+)/){|x,y| tags << y}
    tags
end

Test.assert_equals(get_hashtags("two hashs##in middle of word#"), [])
#Expected: [], instead got: ["in"]

Shouldn't it look behind to check whether the match is preceded by a letter or a digit? Why does it still accept 'in' as a valid match?

Answer 1

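You should use \K rather than a negative lookbehind. That lets you simplify the regex considerably: no pre-defined array, capture groups, or block are needed.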

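\K means "discard everything matched so far". The key point is that a variable-length match may precede \K, whereas in Ruby (and most other languages) variable-length matches are not permitted inside lookbehinds, negative or positive.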
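To illustrate that restriction, here is a small sketch of what happens when a variable-length piece such as #* is placed inside a lookbehind. The pattern string is a hypothetical attempt, not something from the question. Regexp.new is used so that the failure shows up as a rescuable RegexpError rather than as a syntax error when the file is parsed:

```ruby
# Hypothetical attempt: allow any run of '#' inside the lookbehind.
source = '(?<![0-9a-zA-Z]#*)#+[a-zA-Z]+'

error = begin
  Regexp.new(source)
  nil
rescue RegexpError => e
  e
end

error.class  #=> RegexpError (the variable-length lookbehind is rejected)
```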

r = /
    [^0-9a-zA-Z#] # do not match any character in the character class
    \#+           # match one or more pound signs
    \K            # discard everything matched so far
    [a-zA-Z]+     # match one or more letters
    /x            # extended mode

Note that the # in \#+ would not need to be escaped if the regex were not written in extended mode.
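For comparison, a sketch of the same pattern as r written on one line without the x flag, where # needs no escape (the name compact is only illustrative):

```ruby
# Same pattern as r, without extended mode, so '#' is an ordinary character.
compact = /[^0-9a-zA-Z#]#+\K[a-zA-Z]+/

"two hashs&#in middle of word#".scan(compact)  #=> ["in"]
"two hashs##in middle of word#".scan(compact)  #=> []
```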

"two hashs##in middle of word#".scan r
  #=> []

"two hashs&#in middle of word#".scan r
  #=> ["in"]

"two hashs#in middle of word&#abc of another word.###def ".scan r
   #=> ["abc", "def"]
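As for why the original pattern returns "in": a lookbehind is checked at whichever position the engine is currently trying. The attempt at the first # fails because it is preceded by "s", so the engine retries at the second #. There the preceding character is "#", which is not in [0-9a-zA-Z], so the lookbehind passes. A minimal sketch of that, together with one possible fix that keeps the lookbehind by adding # to its character class (the variable names are illustrative, not from the original post):

```ruby
post = "two hashs##in middle of word#"

# Pattern from the question.
original = /(?<![0-9a-zA-Z])(#+)([a-zA-Z]+)/
m = original.match(post)
m[0]                  #=> "#in"
post[m.begin(0) - 1]  #=> "#"  (not alphanumeric, so the lookbehind passes)

# Assumed fix: also reject a preceding '#'.
with_hash = /(?<![0-9a-zA-Z#])#+\K[a-zA-Z]+/
post.scan(with_hash)                             #=> []
"two hashs&#in middle of word#".scan(with_hash)  #=> ["in"]
"#ruby at the start".scan(with_hash)             #=> ["ruby"]
```

The last line shows one difference in behaviour: a negative lookbehind also succeeds at the very start of the string, where a consuming class such as [^0-9a-zA-Z#] has no character to match.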

ruby

regex

negative-lookbehind