Remove '#' comments from a string (the comment may start in the middle of a line)

Question

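I'm basically trying to remove comments from a file (which I read in) and write the result to another file. A single-line comment can start at the beginning of a line or somewhere in the middle. Everything from the start of the comment up to the next line should be removed.

Some answers suggested the code below, but it does not work for single-line comments that appear after some useful code. I know a little about lex, so I tried to modify the code to fit my needs, but I'm stuck. Please help.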

import re
def stripComments(code):
    code = str(code)
    return re.sub(r'(?m)^ *#.*\n?', '', code)

print(stripComments("""#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""))

Expected output:

Why so Serious?

bar foo

Actual output:

Why so Serious? #This comment doesn't get removed

bar foo

[newline]

[newline]

Answer 1

Try this:

import re
def stripComments(code):
    code = str(code)
    return re.sub(r'(#.*)?\n?', '', code)

print(stripComments("""#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""))
# Why so Serious? bar foo
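
Note that the pattern above also consumes every newline, which is why the remaining lines come out joined into one. If the line breaks should survive, a minimal sketch is to match only the spaces or tabs right before '#' plus the rest of that line. The name strip_comments_keep_lines is made up for illustration, and the sketch assumes '#' never appears inside a string literal.

```python
import re


def strip_comments_keep_lines(code):
    # Hypothetical helper: remove '#' through end of line, together with the
    # spaces/tabs directly before it, but leave every newline in place.
    # Assumption: '#' never occurs inside a string literal.
    return re.sub(r'[ \t]*#.*', '', str(code))


sample = """#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""

print(repr(strip_comments_keep_lines(sample)))
# '\nWhy so Serious?\nbar foo\n'
```

Lines that held only a comment are left behind as empty lines, which matches the blank lines in the expected output from the question.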

Answer 2

I think simply walking through your string can do the job better (and faster) than using re. Here is a working example:

def stripComments(code):
    codeWithoutComments = ""
    for i in code.splitlines():
        marker = False
        for j in i:
            if j == "#":
                marker = True
            if not marker:
                codeWithoutComments += j
        codeWithoutComments += "\n"
    return codeWithoutComments

print(stripComments("""#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""))

Returns:

"""
Why so Serious?
bar foo

"""

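The character-by-character loop above can probably be shortened with str.partition, which splits a line at the first '#' in one call. This is only a sketch of the same idea: strip_comments_partition is a hypothetical name, it assumes '#' never appears inside a string literal, and unlike the loop it also trims trailing whitespace from every line.

```python
def strip_comments_partition(code):
    # Hypothetical variant of the loop above: keep only the text before the
    # first '#' on each line. Assumption: '#' never occurs inside a string.
    kept_lines = []
    for line in str(code).splitlines():
        before_hash, _, _ = line.partition("#")
        # rstrip() drops the spaces that used to sit in front of the comment
        # (and any other trailing whitespace on the line).
        kept_lines.append(before_hash.rstrip())
    return "\n".join(kept_lines)


sample = """#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""

print(repr(strip_comments_partition(sample)))
# '\nWhy so Serious?\nbar foo\n'
```
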
Answer 3

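Your regex has the anchor '^', which means the pattern can only match starting at the beginning of a line. Without it, the regex almost works.

You may also want to compile the regex ahead of time, so you can reuse it without compiling it on every call: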

import re

# Raw string so that '\s' reaches the regex engine unchanged.
COMMENT_PATTERN = re.compile(r'\s*#.*\n?', re.MULTILINE)


def strip_comments(code):
    return COMMENT_PATTERN.sub('', str(code))

I also replaced the literal space ' ' with '\s', which matches any whitespace character, such as tabs. Put the space back if you don't like that.
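
One thing to watch with the pattern above: '\s' also matches newlines, and the trailing '\n?' is consumed after inline comments too, so a line that ends in a comment gets merged with the line after it. The sketch below is one possible way around that, assuming whole-line comments should disappear together with their line break while inline comments should leave the line break alone. The names LINE_OR_INLINE_COMMENT and strip_comments_two_cases are illustrative, and '#' inside string literals is not handled.

```python
import re

# Hypothetical pattern, not taken from the answer above.
# First alternative: a line holding only a comment, removed with its newline.
# Second alternative: an inline comment plus the spaces/tabs before it,
# with the newline left in place.
LINE_OR_INLINE_COMMENT = re.compile(r'^[ \t]*#.*\n?|[ \t]*#.*', re.MULTILINE)


def strip_comments_two_cases(code):
    return LINE_OR_INLINE_COMMENT.sub('', str(code))


sample = """#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""

print(repr(strip_comments_two_cases(sample)))
# 'Why so Serious?\nbar foo\n'
```

Here re.MULTILINE matters, because the first alternative relies on '^' matching at the start of every line.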

Answer 4

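You can use regex101.com to debug your regex and see what it actually matches.

(?m) changes the matching rules so that ^ matches at the start of each line, rather than only at the start of the whole string.

^ * matches the start of a line followed by any number of space characters. (So hopefully there aren't any tabs!)

In plain English, your regex only matches Python comments that start a line or are preceded only by spaces.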
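
The behavior described above can be checked quickly in Python as well. The sketch below runs the pattern from the question against four one-line samples; the helper name question_pattern_matches and the sample strings are made up for illustration.

```python
import re

ORIGINAL_PATTERN = r'(?m)^ *#.*\n?'  # the pattern from the question


def question_pattern_matches(text):
    # Hypothetical helper: True if the question's pattern matches anywhere.
    return re.search(ORIGINAL_PATTERN, text) is not None


samples = {
    "start of line": "# note\n",
    "after spaces": "    # note\n",
    "after a tab": "\t# note\n",
    "after code": "x = 1 # note\n",
}

for label, text in samples.items():
    result = "matched" if question_pattern_matches(text) else "not matched"
    print(f"{label}: {result}")
# start of line: matched
# after spaces: matched
# after a tab: not matched
# after code: not matched
```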

Other answers have already provided regexes that do what you want, so I won't repeat them.

python

lex