Python script to match actual start of line ignoring tabs and spaces

Question

I think my question is almost self-explanatory, but I will still post an example to make it clearer.

I have the following complete, working script that comments/uncomments lines in a JavaScript file opened in the Gedit editor.

#! /usr/bin/env python
import sys
import StringIO
block = sys.stdin.read()
block = StringIO.StringIO(block)
msg = ''
for line in block:
    if "//~" in line:
        line = line.replace('//~','')
        msg = "All lines in selection uncommented"
    else:
        line = "//~" + line
        msg = "All lines in selection commented"
    sys.stdout.write(line)
exit(msg)

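Now I want to put //~ in front of the actual start of the line: not before the leading spaces or tabs, but where the real content of the line begins, i.e. at the first non-whitespace character.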

If I do this with the re module as shown below, it adds //~ twice: once at the very start of the line and once at the actual start of the content.

#! /usr/bin/env python
import sys
import StringIO
import re
block = sys.stdin.read()
block = StringIO.StringIO(block)
msg = ''
for line in block:
    if "//~" in line:
        line = re.sub(r"(\s*)(\S.*)", r"\1//~\2", line)
        line = line.replace('//~','')
        msg = "All lines in selection uncommented"
    else:
        line = re.sub(r"(\s*)(\S.*)", r"\1//~\2", line)
        line = "//~" + line
        msg = "All lines in selection commented"
    sys.stdout.write(line)
exit(msg)

How can I do this in Python, with or without regular expressions?

Answer 1

You can do this with a regular expression substitution. For example, this line of code should do what you want:

line = re.sub(r"^(\s*)(\S.*)", r"\1//~\2", line)

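This regex matches zero or more whitespace characters at the start of the line [(\s*)], then the rest of the line beginning at the first non-whitespace character [(\S.*)]. It then replaces the match with the first capture group [\1] (the leading whitespace), the comment marker [//~], and then the rest of the line [\2].

Once the substitution inserts the marker, the separate line = "//~" + line statement is no longer needed; keeping it is what puts a second //~ at the very start of the line.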
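To show how the substitution fits into the whole toggle, and to cover the "without regex" half of the question, here is a minimal sketch in Python 3. The helper names (comment_line, comment_line_plain, uncomment_line, toggle_line, toggle_block) are made up for illustration and do not appear in the original script. Unlike the original, which strips every //~ on a line, this sketch removes only the first one; that is a choice made here, not part of the answer.

```python
import re

MARKER = "//~"  # comment marker used in the question

# Group 1: leading whitespace. Group 2: the rest of the line, starting at
# the first non-whitespace character ("." does not match the newline).
LINE_START = re.compile(r"^(\s*)(\S.*)")


def comment_line(line):
    """Regex version: insert MARKER before the first non-whitespace char."""
    # Lines that are empty or whitespace-only do not match and stay unchanged.
    return LINE_START.sub(r"\1" + MARKER + r"\2", line, count=1)


def comment_line_plain(line):
    """Non-regex version of comment_line, using str.lstrip."""
    stripped = line.lstrip(" \t")
    if not stripped.strip():
        return line  # blank line: nothing to comment
    indent = line[:len(line) - len(stripped)]
    return indent + MARKER + stripped


def uncomment_line(line):
    """Remove the first MARKER on the line (assumption: only one is wanted)."""
    return line.replace(MARKER, "", 1)


def toggle_line(line):
    """Uncomment a line that carries MARKER, otherwise comment it."""
    if MARKER in line:
        return uncomment_line(line)
    return comment_line(line)


def toggle_block(text):
    """Toggle every line of a multi-line selection, keeping line endings."""
    return "".join(toggle_line(l) for l in text.splitlines(keepends=True))
```

In a Gedit external tool this could be wired up as sys.stdout.write(toggle_block(sys.stdin.read())), mirroring how the original script reads the selection from stdin and writes the result to stdout.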
