Python: sort a list, delete duplicates, and use re.sub
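I am new to Python.
I am trying to reproduce this bash command: cat domains.txt |sort -u|sed 's/^*.//g' > domains2.txt
The file domains.txt contains a list of domains, some with the wildcard prefix *. and some without it, for example: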
*.example.com
example2.org
The file has more than 300k lines.
I wrote this code:
infile = "domains.txt"
outfile = "2"
outfile2 = "3"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        line = line.replace('*.', "")
        fout.write(line)
with open('2', 'r') as r, open(outfile2, "w") as fout2:
    for line in sorted(r):
        print(line, end='', file=fout2)
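It strips *. as planned and sorts the list, but it does not remove duplicate lines.
I was advised to use re.sub instead of replace to make the pattern stricter (as in sed, where I match only at the start of the line), but when I tried this: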
import re
infile = "domains.txt"
outfile = "2"
outfile2 = "3"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        newline = re.sub('^*.', '', line)
        fout.write(newline)
with open('2', 'r') as r, open(outfile2, "w") as fout2:
    for line in sorted(r):
        print(line, end='', file=fout2)
it does not work at all: it fails with an error that I do not understand.
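In regular expressions, * and . (among others) are special characters. You need to escape them with a backslash to match them literally.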
import re
s = "*.example.com"
re.sub(r'^\*\.', '', s)
> 'example.com'
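For the part of the question about duplicates, here is a minimal sketch that combines the escaped pattern with a set, which keeps only one copy of each line, and then sorts the result. The names WILDCARD_PREFIX, clean_domains and process_file are made up for illustration. Two assumptions differ from the original pipeline: the prefix is stripped before deduplicating (so *.example.com and example.com collapse into one entry), and blank lines are dropped.

```python
import re

# Matches a literal "*." only at the very start of a line.
WILDCARD_PREFIX = re.compile(r'^\*\.')


def clean_domains(lines):
    """Strip a leading '*.', drop duplicates and blank lines, return a sorted list."""
    unique = set()
    for line in lines:
        # strip() removes the trailing newline and surrounding whitespace
        domain = WILDCARD_PREFIX.sub('', line.strip())
        if domain:  # assumption: blank lines are not wanted in the output
            unique.add(domain)
    return sorted(unique)


def process_file(infile, outfile):
    """Read domains from infile and write the cleaned, sorted list to outfile."""
    with open(infile) as fin:
        domains = clean_domains(fin)
    with open(outfile, 'w') as fout:
        fout.writelines(domain + '\n' for domain in domains)
```

Calling process_file("domains.txt", "domains2.txt") would then do the whole job in one pass, without the intermediate file. A set holding 300k short strings should fit comfortably in memory on an ordinary machine.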