How to use keys from a dictionary to search for strings?
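I am writing a program that edits a text file. I want the program to find duplicate strings and delete n - 1 of the lines that contain similar strings.
Here is my script so far: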
import re
fname = raw_input("File name - ")
fhand = open(fname, "r+")
fhand.read()
counts = {}
pattern = re.compile(pattern)
# This searches the file for duplicate strings and inserts them into a dictionary with a counter
# as the value
for line in fhand:
    for match in pattern.findall(line):
        counts.setdefault(match, 0)
        counts[match] += 1
pvar = {}
# This creates a new dictionary which contains all of the keys in the previous dictionary with
# count > 1
for match, count in counts.items():
    if count > 1:
        pvar[match] = count
fhand.close()
count = 0
# Here I am trying to delete n - 1 instances of each string that was a key in the previous
# dictionary
with open(fname, 'r+') as fhand:
    for line in fhand:
        for match, count in pvar.items():
            if re.search(match, line) not in line:
                continue
                count += 1
            else:
                fhand.write(line)
print count
fhand.close()
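How can I make the last block of code work? Is it possible to use the keys from the dictionary to identify the relevant lines and delete n - 1 instances?
Or am I going about this completely wrong?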
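EDIT: Here is a sample from the file. It is supposed to be a list, with each 'XYZ' instance on its own line, preceded by two whitespace characters. The formatting is a bit messy, apologies.
Input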
-=XYZ[0:2] &
-=XYZ[0:2] &
-=XYZ[3:5] &
=XYZ[6:8] &
=XYZ[9:11] &
=XYZ[12:14] &
-=XYZ[15:17] &
=XYZ[18:20] &
=XYZ[21:23] &
Output
=XYZ[0:2]
EDIT
Also, can anyone explain why the last part of the code does not return anything?
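One likely contributor, offered as a sketch rather than a full diagnosis: calling `fhand.read()` moves the file position to the end, so the `for line in fhand` loop that follows sees no lines, and `counts` (and therefore `pvar`) stays empty. Separately, `re.search` returns a match object or `None`, not a string, so `re.search(match, line) not in line` is not a valid membership test on a string. The snippet below is written for Python 3; the temporary file and the helper name `count_lines_after_read` are illustrative assumptions, not part of the original script.

```python
import os
import tempfile


def count_lines_after_read(path, rewind):
    """Read the whole file, then count how many lines a for-loop still sees."""
    with open(path) as fhand:
        fhand.read()          # consumes everything; position is now at EOF
        if rewind:
            fhand.seek(0)     # move back to the start before iterating
        return sum(1 for _ in fhand)


# Build a small sample file (contents borrowed from the question's input).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("  -=XYZ[0:2] &\n  -=XYZ[0:2] &\n  -=XYZ[3:5] &\n")
    sample_path = tmp.name

without_rewind = count_lines_after_read(sample_path, rewind=False)
with_rewind = count_lines_after_read(sample_path, rewind=True)
os.remove(sample_path)

print(without_rewind, with_rewind)
```

Dropping the stray `fhand.read()` call, or calling `fhand.seek(0)` before the loop, would let the counting loop see the lines.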
Here is something that uses a dictionary and no regular expressions (so the lines come out unordered, which probably does not matter...):
#!/usr/bin/env python
import os
res = {}
with open("input.txt") as f:
    for line in f.readlines():
        line = line.strip()
        key = line.split('[')[0].replace('-','').replace('=', '')
        if key in res:
            continue
        res[key] = line
        # res[key] = line.replace('&', '').strip()
print os.linesep.join(res.values())
This does not strip the trailing & sign. If you want to get rid of it, uncomment:
res[key] = line.replace('&', '').strip()
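If the order of the lines matters, or if the keys should be found with a regular expression as in the question, the same idea can be sketched as a function that remembers which keys it has already seen. This is a Python 3 sketch under assumptions: the pattern `[A-Za-z]+(?=\[)` (the letters just before the bracket), the function name `dedupe_lines`, and the extra `ABC` sample line are illustrative and do not come from the original post.

```python
import re

# Assumed key: the run of letters immediately before '[', e.g. 'XYZ'.
KEY_PATTERN = re.compile(r"[A-Za-z]+(?=\[)")


def dedupe_lines(lines, strip_ampersand=False):
    """Keep the first line for each key, preserving input order."""
    seen = set()
    kept = []
    for raw in lines:
        line = raw.strip()
        found = KEY_PATTERN.search(line)
        if found is None:
            continue      # assumption: lines without a key are skipped
        key = found.group(0)
        if key in seen:
            continue      # the n - 1 later lines for this key are dropped
        seen.add(key)
        if strip_ampersand:
            line = line.replace("&", "").strip()
        kept.append(line)
    return kept


sample = [
    "  -=XYZ[0:2] &",
    "  -=XYZ[0:2] &",
    "  -=XYZ[3:5] &",
    "  =ABC[0:2] &",   # made-up second key, only to show several keys
    "  =XYZ[6:8] &",
]
result = dedupe_lines(sample, strip_ampersand=True)
print("\n".join(result))
```

To edit a file in place with this approach, one option is to read all lines first, close the file, and then reopen it in "w" mode to write the kept lines, rather than reading and writing through the same "r+" handle.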