Python dictionary replacement with space in key
I have a string and a dictionary, and I need to replace every occurrence of a dictionary key in that text.
text = 'I have a smartphone and a Smart TV'
dict = {
'smartphone': 'toy',
'smart tv': 'junk'
}
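If the keys contained no spaces, I would split the text into words and compare them with the dict one by one, which looks like O(n). But now the keys contain spaces, so things are more complicated. Please suggest a good way to do this, and note that the keys may not match the case of the text.
Update
I thought of this solution, but it is not efficient: O(m*n) or more...
for k, v in dict.items():
    text = text.replace(k, v)  # or regex...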
If your keys have no spaces:
output = [dct[i] if i in dct else i for i in text.split()]
' '.join(output)
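You should use dct instead of dict so that the name does not shadow the built-in dict().
This uses a list comprehension with a conditional expression (the ternary operator) to pick the replacement for each word.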
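Since the question mentions that the keys may not match the case of the text, here is a small variation that lowercases each word before the lookup. It is only a sketch: `replace_words` is a name I made up, and it assumes the keys are stored in lower case and contain no spaces.

```python
def replace_words(text, dct):
    # Look each whitespace-separated word up in lower case and keep the
    # original word when there is no matching key.
    return ' '.join(dct.get(word.lower(), word) for word in text.split())


dct = {'smartphone': 'toy', 'tv': 'junk'}
print(replace_words('I have a Smartphone and a TV', dct))
# I have a toy and a junk
```

Each word costs one dictionary lookup, so this stays linear in the number of words.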
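If your keys do have spaces, then you are right:
for k, v in dct.items():
    text = text.replace(k, v)
Yes, the time complexity in this case is O(m*n), because for each key in dct you have to scan the whole string.
Lowercase all the dictionary keys and the input text so that the comparison is easy. Now...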
for entry in my_dict:
    if entry in text:
        # process the match
        pass
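This assumes the dictionary is small enough to justify checking every key against the text. If instead the dictionary is large and the text is small, you would need to extract each word, then each two-word phrase, and check whether they are in the dictionary.
Is that enough to get you going?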
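The "extract each word, then each two-word phrase" idea could be sketched roughly as follows. This is my own illustration, not tested on large inputs: `replace_ngrams` and `max_words` are made-up names, and it assumes the keys are lower case with their words separated by single spaces.

```python
def replace_ngrams(text, dct, max_words=None):
    """Replace phrases by dictionary lookup, trying the longest phrase first."""
    words = text.split()
    if max_words is None:
        # Length of the longest key, measured in words
        max_words = max((len(key.split()) for key in dct), default=1)
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i, then shorter ones
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = ' '.join(words[i:i + n]).lower()
            if phrase in dct:
                result.append(dct[phrase])
                i += n
                break
        else:
            # No key starts at this word, so keep it unchanged
            result.append(words[i])
            i += 1
    return ' '.join(result)


dct = {'smartphone': 'toy', 'smart tv': 'junk'}
print(replace_ngrams('I have a smartphone and a Smart TV', dct))
# I have a toy and a junk
```

The work is roughly (number of words) x (longest key length in words) lookups, independent of how many keys the dictionary holds.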
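You need to test all the neighbor permutations, that is, every run of adjacent words, from length 1 (each individual word) up to the number of words in the text (the entire string). You can generate the neighbor permutations this way: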
text = 'I have a smartphone and a Smart TV'
array = text.lower().split()
key_permutations = [" ".join(array[j:j + i]) for i in range(1, len(array) + 1) for j in range(0, len(array) - (i - 1))]
>>> key_permutations
['i', 'have', 'a', 'smartphone', 'and', 'a', 'smart', 'tv', 'i have', 'have a', 'a smartphone', 'smartphone and', 'and a', 'a smart', 'smart tv', 'i have a', 'have a smartphone', 'a smartphone and', 'smartphone and a', 'and a smart', 'a smart tv', 'i have a smartphone', 'have a smartphone and', 'a smartphone and a', 'smartphone and a smart', 'and a smart tv', 'i have a smartphone and', 'have a smartphone and a', 'a smartphone and a smart', 'smartphone and a smart tv', 'i have a smartphone and a', 'have a smartphone and a smart', 'a smartphone and a smart tv', 'i have a smartphone and a smart', 'have a smartphone and a smart tv', 'i have a smartphone and a smart tv']
Now we substitute through the dictionary:
import re
for permutation in key_permutations:
    if permutation in dict:
        text = re.sub(re.escape(permutation), dict[permutation], text, flags=re.IGNORECASE)
>>> text
'I have a toy and a junk'
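You may, however, want to try the permutations in reverse order, longest first, so that more specific phrases take precedence over single words.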
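A rough sketch of the longest-first ordering, wrapped in a function so it can be tried in isolation. `replace_longest_first` and the `replacements` argument are names I chose for illustration, and the keys are assumed to be lower case.

```python
import re


def replace_longest_first(text, replacements):
    words = text.lower().split()
    # Every run of adjacent words, from single words up to the whole text
    candidates = {' '.join(words[j:j + i])
                  for i in range(1, len(words) + 1)
                  for j in range(len(words) - i + 1)}
    # Longer phrases first, so 'smart tv' is handled before 'tv'
    for phrase in sorted(candidates, key=len, reverse=True):
        if phrase in replacements:
            value = replacements[phrase]
            # A function replacement avoids backslash processing in value
            text = re.sub(re.escape(phrase), lambda m: value, text,
                          flags=re.IGNORECASE)
    return text


print(replace_longest_first('I have a smartphone and a Smart TV',
                            {'tv': 'television', 'smart tv': 'junk'}))
# I have a smartphone and a junk
```

With shortest first, 'tv' would be replaced before 'smart tv' got a chance to match, and the result would be 'Smart television' instead.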
If the keywords in the text are not right next to each other (keyword other keyword), we can do it like this. It looks like O(n) to me >"<
def dict_replace(dictionary, text, strip_chars=None, replace_func=None):
    """
    Replace words or word phrases in text with their values in dictionary.
    Arguments:
        dictionary: dict with key:value, keys should be in lower case
        text: string to process
        strip_chars: string of characters to strip from each word
        replace_func: optional function that transforms the final replacement.
            Must take 2 params: the matched text and the value
    Return:
        string
    Example:
        my_dict = {
            "hello": "hallo",
            "hallo": "hello", # Only one pass, don't worry
            "smart tv": "http://google.com?q=smart+tv"
        }
        dict_replace(my_dict, "hello google smart tv",
                     replace_func=lambda k,v: '[%s](%s)'%(k,v))
    """
    # First break the word phrases in the dictionary into single words
    dictionary = dictionary.copy()
    # Iterate over a snapshot of the keys because the dict grows in the loop
    for key in list(dictionary.keys()):
        if ' ' in key:
            key_parts = key.split()
            for part in key_parts:
                # Mark single word with False
                if part not in dictionary:
                    dictionary[part] = False
    # Break text into words and compare one by one
    result = []
    words = text.split()
    words.append('')  # Sentinel that flushes a match at the end of the text
    last_match = ''  # Last keyword (lower) match
    original = ''  # Last match in original
    for word in words:
        key_word = word.lower().strip(strip_chars) if \
            strip_chars is not None else word.lower()
        if key_word in dictionary:
            last_match = last_match + ' ' + key_word if \
                last_match != '' else key_word
            original = original + ' ' + word if \
                original != '' else word
        else:
            if last_match != '':
                # If match whole word
                if last_match in dictionary and dictionary[last_match] is not False:
                    if replace_func is not None:
                        result.append(replace_func(original, dictionary[last_match]))
                    else:
                        result.append(dictionary[last_match])
                else:
                    # Only match partial of keyword
                    match_parts = last_match.split(' ')
                    match_original = original.split(' ')
                    for i in range(0, len(match_parts)):
                        if match_parts[i] in dictionary and \
                                dictionary[match_parts[i]] is not False:
                            if replace_func is not None:
                                result.append(replace_func(match_original[i], dictionary[match_parts[i]]))
                            else:
                                result.append(dictionary[match_parts[i]])
                        else:
                            # Part of a phrase key only: keep the original word
                            result.append(match_original[i])
            result.append(word)
            last_match = ''
            original = ''
    # Drop the sentinel so the result has no trailing space
    return ' '.join(result[:-1])
You can do this easily with a regular expression.
import re
text = 'I have a smartphone and a Smart TV'
dict = {
'smartphone': 'toy',
'smart tv': 'junk'
}
for k, v in dict.items():
    regex = re.compile(re.escape(k), flags=re.I)
    text = regex.sub(v, text)
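It still depends on the order in which the dictionary keys are processed: if the replacement value of one item is part of the search term of another item, the result can change with that order.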
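One way around the ordering problem is to combine all keys into a single alternation and make one pass over the text, so a replacement value is never scanned again. A minimal sketch, assuming lower-case keys and mostly ASCII text (`replace_all` is a hypothetical helper name, not something from the code above):

```python
import re


def replace_all(text, replacements):
    if not replacements:
        return text
    # Longest keys first, so 'smart tv' wins over a shorter key such as 'smart'
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys),
                         flags=re.IGNORECASE)
    # The callback looks up the matched text in lower case
    return pattern.sub(lambda m: replacements[m.group(0).lower()], text)


print(replace_all('I have a smartphone and a Smart TV',
                  {'smartphone': 'toy', 'smart tv': 'junk'}))
# I have a toy and a junk
print(replace_all('hello hallo', {'hello': 'hallo', 'hallo': 'hello'}))
# hallo hello
```

Like the loop above, this matches substrings, so a key such as 'smart' would also match inside 'smartphone'. Wrapping the alternation in `\b` word boundaries is one option if only whole words should be replaced.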