How to split a string based on word match from different lists?
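I have a string. I want to split it into parts wherever a word matches an item from either of two different lists. How can I do this? Here is what I have so far: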
dummy_word = "I have a HTML file"
dummy_type = ["HTML","JSON","XML"]
dummy_file_type = ["file","document","paper"]
for e in dummy_type:
    if e in dummy_word:
        type_found = e
        print("type ->" , e)
        dum = dummy_word.split(e)
        complete_dum = "".join(dum)
        for c in dummy_file_type:
            if c in complete_dum:
                then = complete_dum.split("c")
                print("file type ->",then)
For the given scenario, my expected output is ["I have a", "HTML","file"].
This works for me:
dummy_word = "I have a HTML file"
dummy_type = ["HTML","JSON","XML"]
dummy_file_type = ["file","document","paper"]
temp = ""
dummy_list = []
for word in dummy_word.split():
    if word in dummy_type or word in dummy_file_type:
        if temp:
            # flush the ordinary words collected so far
            dummy_list.append(temp.strip())
        dummy_list.append(word)
        temp = ""
    else:
        temp += word + " "
if temp:
    # keep any ordinary words that follow the last match
    dummy_list.append(temp.strip())
print(dummy_list)  # ['I have a', 'HTML', 'file']
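itertools.groupby() handles this kind of task well. Here the key is the word itself if the word is in the set of special words, and False otherwise. This groups all non-special words together, while each special word becomes its own element: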
from itertools import groupby
dummy_word = "I have a HTML file"
dummy_type = ["HTML","JSON","XML"]
dummy_file_type = ["file","document","paper"]
words = set(dummy_type).union(dummy_file_type)
result = [" ".join(g) for k, g in
          groupby(dummy_word.split(), key=lambda word: (word in words) and word)]
print(result)
# ['I have a', 'HTML', 'file']
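To make the grouping idea above easier to reuse, here is a minimal sketch that wraps it in a function. The name split_on_keywords and its signature are my own choices for illustration, not part of the answer above, and it assumes the text can be tokenized on whitespace.

```python
from itertools import groupby


def split_on_keywords(text, *keyword_lists):
    """Split text into chunks, isolating each word found in keyword_lists.

    Hypothetical helper: the name and signature are illustrative only.
    """
    keywords = set().union(*keyword_lists)
    chunks = []
    # The key is the word itself for keywords and False for everything else,
    # so a run of ordinary words shares one key and ends up in one group.
    for _key, group in groupby(text.split(), key=lambda w: w in keywords and w):
        chunks.append(" ".join(group))
    return chunks


print(split_on_keywords("I have a HTML file",
                        ["HTML", "JSON", "XML"],
                        ["file", "document", "paper"]))
```

One caveat of this key: if the same keyword appears twice in a row, both occurrences share a key and are joined into a single chunk.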
Another approach, using re:
>>> import re
>>> list(map(str.strip, re.sub("|".join(dummy_type + dummy_file_type), lambda x: "," + x.group(), dummy_word).split(',')))
['I have a', 'HTML', 'file']
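First, the regex pattern is formed by joining all the types with "|" using join. With re.sub, each matched token is replaced by the same token prefixed with a comma, and the string is then split on the comma delimiter. map with str.strip removes the surrounding whitespace.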
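A possible variation on the regex idea, sketched here under my own assumptions rather than taken from the answer: re.split with a capturing group keeps the matched keywords in the result, so no comma placeholder is needed and commas in the input are not treated as delimiters. The helper name split_with_regex is hypothetical. It assumes a non-empty keyword list whose entries start and end with word characters, so that the \b word boundaries behave as intended.

```python
import re


def split_with_regex(text, keywords):
    """Split text around whole-word keyword matches (hypothetical helper)."""
    # Escape each keyword and require word boundaries,
    # so "file" does not match inside "profile".
    pattern = r"\b(" + "|".join(map(re.escape, keywords)) + r")\b"
    # The capturing group makes re.split keep the matched keywords.
    parts = re.split(pattern, text)
    # Trim whitespace and drop empty pieces left around the matches.
    return [p.strip() for p in parts if p.strip()]


dummy_type = ["HTML", "JSON", "XML"]
dummy_file_type = ["file", "document", "paper"]
print(split_with_regex("I have a HTML file", dummy_type + dummy_file_type))
```

Unlike the word-by-word approaches, this one works on the raw string, so punctuation attached to a keyword (for example "file.") would still match.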