How to check for words that are not immediately followed by a keyword, and what about words not surrounded by the keyword?
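I am trying to find words that do not come immediately before 'the'.
I used a positive lookbehind, (?<=the\W), to get the words that come after the keyword 'the'. However, I cannot capture 'people' and 'that', because that logic does not apply to them.
I cannot handle words that have no 'the' either before or after them (for example 'that' and 'people' in the sentence).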
import re
p = re.compile(r'(?<=the\W)\w+')
m = p.findall('the part of the fair that attracts the most people is the fireworks')
print(m)
The output I currently get is
['part', 'fair', 'most', 'fireworks']
Edit:
Thanks for all the help below. Using the suggestion from the comments, I managed to update my code.
p = re.compile(r"\b(?!the)(\w+)(\W\w+\Wthe)?")
m = p.findall('the part of the fair that attracts the most people is the fireworks')
print(m)
This gets me closer to the output I need.
Updated output:
[('part', ' of the'), ('fair', ''),
('that', ' attracts the'), ('most', ''),
('people', ' is the'), ('fireworks', '')]
I only need the strings ('part','fair','that','most','people','fireworks').
Any suggestions?
I am trying to look for words that do not immediately come before 'the'.
Note that the code below does not use re.
words = 'the part of the fair that attracts the most people is the fireworks'
words_list = words.split()
words_not_before_the = []
# Keep every word (except the last) whose next word is not 'the'
for idx, w in enumerate(words_list):
    if idx < len(words_list) - 1 and words_list[idx + 1] != 'the':
        words_not_before_the.append(w)
# The last word has no successor, so it can never come before 'the'
words_not_before_the.append(words_list[-1])
print(words_not_before_the)
Output:
['the', 'part', 'the', 'fair', 'that', 'the', 'most', 'people', 'the', 'fireworks']
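Note that the list above still contains 'the' itself, while the question asks for only six words. A minimal sketch of a variant that also skips the keyword is shown below; the function name words_not_before_keyword and its keyword parameter are made up for illustration and are not part of the answer above.

```python
def words_not_before_keyword(sentence, keyword="the"):
    """Return words that are neither the keyword nor immediately followed by it."""
    words = sentence.split()
    result = []
    for idx, word in enumerate(words):
        if word == keyword:
            continue  # drop the keyword itself
        # The last word has no successor, so next_word is None for it
        next_word = words[idx + 1] if idx + 1 < len(words) else None
        if next_word != keyword:
            result.append(word)
    return result


sentence = 'the part of the fair that attracts the most people is the fireworks'
print(words_not_before_keyword(sentence))
# ['part', 'fair', 'that', 'most', 'people', 'fireworks']
```

This compares whole, whitespace-separated tokens, so punctuation attached to a word (for example 'the,') would not be recognized as the keyword.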
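Try turning it around: instead of finding the words that do not come immediately before 'the', eliminate every word that does come immediately before 'the', along with 'the' itself.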
import re
test = "the part of the fair that attracts the most people is the fireworks"
pattern = r"\s\w*\sthe|the\s"
print(re.sub(pattern, "", test))
Output: part fair that most people fireworks
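One thing to watch for: the pattern above has no word boundaries, so the\s can also match the tail of a longer word such as 'bathe'. A possible word-boundary variant that also returns a list is sketched below; the helper name drop_keyword_context and the 'bathe the dog' sample are illustrative assumptions, not something from the answer.

```python
import re


def drop_keyword_context(text, keyword="the"):
    """Remove the keyword and the word right before it, then split into words."""
    kw = re.escape(keyword)
    # Optional preceding word, then the keyword as a whole word, then trailing spaces
    pattern = rf"(?:\b\w+\s+)?\b{kw}\b\s*"
    return re.sub(pattern, "", text).split()


test = "the part of the fair that attracts the most people is the fireworks"
print(drop_keyword_context(test))
# ['part', 'fair', 'that', 'most', 'people', 'fireworks']

# The pattern without boundaries cuts into 'bathe'; the variant does not
print(re.sub(r"\s\w*\sthe|the\s", "", "bathe the dog"))  # badog
print(drop_keyword_context("bathe the dog"))             # ['dog']
```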
Using a regular expression:
import re
m = re.sub(r'\b(\w+)\b the', 'the', 'the part of the fair that attracts the most people is the fireworks')
print([word for word in m.split(' ') if not word.isspace() and word])
Output:
['the', 'part', 'the', 'fair', 'that', 'the', 'most', 'people', 'the', 'fireworks']
I am trying to look for words that do not immediately come before the.
Try this:
import re
# The capture group (\w+) matches a word, that is followed by a word, followed by the word: "the"
p = re.compile(r'(\w+)\W\w+\Wthe')
m = p.findall('the part of the fair that attracts the most people is the fireworks')
print(m)
Output:
['part', 'that', 'people']
I finally solved the problem. Thank you all!
p = re.compile(r"\b(?!the)(\w+)(?:\W\w+\Wthe)?")
m = p.findall('the part of the fair that attracts the most people is the fireworks')
print(m)
I made the optional second group non-capturing by adding '?:', so findall returns only the first group.
Output:
['part', 'fair', 'that', 'most', 'people', 'fireworks']
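A caveat on the pattern above, offered as a sketch rather than a definitive fix: (?!the) also rejects words that merely start with 'the' (such as 'theory' or 'then'), and the pattern only skips a word before 'the' if an earlier match consumed it. A lookahead-only alternative is shown below. The names loose and lookahead_only and the second test sentence are assumptions made up for this illustration.

```python
import re

sentence = "the part of the fair that attracts the most people is the fireworks"

# Pattern from the post above, kept for comparison
loose = re.compile(r"\b(?!the)(\w+)(?:\W\w+\Wthe)?")

# Skip the whole word 'the', then reject any word that is followed by 'the'
lookahead_only = re.compile(r"\b(?!the\b)\w+\b(?!\W+the\b)")

print(lookahead_only.findall(sentence))
# ['part', 'fair', 'that', 'most', 'people', 'fireworks']

tricky = "the theory of the game then wins"
print(loose.findall(tricky))           # ['of', 'game', 'wins']
print(lookahead_only.findall(tricky))  # ['theory', 'game', 'then', 'wins']
```

In the second sentence the original pattern drops 'theory' and 'then' and keeps 'of', even though 'of' comes right before 'the'. The lookahead version tests each word on its own, so it does not depend on what the previous match consumed.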