Find word before and after delimiter
string = "The is a better :: sentence as :: compared to that"
Expected output:
- better sentence
- as compared
I tried the following:
string.split(" :: "),
re.sub("[\<].*?[\>]", "", string)
Neither of these gives me the specific words.
>>> string = "The is a better :: sentence as :: compared to that"
>>> x = [' '.join(x) for x in map(lambda x: (x[0].split()[-1], x[1].split()[0]), zip(string.split('::')[:-1], string.split('::')[1:]))]
>>> x
Output:
['better sentence', 'as compared']
Breakdown:
First, split on :: and use zip to pair up consecutive pieces:
pairs = zip(string.split('::')[:-1], string.split('::')[1:])
If you call list() on that expression, you get:
[('The is a better ', ' sentence as '), (' sentence as ', ' compared to that')]
Next, apply a function that extracts the last word from the first element of each tuple and the first word from the second element:
new_pairs = map(lambda x: (x[0].split()[-1], x[1].split()[0]), pairs)
If you call list() on that expression, you get:
[('better', 'sentence'), ('as', 'compared')]
Finally, join each tuple inside a list comprehension:
result = [' '.join(x) for x in new_pairs]
Output:
['better sentence', 'as compared']
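The three steps above can be folded into one small function. This is only a minimal sketch, assuming Python 3; the name words_around and its delimiter parameter are illustrative and not part of the original answer. It splits the string once and skips any delimiter that has no word on one side.

```python
def words_around(text, delimiter="::"):
    """Return 'left right' for the words on either side of each delimiter."""
    pieces = text.split(delimiter)
    result = []
    # Pair each piece with the piece that follows it.
    for before, after in zip(pieces, pieces[1:]):
        left, right = before.split(), after.split()
        # Skip a delimiter with no word on one side (e.g. at the very start or end).
        if left and right:
            result.append(left[-1] + " " + right[0])
    return result


string = "The is a better :: sentence as :: compared to that"
print(words_around(string))  # ['better sentence', 'as compared']
```

Unlike the one-liner, this version does not raise IndexError when the string begins or ends with the delimiter.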
timeit results:
The slowest run took 4.92 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.74 µs per loop
Here is another way, using re.
import re
string = "The is a better :: sentence as :: compared to that"
result = [' '.join(x) for x in re.findall(r'(\w+) :: (\w+)', string)]
Output:
['better sentence', 'as compared']
timeit results:
The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.49 µs per loop
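The timings above appear to be IPython %timeit output. To reproduce the comparison in a plain script, something like the following sketch may work, assuming Python 3 and the standard timeit module. The function names split_zip and with_regex are made up for illustration, and the numbers will differ by machine and Python version.

```python
import re
import timeit

string = "The is a better :: sentence as :: compared to that"


def split_zip(text):
    # split/zip approach: last word of each piece joined with first word of the next
    parts = text.split('::')
    return [' '.join((a.split()[-1], b.split()[0])) for a, b in zip(parts[:-1], parts[1:])]


def with_regex(text):
    # re.findall approach: capture the word on each side of ' :: '
    return [' '.join(m) for m in re.findall(r'(\w+) :: (\w+)', text)]


for func in (split_zip, with_regex):
    # Best of 3 repeats of 10000 calls each, reported per call in microseconds.
    best = min(timeit.repeat(lambda: func(string), number=10000, repeat=3))
    print(func.__name__, round(best / 10000 * 1e6, 2), "microseconds per loop")
```

Both functions return the same list for the sample string, so the comparison is like for like.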
Here is another way:
1st) Get the indices of the delimiters
indices = [idx for idx, elem in enumerate(string.split(' ')) if elem == '::']
2nd) Join the words to the left and right of each delimiter
for idx in indices:
    print(' '.join(string.split(' ')[idx-1:idx+2:2]))
better sentence
as compared
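The same index idea can be written so that the string is split only once and a delimiter at the very start or end is skipped. A minimal sketch, assuming Python 3 and that the delimiter is always surrounded by single spaces as in the example; the names words and pairs are illustrative.

```python
string = "The is a better :: sentence as :: compared to that"

words = string.split(' ')  # split once and reuse the list
pairs = []
for idx, word in enumerate(words):
    # Only use delimiters that have a neighbour on both sides.
    if word == '::' and 0 < idx < len(words) - 1:
        pairs.append(words[idx - 1] + ' ' + words[idx + 1])

print(pairs)  # ['better sentence', 'as compared']
```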
A solution using the re.findall() function:
import re
s = "The is a better :: sentence as :: compared to that"
result = [' '.join(i) for i in re.findall(r'(\w+) ?:: ?(\w+)', s)]
print(result)
Output:
['better sentence', 'as compared']
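One thing to be aware of with re.findall(): matches cannot overlap, so when a single word sits between two delimiters (for example a :: b :: c), the second pair is missed because b was already consumed by the first match. A possible workaround, sketched here on the assumption that such input can occur, is to capture the right-hand word inside a lookahead. The sample string chained is invented for illustration.

```python
import re

chained = "a :: b :: c"  # hypothetical input: one word between two delimiters

# Plain pattern: 'b' is consumed by the first match, so ('b', 'c') is never found.
print(re.findall(r'(\w+) ?:: ?(\w+)', chained))  # [('a', 'b')]

# Lookahead variant: the right-hand word is captured but not consumed.
pattern = re.compile(r'(\w+) ?:: ?(?=(\w+))')
print([' '.join(m) for m in pattern.findall(chained)])  # ['a b', 'b c']

s = "The is a better :: sentence as :: compared to that"
print([' '.join(m) for m in pattern.findall(s)])  # ['better sentence', 'as compared']
```

For the sentence in the question both patterns give the same result; the lookahead only matters for chained delimiters.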