Splitting strings with words from right based on limit
I have this list, made up of strings that each hold a tag and a weight:
lst = ['rock 101071', 'pop 69159', 'alternative 55777', 'indie 48175',
'electronic 46270', 'female vocalists 42565', 'favorites 39921',
'Love 34901', 'dance 33618', '00s 31432']
I am trying to convert it into tuples, like this:
[('rock ', '101071'), ('pop ', '69159'), ('alternative ', '55777'), ('indie ', '48175'),
('electronic ', '46270'), ('female vocalists ', '42565'), ('favorites ', '39921'),
('Love ', '34901'), ('dance ', '33618'), ('s ', '0031432')]
Here each string is split into a tuple so that index 0 holds every word except the last one, and index 1 holds the last word of the string.
To achieve this, my code is:
tags = []
weights = []
for i in lst:
    tag = ''.join([x for x in i if not x.isdigit()])
    tags.append(tag)
    weight = ''.join([x for x in i if x.isdigit()])
    weights.append(weight)
Then, if I do:
print(list(zip(tags, weights)))
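I get the desired result for most entries. Unfortunately, some tags themselves contain digits, for example 00s in lst, and those digits end up in the weight.
How can I split that entry correctly, so that '00s 31432' gives the tag '00s' and the weight '31432' instead of ('s ', '0031432')?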
PS: Splitting with i.split(" ") as an alternative is not ideal, because some tags in the collection consist of several words.
You can use str.rsplit() to split each string on a space, starting from the right, with maxsplit set to 1. For example:
>>> lst = ['rock 101071', 'pop 69159', 'alternative 55777', 'indie 48175', 'electronic 46270', 'female vocalists 42565', 'favorites 39921', 'Love 34901', 'dance 33618', '00s 31432']
>>> [s.rsplit(' ', 1) for s in lst]
[['rock', '101071'], ['pop', '69159'], ['alternative', '55777'], ['indie', '48175'], ['electronic', '46270'], ['female vocalists', '42565'], ['favorites', '39921'], ['Love', '34901'], ['dance', '33618'], ['00s', '31432']]
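This gives a list of lists, though (which I think should be fine). If you really need tuples inside the list, as in the question, you can convert each result with tuple():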
[tuple(s.rsplit(' ', 1)) for s in lst]
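To make the effect of the limit visible, here is a small sketch that contrasts a plain split with a right split limited to one cut. The sample strings come from the question; the helper name split_tag_weight is only illustrative and not part of any library.

```python
def split_tag_weight(entry):
    """Split 'tag words 123' into ('tag words', '123') at the last space."""
    # rsplit scans from the right; a maxsplit of 1 stops after the first cut,
    # so any spaces inside the tag itself are left untouched.
    tag, weight = entry.rsplit(' ', 1)
    return tag, weight


# A plain split cuts at every space, which breaks multi-word tags apart.
print('female vocalists 42565'.split(' '))
# → ['female', 'vocalists', '42565']

print(split_tag_weight('female vocalists 42565'))
# → ('female vocalists', '42565')

# Digits inside the tag are no longer a problem.
print(split_tag_weight('00s 31432'))
# → ('00s', '31432')
```

Note that the two-name unpacking assumes every entry contains at least one space; an entry without one would raise a ValueError.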
This works even for tags made up of several words:
def processdata(lst):
    # Split every entry into words, then rejoin all but the last word as the tag.
    raw_tuples = [i.split() for i in lst]
    sani_tuples = [(' '.join(i[:-1]), i[-1]) for i in raw_tuples]
    return sani_tuples

if __name__ == '__main__':
    lst = ['rock 101071', 'pop 69159', 'alternative 55777', 'test multi word tag 101020']
    print(processdata(lst))
Output:
[('rock', '101071'), ('pop', '69159'), ('alternative', '55777'), ('test multi word tag', '101020')]
>>> lst = ['rock 101071', 'pop 69159', 'alternative 55777', 'indie 48175',
... 'electronic 46270', 'female vocalists 42565', 'favorites 39921',
... 'Love 34901', 'dance 33618', '00s 31432']
>>>
>>> [tuple(s.rsplit(maxsplit=1)) for s in lst]
[('rock', '101071'), ('pop', '69159'), ('alternative', '55777'), ('indie', '48175'), ('electronic', '46270'), ('female vocalists', '42565'), ('favorites', '39921'), ('Love', '34901'), ('dance', '33618'), ('00s', '31432')]
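If the weights are needed as numbers rather than strings, the same right split can be followed by an int() conversion. The following is only a sketch under that assumption; the function name parse_weighted_tags is hypothetical, and the sample data is a subset of the list in the question.

```python
def parse_weighted_tags(entries):
    """Return (tag, weight) tuples, with the weight converted to int."""
    pairs = []
    for entry in entries:
        # With no separator given, rsplit cuts on any run of whitespace;
        # maxsplit=1 keeps everything before the last word together.
        tag, weight = entry.rsplit(maxsplit=1)
        pairs.append((tag, int(weight)))
    return pairs


lst = ['rock 101071', 'female vocalists 42565', '00s 31432']
print(parse_weighted_tags(lst))
# → [('rock', 101071), ('female vocalists', 42565), ('00s', 31432)]
```

int() raises a ValueError when the last word is not a number, so malformed entries fail loudly instead of being silently mis-split.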