Get first word of all strings in lists
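I have a CSV file that I'm reading, as shown below. I need to get the first word of every string. I know how to get the first letter, but I'm not sure how to get the whole word.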
['diffuse systemic sclerosis', 'back', 'public on july 15 2008']
['diffuse systemic sclerosis', 'forearm', 'public on may 9 2014']
I want my output to be:
diffuse
back
public
forearm
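Since the rows come from a CSV file, here is a minimal sketch of reading them with the standard csv module and printing each distinct first word once, in first-seen order. It assumes three comma-separated columns and no header row; the in-memory io.StringIO data and the helper name first_words_from_csv are hypothetical stand-ins (with a real file you would use open(path, newline='')).

```python
import csv
import io

# Hypothetical stand-in for the real CSV file (no header row assumed)
csv_text = (
    "diffuse systemic sclerosis,back,public on july 15 2008\n"
    "diffuse systemic sclerosis,forearm,public on may 9 2014\n"
)

def first_words_from_csv(f):
    """Return the distinct first words of every cell, in first-seen order."""
    seen = []
    for row in csv.reader(f):
        for cell in row:
            parts = cell.split(maxsplit=1)
            # skip empty cells and words we have already collected
            if parts and parts[0] not in seen:
                seen.append(parts[0])
    return seen

words = first_words_from_csv(io.StringIO(csv_text))
print("\n".join(words))
```

This prints diffuse, back, public and forearm on separate lines, matching the desired output.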
You can use a list comprehension and the str.split() method:
>>> l=['diffuse systemic sclerosis', 'back', 'public on july 15 2008']
>>> [i.split()[0] for i in l]
['diffuse', 'back', 'public']
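Note that split()[0] raises IndexError when a string is empty or contains only whitespace. A small variation, assuming such cells can occur in the data (the sample list below is hypothetical), filters them out first:

```python
rows = ['diffuse systemic sclerosis', '', 'back', '   ', 'public on july 15 2008']

# split() with no arguments returns [] for empty or whitespace-only strings,
# so "if parts" drops those before indexing [0]
first = [parts[0] for parts in map(str.split, rows) if parts]
print(first)  # ['diffuse', 'back', 'public']
```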
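To handle all rows at once, you can use a set comprehension, which also removes duplicates (a set does not keep order, so the order of the result is arbitrary):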
>>> l = [['diffuse systemic sclerosis', 'back', 'public on july 15 2008'],
...      ['diffuse systemic sclerosis', 'forearm', 'public on may 9 2014']]
>>> list({i.split()[0] for j in l for i in j})
['back', 'diffuse', 'forearm', 'public']
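If you need the words in the order they first appear (as in the desired output), one stdlib-only option is dict.fromkeys, relying on dicts preserving insertion order (guaranteed since Python 3.7):

```python
l = [['diffuse systemic sclerosis', 'back', 'public on july 15 2008'],
     ['diffuse systemic sclerosis', 'forearm', 'public on may 9 2014']]

# dict keys keep insertion order, so dict.fromkeys drops duplicates
# while keeping the first-seen order
first_words = list(dict.fromkeys(s.split()[0] for row in l for s in row))
print(first_words)  # ['diffuse', 'back', 'public', 'forearm']
```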
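Another option uses reduce (a built-in only in Python 2; in Python 3 it lives in functools, and print is a function):
from functools import reduce

l = [
['diffuse systemic sclerosis', 'back', 'public on july 15 2008'],
['diffuse systemic sclerosis', 'forearm', 'public on may 9 2014']
]
# d: first word of every string in one row
d = lambda o: [a.split().pop(0) for a in o]
# r: concatenate the first words of two rows; with more than two rows,
# d is re-applied to the accumulated words, which leaves them unchanged
r = lambda a, b: d(a) + d(b)
print("\n".join(set(reduce(r, l))))
Output (the order may vary, because a set is used):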
public
forearm
diffuse
back
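A reduce-free sketch of the same idea uses itertools.chain.from_iterable to flatten the rows. Unlike reduce, which returns a lone element unchanged without ever calling the function, this also works when l holds a single row; sorted() is added here only to make the printed order reproducible.

```python
from itertools import chain

l = [
    ['diffuse systemic sclerosis', 'back', 'public on july 15 2008'],
    ['diffuse systemic sclerosis', 'forearm', 'public on may 9 2014'],
]

# chain.from_iterable yields every string of every row in turn
words = {s.split()[0] for s in chain.from_iterable(l)}
print("\n".join(sorted(words)))  # back, diffuse, forearm, public (one per line)
```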
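You can use str.split in a list comprehension. Note that you can pass maxsplit=1 so that each string is split only at its first run of whitespace, which avoids splitting the rest of the string unnecessarily: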
L = ['diffuse systemic sclerosis', 'back', 'public on july 15 2008']
res = [i.split(maxsplit=1)[0] for i in L]
# ['diffuse', 'back', 'public']
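To see what maxsplit=1 changes, compare the two calls on one of the sample strings; the first element is the same either way, but only one split is performed:

```python
s = 'public on july 15 2008'

full = s.split()                 # splits at every run of whitespace
head_tail = s.split(maxsplit=1)  # stops after the first split

print(full)          # ['public', 'on', 'july', '15', '2008']
print(head_tail)     # ['public', 'on july 15 2008']
print(head_tail[0])  # public
```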
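You can also do the same thing in a functional style:
from operator import itemgetter, methodcaller
L = ['diffuse systemic sclerosis', 'back', 'public on july 15 2008']
splitter = methodcaller('split', maxsplit=1)  # splitter(s) calls s.split(maxsplit=1)
res = list(map(itemgetter(0), map(splitter, L)))
# ['diffuse', 'back', 'public']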
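Across multiple lists, if you want to keep the first words in the order they are first seen, you can use the unique_everseen recipe from the itertools documentation, which is also available in the more_itertools library: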
from itertools import chain
from more_itertools import unique_everseen
L1 = ['diffuse systemic sclerosis', 'back', 'public on july 15 2008']
L2 = ['diffuse systemic sclerosis', 'forearm', 'public on may 9 2014']
res = list(unique_everseen(i.split(maxsplit=1)[0] for i in chain(L1, L2)))
# ['diffuse', 'back', 'public', 'forearm']
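If you would rather not install more_itertools, a simplified sketch of the unique_everseen recipe can be defined locally. This version is abridged from the itertools documentation recipe and assumes the elements (or their keys) are hashable:

```python
from itertools import chain

def unique_everseen(iterable, key=None):
    """Yield unique elements in first-seen order (simplified itertools recipe)."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element

L1 = ['diffuse systemic sclerosis', 'back', 'public on july 15 2008']
L2 = ['diffuse systemic sclerosis', 'forearm', 'public on may 9 2014']
res = list(unique_everseen(i.split(maxsplit=1)[0] for i in chain(L1, L2)))
print(res)  # ['diffuse', 'back', 'public', 'forearm']
```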