Reducing a column of CSV lists to a single list
I am using Python 3 to read a column from an Excel spreadsheet:
import pandas as pd
from pandas import ExcelFile
df = pd.read_excel('MWE.xlsx', sheet_name='Sheet1')
print(df)
                   col1                        col2
0         starts normal                  egg, bacon
1  still none the wiser         egg, sausage, bacon
2      maybe odd tastes                   egg, spam
3     or maybe post-war            egg, bacon, spam
4  maybe for the hungry   egg, bacon, sausage, spam
5                 bingo  spam, bacon, sausage, spam
I want to reduce col2 to a single list of the words in col2 (e.g. egg, bacon, ...).
df.col2.ravel()
seems to reduce col2 to a list of strings.
df.col2.flatten()
yields
AttributeError: 'Series' object has no attribute 'flatten'
Try something simple, such as:
df = pd.DataFrame({'col2': [list('abc'), list('de'), list('fghi')]})
flat_col2 = [element for row in df.col2 for element in row]
# ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
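The same comprehension can be pointed at the comma-separated strings from the question by splitting each cell inside it. This is only a sketch: the three-row frame below is an abbreviated stand-in for the spreadsheet, and it assumes every cell in col2 is a string (no NaN values).

```python
import pandas as pd

# Abbreviated stand-in for the spreadsheet column (assumption: all cells are strings).
df = pd.DataFrame({'col2': ['egg, bacon', 'egg, sausage, bacon', 'egg, spam']})

# Split each cell on commas, strip the surrounding whitespace, and flatten in one pass.
flat_col2 = [word.strip() for cell in df['col2'] for word in cell.split(',')]
# ['egg', 'bacon', 'egg', 'sausage', 'bacon', 'egg', 'spam']
```

Duplicates are kept and the original order is preserved, which may or may not be what you want.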
If what you want is a series of lists as col2, this will do it:
df = pd.DataFrame({'col1': ['starts normal','still none the wiser'], 'col2': ['egg, bacon','egg, sausage, bacon']})
df['col2'] = df['col2'].map(lambda x: [i.strip() for i in x.split(',')])
print(df)
Result:
                   col1                   col2
0         starts normal           [egg, bacon]
1  still none the wiser  [egg, sausage, bacon]
Maybe this is what you need:
Turn a series of comma-separated strings into a list of lists
arrs = df.col2.map(lambda x: [i.strip() for i in x.split(',')]).tolist()
# [['egg', 'bacon'], ['egg', 'sausage', 'bacon'], ...]
Get a list with the unique items
unique = list({elem for arr in arrs for elem in arr})
# e.g. ['spam', 'sausage', 'egg', 'bacon'] (set order is arbitrary)
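Because a set does not keep insertion order, the unique list above can come back in any order. If first-seen order matters, one possible alternative is to stay inside pandas with `Series.str.split`, `Series.explode` (available from pandas 0.25) and `drop_duplicates`. The frame below is again an abbreviated stand-in, and the names `words` and `unique_in_order` are just illustrative.

```python
import pandas as pd

# Abbreviated stand-in for the spreadsheet column (assumption: all cells are strings).
df = pd.DataFrame({'col2': ['egg, bacon', 'egg, sausage, bacon', 'egg, spam']})

# Split each cell into a list, give every word its own row, then trim whitespace.
words = df['col2'].str.split(',').explode().str.strip()

flat = words.tolist()
# ['egg', 'bacon', 'egg', 'sausage', 'bacon', 'egg', 'spam']

# drop_duplicates keeps the first occurrence, so the order of appearance survives.
unique_in_order = words.drop_duplicates().tolist()
# ['egg', 'bacon', 'sausage', 'spam']
```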