Convert lists present in each column to their respective datatypes
I have a sample dataframe as shown below.
import numpy as np
import pandas as pd
data = {'ID': ['A', 'B', 'C', 'D'],
        'Age': [[20], [21], [19], [24]],
        'Sex': [['Male'], ['Male'], ['Female'], np.nan],
        'Interest': [['Dance', 'Music'], ['Dance', 'Sports'], ['Hiking', 'Surfing'], np.nan]}
df = pd.DataFrame(data)
df
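Each column holds values of list type. I want to remove those lists and, for every column, keep the datatype of the values inside them.
The final output should look like the one shown below.
Any help is greatly appreciated. Thank you.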
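Option 1. You can use the .str column accessor to index into the lists stored in the DataFrame values (it also works on strings and other indexable sequences such as tuples):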
# Replace columns containing length-1 lists with the only item in each list
df['Age'] = df['Age'].str[0]
df['Sex'] = df['Sex'].str[0]
# Join each variable-length list into one string; unlike .apply(', '.join),
# .str.join leaves rows holding np.nan untouched instead of raising a TypeError
df['Interest'] = df['Interest'].str.join(', ')
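To illustrate how `.str[0]` treats rows without a usable list, here is a minimal sketch on a made-up Series (the name `sex` and its values are illustrative, not taken from the question). As far as I can tell, missing values and empty lists both come back as NaN instead of raising an error:

```python
import numpy as np
import pandas as pd

# Illustrative data: two one-item lists, a missing value, and an empty list
sex = pd.Series([['Male'], ['Female'], np.nan, []])

# .str[0] picks the first element of each list where one exists
first = sex.str[0]
print(first)
```

This is why Option 1 can be applied to the Sex column even though row D holds np.nan.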
Option 2. explode Age and Sex, then join the Interest lists with ', ':
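# Exploding several columns at once requires pandas >= 1.3
df = df.explode(['Age', 'Sex'])
# .str.join leaves rows holding np.nan untouched
df['Interest'] = df['Interest'].str.join(', ')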
Both options return (rows A to C shown):
df
  ID Age     Sex         Interest
0  A  20    Male     Dance, Music
1  B  21    Male    Dance, Sports
2  C  19  Female  Hiking, Surfing
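Since the question asks for each column to end up with its own datatype, it may be worth checking the dtypes afterwards: the pandas documentation states that columns produced by explode come back as object dtype. A minimal sketch, assuming a reduced copy of the sample frame and using `infer_objects()` as one possible way to recover a numeric dtype (the names `exploded` and `typed` are mine):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D'],
    'Age': [[20], [21], [19], [24]],
    'Sex': [['Male'], ['Male'], ['Female'], np.nan],
})

# Multi-column explode (pandas >= 1.3); the exploded columns are object dtype
exploded = df.explode(['Age', 'Sex'])

# infer_objects() soft-converts object columns to a better dtype where possible
typed = exploded.infer_objects()
print(typed.dtypes)
```

`pd.to_numeric(df['Age'])` or `astype(int)` on a single column would be alternatives if you prefer to state the target type explicitly.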
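Edit
Option 3. If you have many columns containing lists, possibly with missing values such as np.nan, you can collect the names of the list columns and then loop over them as follows: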
# Get columns which contain at least one python list
list_cols = [c for c in df
             if df[c].apply(lambda x: isinstance(x, list)).any()]
list_cols
['Age', 'Sex', 'Interest']
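# Process each column
for c in list_cols:
    # Length of each list; rows holding np.nan instead of a list are dropped
    lengths = df[c].str.len().dropna()
    # If all lists in column c contain a single item:
    if (lengths == 1).all():
        df[c] = df[c].str[0]
    else:
        # .str.join leaves missing values as NaN
        df[c] = df[c].str.join(', ')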
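The loop above can also be wrapped into a reusable function. The sketch below is one way to do it, not the only one: `unwrap_list_columns` and its `sep` parameter are hypothetical names, it assumes multi-item lists contain only strings, and it finishes with `infer_objects()` so that a column such as Age ends up with an integer dtype.

```python
import numpy as np
import pandas as pd


def unwrap_list_columns(df: pd.DataFrame, sep: str = ', ') -> pd.DataFrame:
    """Hypothetical helper: unwrap list-valued columns, leaving NaN untouched."""
    out = df.copy()
    for c in out.columns:
        is_list = out[c].apply(lambda x: isinstance(x, list))
        if not is_list.any():
            continue  # not a list column, leave it alone
        lengths = out.loc[is_list, c].str.len()
        if (lengths == 1).all():
            # Every list holds exactly one item: keep the item itself
            out[c] = out[c].str[0]
        else:
            # Variable-length lists of strings: join them; NaN stays NaN
            out[c] = out[c].str.join(sep)
    return out.infer_objects()


df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D'],
    'Age': [[20], [21], [19], [24]],
    'Sex': [['Male'], ['Male'], ['Female'], np.nan],
    'Interest': [['Dance', 'Music'], ['Dance', 'Sports'],
                 ['Hiking', 'Surfing'], np.nan],
})
result = unwrap_list_columns(df)
print(result)
print(result.dtypes)
```

Working on a copy keeps the original frame intact, which makes it easy to compare the two while testing.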