Convert lists present in each column to their respective datatypes

I have a sample dataframe as shown below.

import numpy as np
import pandas as pd

data = {'ID':['A', 'B', 'C', 'D'],
    'Age':[[20], [21], [19], [24]],
    'Sex':[['Male'], ['Male'],['Female'], np.nan],
    'Interest': [['Dance','Music'], ['Dance','Sports'], ['Hiking','Surfing'], np.nan]}


df = pd.DataFrame(data)
df

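Every column except ID holds its values wrapped in lists. I want to remove the lists and keep, for each column, the datatype of the elements inside them.

The final output should look like this.

Any help is much appreciated. Thank you.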

Option 1. You can use the .str column accessor to index into lists stored as DataFrame values (it works the same way on strings or any other iterable):

# Replace columns containing length-1 lists with the only item in each list
df['Age'] = df['Age'].str[0]
df['Sex'] = df['Sex'].str[0]

# Join each variable-length list into one string; .str.join leaves np.nan untouched,
# whereas .apply(', '.join) raises a TypeError on missing values
df['Interest'] = df['Interest'].str.join(', ')
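
To illustrate how the .str accessor indexes different kinds of values, here is a minimal sketch on a made-up Series (the name `s` and its contents are assumptions for illustration, not part of the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical Series mixing a list, a string, a tuple and a missing value
s = pd.Series([['Male'], 'Male', ('Male', 'x'), np.nan])

# .str[0] takes the first element of each list/tuple and the first
# character of each string; missing values stay missing
first = s.str[0]
print(first)
```

Because missing values pass through unchanged, this indexing should be safe on columns such as Sex that contain np.nan.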

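Option 2. Explode Age and Sex (passing a list of columns to explode requires pandas 1.3 or later), then join the Interest lists with ', ':

df = df.explode(['Age', 'Sex'])
# .str.join leaves np.nan untouched, whereas .apply(', '.join) raises on it
df['Interest'] = df['Interest'].str.join(', ')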

Both options return (shown for rows A to C, which contain no missing values):

df

  ID  Age     Sex         Interest
0  A   20    Male     Dance, Music
1  B   21    Male    Dance, Sports
2  C   19  Female  Hiking, Surfing
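
As a quick sanity check, the sketch below rebuilds only the three complete rows (A to C) and compares the two options. The helper `make_df` and the names `opt1`, `opt2` and `same` are introduced here for illustration only:

```python
import pandas as pd

def make_df():
    # Rows A to C of the sample data (no missing values)
    return pd.DataFrame({
        'ID': ['A', 'B', 'C'],
        'Age': [[20], [21], [19]],
        'Sex': [['Male'], ['Male'], ['Female']],
        'Interest': [['Dance', 'Music'], ['Dance', 'Sports'],
                     ['Hiking', 'Surfing']],
    })

# Option 1: index the single-item lists, join the variable-length ones
opt1 = make_df()
opt1['Age'] = opt1['Age'].str[0]
opt1['Sex'] = opt1['Sex'].str[0]
opt1['Interest'] = opt1['Interest'].str.join(', ')

# Option 2: explode the single-item columns (list argument needs pandas >= 1.3)
opt2 = make_df().explode(['Age', 'Sex'])
opt2['Interest'] = opt2['Interest'].str.join(', ')

# Compare cell by cell, ignoring any dtype differences between the frames
same = opt1.values.tolist() == opt2.values.tolist()
print(same)
```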

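Edit

Option 3. If you have many columns containing lists, possibly with missing values such as np.nan, you can collect the names of the list columns and then loop over them as follows: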

# Get columns which contain at least one python list

list_cols = [c for c in df 
             if df[c].apply(lambda x: isinstance(x, list)).any()]
list_cols

['Age', 'Sex', 'Interest']


# Process each column

for c in list_cols:
    # If all lists in column c contain a single item (missing values ignored):
    if (df[c].str.len().dropna() == 1).all():
        df[c] = df[c].str[0]
    else:
        # .str.join leaves np.nan untouched, unlike .apply(', '.join)
        df[c] = df[c].str.join(', ')
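
Since the question asks for each column to end up with its own datatype, note that the unwrapped columns may still be of object dtype. A minimal sketch, assuming a reduced copy of the sample frame (redefined here so the block runs on its own), of letting pandas infer a tighter dtype with infer_objects():

```python
import numpy as np
import pandas as pd

# Reduced sample frame, redefined so the snippet is self-contained
df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D'],
    'Age': [[20], [21], [19], [24]],
    'Sex': [['Male'], ['Male'], ['Female'], np.nan],
})

# Unwrap the single-item lists; missing values pass through as NaN
for c in ['Age', 'Sex']:
    df[c] = df[c].str[0]

# Ask pandas to infer better dtypes for object columns
# (Age should become an integer column)
df = df.infer_objects()
print(df.dtypes)
```

df.convert_dtypes() is an alternative that yields pandas' nullable dtypes (for example Int64), which may suit columns that mix integers with missing values.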