Converting lists in a pandas DataFrame into columns

Question

city        state   neighborhoods       categories
Dravosburg  PA      [asas,dfd]          ['Nightlife']
Dravosburg  PA      [adad]              ['Auto_Repair','Automotive']

I have the DataFrame above, and I want to turn each element of the lists into its own column, for example:

city        state asas dfd adad Nightlife Auto_Repair Automotive 
Dravosburg  PA    1     1   0   1         1           0

I am using the following code to do this:

def list2columns(df):
    """
    to convert list in the columns 
    of a dataframe
    """
    columns=['categories','neighborhoods']
    for col in columns:    
        for i in range(len(df)):
            for element in eval(df.loc[i,"categories"]):
                if len(element)!=0:
                    if element not in df.columns:
                        df.loc[:,element]=0
                    else:
                        df.loc[i,element]=1

How can I do this more efficiently?

And why do I still get the warning below, even though I am already using df.loc?

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Answer 1

Use this instead. It works on a copy of df and returns that copy:

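def list2columns(df):
    """
    Convert the lists in the columns of a dataframe
    into 0/1 indicator columns.
    """
    df = df.copy()  # work on a copy so the caller's frame is left untouched
    columns = ['categories', 'neighborhoods']
    for col in columns:
        for i in df.index:
            # each cell holds the string form of a list, hence eval();
            # read from the current column, not always "categories"
            for element in eval(df.loc[i, col]):
                if len(element) != 0:
                    if element not in df.columns:
                        df.loc[:, element] = 0  # create the indicator column
                    # flag this row even when the column was just created
                    df.loc[i, element] = 1
    return df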

Answer 2

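Since you are using eval(), I assume each of these columns holds the string representation of a list rather than the list itself. Also, unlike your example above, I assume the items in the lists in your neighborhoods column are quoted (df.loc[0, 'neighborhoods'] == "['asas','dfd']"), because otherwise your eval() would fail.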
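To illustrate that assumption, here is a minimal sketch of what parsing such a cell looks like. It uses ast.literal_eval from the standard library in place of eval() (my substitution, not something the question uses), and the two sample strings are made up to mirror the question's data.

```python
import ast

quoted = "['asas','dfd']"    # string form of a list, items quoted
unquoted = "[asas,dfd]"      # as displayed in the question, items unquoted

# A quoted list string parses into a real Python list.
parsed = ast.literal_eval(quoted)

# The unquoted form is not a literal: asas and dfd read as variable names,
# so literal_eval raises ValueError (plain eval() would raise NameError).
try:
    ast.literal_eval(unquoted)
    unquoted_ok = True
except (ValueError, SyntaxError):
    unquoted_ok = False
```

literal_eval only accepts Python literals, so it is a safer choice than eval() when the strings come from a file you do not fully control.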

If all of that is correct, you could try something like this:

def list2columns(df):
    """
    to convert list in the columns of a dataframe
    """
    columns = ['categories', 'neighborhoods']
    new_cols = set()      # names of all new columns added
    for col in columns:
        col_pos = df.columns.get_loc(col)   # .iloc needs integer positions
        for i in range(len(df)):
            # get the list of columns to set
            set_cols = eval(df.iloc[i, col_pos])
            # set the values of these columns to 1 in the current row
            # (if this causes new columns to be added, other rows will get nans);
            # .iloc cannot create columns, so write by label with .loc
            for name in set_cols:
                df.loc[df.index[i], name] = 1
            # remember which new columns have been added
            new_cols.update(set_cols)
    # convert any un-set values in the new columns to 0; assign the result
    # back, because fillna(inplace=True) on df[...] only changes a copy
    new_cols = list(new_cols)
    df[new_cols] = df[new_cols].fillna(0).astype(int)
    return df
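On the efficiency part of the question, one possible alternative to the row-by-row loop is to let pandas build the indicator columns. The sketch below is only an illustration under the same assumption (cells hold quoted list strings); the function name lists_to_indicator_columns and the sample frame are hypothetical, and it relies on Series.explode, so it needs pandas 0.25 or later.

```python
import ast
import pandas as pd

# Sample data shaped like the question's frame (list cells stored as strings).
df = pd.DataFrame({
    "city": ["Dravosburg", "Dravosburg"],
    "state": ["PA", "PA"],
    "neighborhoods": ["['asas','dfd']", "['adad']"],
    "categories": ["['Nightlife']", "['Auto_Repair','Automotive']"],
})

def lists_to_indicator_columns(frame, list_columns):
    """Return a new frame with one 0/1 column per distinct list element."""
    result = frame.drop(columns=list_columns)
    for col in list_columns:
        # Parse each cell into a list, then emit one row per list element
        # (the original row label is repeated in the index).
        exploded = frame[col].apply(ast.literal_eval).explode()
        # One-hot encode the elements, then collapse back to one row
        # per original row label.
        dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()
        result = result.join(dummies)
    return result

wide = lists_to_indicator_columns(df, ["neighborhoods", "categories"])
```

Because the function builds and returns a new frame instead of assigning into df cell by cell, it also sidesteps the chained-assignment pattern that the warning is about.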

As for the SettingWithCopy warning, I can only speculate about the answer to your second question.

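Using df.iloc instead of df.loc might (but probably won't) help, since df.iloc is meant for selecting by row number. In your case, df.loc[i, col] only works because you haven't set an index, so pandas uses the default index, which matches the row numbers.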
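A small sketch of that difference, using a made-up two-row frame whose index labels (10 and 20) deliberately do not match the row numbers:

```python
import pandas as pd

# Hypothetical frame with a non-default index.
df = pd.DataFrame({"city": ["Dravosburg", "Pittsburgh"]}, index=[10, 20])

first_by_position = df.iloc[0, 0]       # row 0, column 0, both by position
first_by_label = df.loc[10, "city"]     # row labelled 10, column named "city"

# With this index there is no row labelled 0, so .loc raises KeyError.
try:
    df.loc[0, "city"]
    label_zero_found = True
except KeyError:
    label_zero_found = False
```

With the default index the labels are 0, 1, 2, and so on, which is why the two accessors appear interchangeable for rows in the question's code.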

Another possibility is that the df passed to your function is already a slice of a larger DataFrame, which would trigger the SettingWithCopy warning.
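The situation I have in mind looks roughly like the sketch below. The frame and variable names are made up, and whether the warning actually appears depends on your pandas version, so treat this as an illustration of the pattern and not a guaranteed reproduction.

```python
import pandas as pd

# Hypothetical larger frame that the function's argument was cut from.
big = pd.DataFrame({
    "city": ["Dravosburg", "Pittsburgh", "Dravosburg"],
    "state": ["PA", "PA", "PA"],
})

# Filtering yields an object that pandas may still track as derived from
# `big`; assigning into `subset` directly is the pattern that can warn.
subset = big[big["city"] == "Dravosburg"]

# Taking an explicit copy first makes it unambiguous which frame is modified.
safe = subset.copy()
safe.loc[:, "Nightlife"] = 0
```

If this is the cause, calling .copy() before passing the frame to the function (or at the top of the function) should make the warning go away.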

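I have also found that using df.loc with mixed indexing modes (boolean selectors for rows and column names for columns) produces the SettingWithCopy warning; your slice selectors may be causing a similar problem.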

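Hopefully the simpler, more direct indexing in the code above resolves all of these issues. If you still see the warning, please report back (and include the code that generates df).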
