Split a column of a dataframe into two separate columns

Question

I want to split one column of a dataframe into two separate columns. This is what my dataframe looks like (first 3 rows only):

I want to split the referenced_tweets column into two columns, type and id. For the first row, for example, type would be replied_to and id would be 1253050942716551168.

This is what I tried:

df[['type', 'id']] = df['referenced_tweets'].str.split(',', n=1, expand=True)

But I get this error:

ValueError: Columns must be the same length as key

(I think I get this error because the type in the referenced_tweets column is not always replied_to; it can also be retweeted, for example, so the lengths differ.)

Answer 1

Why not take the values from the dictionary and add them as two new columns?

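def unpack_column(df_series, key):
    """Return value[0][key] for each cell, or None where the cell is missing (NaN)."""
    # isinstance avoids calling pd.isna on a list, which returns an array
    # and fails as a condition when a cell holds more than one dict
    return [value[0][key] if isinstance(value, list) and value else None for value in df_series]


df['type'] = unpack_column(df['referenced_tweets'], 'type')
df['id'] = unpack_column(df['referenced_tweets'], 'id')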
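To see what this produces, here is a minimal self-contained sketch. The sample rows are hypothetical (only the first id comes from the question), and it assumes each non-missing cell holds a list of dicts, which is what the indexing value[0][key] implies. This version of the helper tests for a list instead of calling pd.isna.

```python
import numpy as np
import pandas as pd


def unpack_column(df_series, key):
    """Return `key` from the first dict in each cell, or None for missing cells."""
    return [
        value[0][key] if isinstance(value, list) and value else None
        for value in df_series
    ]


# Hypothetical sample data: a list of dicts per row, NaN where nothing is referenced.
df = pd.DataFrame({
    "referenced_tweets": [
        [{"type": "replied_to", "id": "1253050942716551168"}],
        [{"type": "retweeted", "id": "1111111111111111111"}],
        np.nan,
    ]
})

df["type"] = unpack_column(df["referenced_tweets"], "type")
df["id"] = unpack_column(df["referenced_tweets"], "id")
print(df[["type", "id"]])
```

Only the first referenced tweet in each cell is used; a row that references several tweets would need a different layout, such as one row per reference.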

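Or as a one-liner (this assumes no row is NaN; returning a pd.Series makes apply produce two columns instead of a single column of tuples):

df[['type', 'id']] = df['referenced_tweets'].apply(lambda x: pd.Series([x[0]['type'], x[0]['id']]))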
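If some rows are missing, another option is to build both columns in one step from the first dict of each cell. This is only a sketch under the same assumption (cells are lists of dicts or NaN); the sample data and the names first_refs and parts are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing row.
df = pd.DataFrame({
    "referenced_tweets": [
        [{"type": "replied_to", "id": "1253050942716551168"}],
        np.nan,
        [{"type": "retweeted", "id": "2222222222222222222"}],
    ]
})

# Take the first dict of each cell; missing or empty cells become an empty dict.
first_refs = [
    cell[0] if isinstance(cell, list) and cell else {}
    for cell in df["referenced_tweets"]
]

# One row per dict, aligned to df's index; reindex guarantees both columns exist
# even if every cell was missing.
parts = pd.DataFrame(first_refs, index=df.index).reindex(columns=["type", "id"])
df[["type", "id"]] = parts
print(df[["type", "id"]])
```

Rows without a referenced tweet end up as NaN in both new columns.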

python

split

strsplit

pandas