Pandas explode on multiple columns

I'm using Pandas 0.25.3 and trying to explode several columns.

The data looks like this:

d1 = {'user':['user1','user2','user3','user4'],
      'paid':['Y','Y','N','N'],
      'last_active':['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018'],
      'col4':'data'}

I load it into a DataFrame with df = pd.DataFrame([d1], columns=d1.keys()), which looks like this:

user                              paid              last_active                                                col4               
['user1','user2','user3','user4'] ['Y','Y','N','N'] ['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018']  'data'

There are other columns too, each holding a single value, things like {'A':'B'}, but I'm not worried about those.

When I run df.explode('user') it works for that column, and the same goes for the other columns individually, but when I try df.explode(column=('user','paid','last_active')) it raises the following error:

KeyError: ('user','paid','last_active')

So what I'd like to know is how to use the explode function on multiple columns to get the following df:

user     paid  last_active    col4
'user1'  'Y'   '11 Jul 2019'  'data'
'user2'  'Y'   '23 Sep 2018'  NaN
'user3'  'N'   '08 Dec 2019'  NaN
'user4'  'N'   '03 Mar 2018'  NaN

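I guess you need the following (note the difference in the col4 data: the missing values come out as None rather than the NaN shown by the OP):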

pd.DataFrame([[i] if not isinstance(i,list) else i 
             for i in d1.values()],index=d1.keys()).T

    user paid  last_active  col4
0  user1    Y  11 Jul 2019  data
1  user2    Y  23 Sep 2018  None
2  user3    N  08 Dec 2019  None
3  user4    N  03 Mar 2018  None
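
Another option, as a rough sketch only: move the single-value columns into the index, apply Series.explode to each remaining column, then reset the index. This assumes the lists in a given row all have the same length. The names list_cols, other_cols and exploded are made up for the illustration. Note that with this approach col4 is repeated on every row instead of being NaN/None as in the outputs above.

```python
import pandas as pd

d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}
df = pd.DataFrame([d1], columns=d1.keys())

# Hypothetical helper names: the columns that hold lists, and everything else
list_cols = ['user', 'paid', 'last_active']
other_cols = [c for c in df.columns if c not in list_cols]

exploded = (
    df.set_index(other_cols)        # park the single-value columns in the index
      .apply(pd.Series.explode)     # explode each list column separately
      .reset_index()                # bring the single-value columns back
      [list(df.columns)]            # restore the original column order
)
print(exploded)
```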

Pandas (as of the 0.25.x used here) has no multi-column explode, but there are workarounds. One simple approach could be:

import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a','b'], ['c','d']],
        'C': [['z','y'], ['x','w']]
    }
)
print(df)

--------------
A    B     C
--------------
1 [a, b] [z, y]
2 [c, d] [x, w]

##Let us say list_cols are the columns to be exploded
##(a list rather than a set, so the column order is predictable)
list_cols = ['B', 'C']

##other_cols contains all the remaining column names in the df
##we temporarily convert to set() to easily get the difference of the 2 lists
other_cols = list(set(df.columns) - set(list_cols))

##now explode each of the list_cols in a loop
exploded = [df[col].explode() for col in list_cols]
##exploded is now a list of exploded Series, one per column. Print it to see the format

##This statement pairs each column name with its exploded Series
##zip creates the (name, Series) pairs
##dict puts them in a format from which a dataframe can be created
##Print the individual outputs of each command to understand the flow
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))

##Now merge back the other_cols as well
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)

##lastly, re-create the original column order
df2 = df2.loc[:, df.columns]

print(df2)

------
A B C
------
1 a z
1 b y
2 c x
2 d w
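
For anyone on a newer pandas: since version 1.3.0, DataFrame.explode accepts a list of columns, so the workaround above should not be needed there. A minimal sketch, assuming pandas >= 1.3 and that the lists in each row have matching lengths (pandas raises an error otherwise):

```python
import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a', 'b'], ['c', 'd']],
        'C': [['z', 'y'], ['x', 'w']],
    }
)

# Requires pandas >= 1.3: pass the columns as a list, not a tuple.
# A tuple is treated as a single column label, which leads to a KeyError.
df2 = df.explode(['B', 'C'])
print(df2)
```

The original index is repeated for the exploded rows; pass ignore_index=True (available since pandas 1.1) if you want a fresh 0..n-1 index.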