pandas DataFrame: how to unmelt columns

I have a DataFrame:

df = key1 key2 .. keyn type  val1 val2 .. valn
      k1   k2      kn   p1    1     2      7
      k1   k2      kn   p2    6     1      5
      k1   k2      kn   p3    8     4      1
      k3   k2      kn   p1    4     6      9
      k3   k2      kn   p2    6     1      0
      k3   k2      kn   p3    1     2      8

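So for each set of keys key1..keyn I have 3 values in the type column. I want to spread them out into columns so that every <type, val> pair gets its own column, and I end up with: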

df = key1 key2 .. keyn val1_typep1 val1_typep2 val1_typep3 ..  valn_typep1 valn_typep2 valn_typep3  
      k1   k2      kn       1          6            8               7           5           1 
      k3   k2      kn       4          6            1               9           0           8 

What is the best way to do this?

You can filter the key-like columns, then pivot to reshape the DataFrame with the keys as index and type as columns, and finally flatten the MultiIndex columns using map + join:

# filter the key like columns
keys = df.filter(like='key').columns

# pivot the dataframe on keys and type
pvt = df.pivot(index=list(keys), columns='type')

# Flatten the MultiIndex columns
# by joining the two levels with the separator '_type'
pvt.columns = pvt.columns.map('_type'.join)
pvt = pvt.reset_index()

>>> pvt

  key1 key2 keyn  val1_typep1  val1_typep2  val1_typep3  val2_typep1  val2_typep2  val2_typep3  valn_typep1  valn_typep2  valn_typep3
0   k1   k2   kn            1            6            8            2            1            4            7            5            1
1   k3   k2   kn            4            6            1            6            1            2            9            0            8
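
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pivot approach. The helper name `unmelt` and its parameters are made up for illustration, the sample data is copied from the question, and passing a list as `index` to `pivot` assumes pandas 1.1 or newer.

```python
import pandas as pd

# Sample data copied from the question (val1, val2, valn stand in for val1..valn).
df = pd.DataFrame({
    'key1': ['k1', 'k1', 'k1', 'k3', 'k3', 'k3'],
    'key2': ['k2'] * 6,
    'keyn': ['kn'] * 6,
    'type': ['p1', 'p2', 'p3', 'p1', 'p2', 'p3'],
    'val1': [1, 6, 8, 4, 6, 1],
    'val2': [2, 1, 4, 6, 1, 2],
    'valn': [7, 5, 1, 9, 0, 8],
})


def unmelt(frame, key_prefix='key', type_col='type', sep='_type'):
    """Hypothetical helper: pivot `type_col` into columns and flatten the names."""
    # Columns whose name contains `key_prefix` are treated as the keys.
    keys = list(frame.filter(like=key_prefix).columns)
    wide = frame.pivot(index=keys, columns=type_col)
    # Each column label is a (value_column, type) tuple, e.g. ('val1', 'p1').
    wide.columns = [f'{val}{sep}{typ}' for val, typ in wide.columns]
    return wide.reset_index()


wide = unmelt(df)
print(wide)
```

Note that `pivot` raises a ValueError if a key/type combination occurs more than once; in that case `pivot_table` with an aggregation function is the usual alternative.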

Just use stack and unstack from pandas.

As I understand it, unstack turns the contents of the inner index level into columns.

Likewise, stack turns the columns into the contents of the inner index level.
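
To make the two operations concrete, here is a tiny sketch on a made-up two-row frame (the name `small` and its values are only for illustration, not part of the question's data):

```python
import pandas as pd

# A tiny hypothetical frame: two rows indexed by `type`, two value columns.
small = pd.DataFrame(
    {'val1': [1, 6], 'val2': [2, 1]},
    index=pd.Index(['p1', 'p2'], name='type'),
)

# stack: the column labels become the innermost index level (a long Series).
stacked = small.stack()
print(stacked)

# unstack: the innermost index level goes back to being columns.
restored = stacked.unstack()
print(restored)

# unstack can also move a named outer level instead of the innermost one.
by_type = stacked.unstack('type')
print(by_type)
```

The last call is the interesting one for this question: unstacking the `type` level is what turns each type into its own column.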

import pandas as pd

# sample data
data = [{'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 1, 'val2': 2, 'valn': 7}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 5}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 8, 'val2': 4, 'valn': 1}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 4, 'val2': 6, 'valn': 9}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 0}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 1, 'val2': 2, 'valn': 8}]
df = pd.DataFrame(data)

# stack the df, with index `['key1', 'key2', 'keyn', 'type']`
cols = df.columns[df.columns.str.startswith('key')].tolist()
cols.append('type')
# print(cols) # ['key1', 'key2', 'keyn', 'type']
dfn = df.set_index(cols).loc[:, 'val1':'valn'].stack().reset_index()
# print(dfn)

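# create col_name such as `valn_typep1`:
# iloc[:, -2] is the stacked value-column name (`level_4`), iloc[:, -3] is `type`
dfn['col_name'] = dfn.iloc[:, -2] + '_type' + dfn.iloc[:, -3]

# set the index to `['key1', 'key2', 'keyn', 'col_name']`, select the value column `0`,
# then unstack to move the `col_name` index level into the columns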
cols = df.columns[df.columns.str.startswith('key')].tolist()
cols.append('col_name')
# print(cols) # ['key1', 'key2', 'keyn', 'col_name']
df_result = dfn.set_index(cols)[0].unstack().reset_index()
# print(df_result)

Result:

print(dfn)

       key1 key2 keyn type level_4  0
    0    k1   k2   kn   p1    val1  1
    1    k1   k2   kn   p1    val2  2
    2    k1   k2   kn   p1    valn  7
    3    k1   k2   kn   p2    val1  6
    4    k1   k2   kn   p2    val2  1
    5    k1   k2   kn   p2    valn  5
    6    k1   k2   kn   p3    val1  8
    7    k1   k2   kn   p3    val2  4
    8    k1   k2   kn   p3    valn  1
    9    k3   k2   kn   p1    val1  4
    10   k3   k2   kn   p1    val2  6
    11   k3   k2   kn   p1    valn  9
    12   k3   k2   kn   p2    val1  6
    13   k3   k2   kn   p2    val2  1
    14   k3   k2   kn   p2    valn  0
    15   k3   k2   kn   p3    val1  1
    16   k3   k2   kn   p3    val2  2
    17   k3   k2   kn   p3    valn  8

print(df_result)

    col_name key1 key2 keyn  val1_typep1  val1_typep2  val1_typep3  val2_typep1  \
    0          k1   k2   kn            1            6            8            2   
    1          k3   k2   kn            4            6            1            6   
    
    col_name  val2_typep2  val2_typep3  valn_typep1  valn_typep2  valn_typep3  
    0                   1            4            7            5            1  
    1                   1            2            9            0            8 
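
A shorter variant of the same idea, as a sketch: since only the `type` level needs to move into the columns, you can skip the intermediate `stack` and unstack that level directly. The list-comprehension flattening is my own choice here, not part of the code above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'key1': ['k1', 'k1', 'k1', 'k3', 'k3', 'k3'],
    'key2': ['k2'] * 6,
    'keyn': ['kn'] * 6,
    'type': ['p1', 'p2', 'p3', 'p1', 'p2', 'p3'],
    'val1': [1, 6, 8, 4, 6, 1],
    'val2': [2, 1, 4, 6, 1, 2],
    'valn': [7, 5, 1, 9, 0, 8],
})

keys = [c for c in df.columns if c.startswith('key')]

# Move the keys and `type` into the index, then unstack only the `type` level.
wide = df.set_index(keys + ['type']).unstack('type')

# Columns are now (value_column, type) pairs; join them into flat names.
wide.columns = [f'{val}_type{typ}' for val, typ in wide.columns]
wide = wide.reset_index()
print(wide)
```

This also avoids the leftover `col_name` label on the column axis that shows up in the printed result above.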

Use melt with id_vars for the index fields and value_vars for the fields to melt, then sort the result by the index fields.

import pandas as pd

data = [{'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 1, 'val2': 2, 'valn': 7}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 5}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 8, 'val2': 4, 'valn': 1}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 4, 'val2': 6, 'valn': 9}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 0}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 1, 'val2': 2, 'valn': 8}]
df = pd.DataFrame(data)
print(df)
# melt the value columns into long format, keeping the keys and type as identifiers
results = pd.melt(df, id_vars=['key1', 'key2', 'keyn', 'type'], value_vars=['val1', 'val2', 'valn'])
print(results.sort_values(by=['key1', 'key2', 'keyn', 'type']))


output:
   key1 key2 keyn type variable  value
0    k1   k2   kn   p1     val1      1
6    k1   k2   kn   p1     val2      2
12   k1   k2   kn   p1     valn      7
1    k1   k2   kn   p2     val1      6
7    k1   k2   kn   p2     val2      1
13   k1   k2   kn   p2     valn      5
2    k1   k2   kn   p3     val1      8
8    k1   k2   kn   p3     val2      4
14   k1   k2   kn   p3     valn      1
3    k3   k2   kn   p1     val1      4
9    k3   k2   kn   p1     val2      6
15   k3   k2   kn   p1     valn      9
4    k3   k2   kn   p2     val1      6
10   k3   k2   kn   p2     val2      1
16   k3   k2   kn   p2     valn      0
5    k3   k2   kn   p3     val1      1
11   k3   k2   kn   p3     val2      2
17   k3   k2   kn   p3     valn      8

# reverse the melt

fp = results.pivot(index=['key1', 'key2', 'keyn', 'type'],
                   columns=['variable'], values=['value'])
#fp = fp[(fp.T != 0).any()]
print(fp)

                    value          
variable             val1 val2 valn
key1 key2 keyn type                
k1   k2   kn   p1       1    2    7
               p2       6    1    5
               p3       8    4    1
k3   k2   kn   p1       4    6    9
               p2       6    1    0
               p3       1    2    8