How to remove NaN when expanding multilevel columns in Pandas

I want to expand the following multilevel columns

first   tech_one                      ... tech_four       etc       mtc
second    ch0_b0    ch1_b0    ch2_b0  ...       ch5                    
0       1.764052  0.400157  0.978738  ...  0.144044  1.454274  0.761038
1       0.121675  0.443863  0.333674  ... -0.742165  2.269755 -1.454366
2       0.045759 -0.187184  1.532779  ...  1.230291  1.202380 -0.387327

into

   tech_one                      ... tech_four       etc       mtc
        ch0       ch1       ch2  ...       ch5         _         _
         b0        b0        b0  ...                  
0  1.764052  0.400157  0.978738  ...  0.144044  1.454274  0.761038
1  0.121675  0.443863  0.333674  ... -0.742165  2.269755 -1.454366
2  0.045759 -0.187184  1.532779  ...  1.230291  1.202380 -0.387327

I drafted the following code.

import pandas as pd
import numpy as np
import re
np.random.seed(0)


arrays = [["tech_one", "tech_one", "tech_one", "tech_one", "tech_two", "tech_two", "tech_two",
           "tech_two",'tech_three','tech_three','tech_four','etc','mtc'],
          ["ch0_b0", "ch1_b0", "ch2_b0", "ch3_b0", "ch0", "ch1", "ch2", "ch3","ch1",'ch3','ch5','','']]

index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, len(arrays[0])), columns=index)
tup=[(e[0],*re.split('_',e[1])) for e in df.columns]
remove_nan=[tuple('_' if x == '' else x for x in x) for x in tup]

df.columns= pd.MultiIndex.from_tuples(remove_nan)

It produced the following result:

   tech_one                      ... tech_four       etc       mtc
        ch0       ch1       ch2  ...       ch5         _         _
         b0        b0        b0  ...       NaN       NaN       NaN
0  1.764052  0.400157  0.978738  ...  0.144044  1.454274  0.761038
1  0.121675  0.443863  0.333674  ... -0.742165  2.269755 -1.454366
2  0.045759 -0.187184  1.532779  ...  1.230291  1.202380 -0.387327

As shown above, the NaN values are still present even though I tried to remove them with this line:

remove_nan=[tuple('_' if x == '' else x for x in x) for x in tup]

How can I fix this?

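As I mentioned in the comments, the NaN values appear because the tuples have different lengths: labels such as "ch0_b0" split into two parts, while "ch5" and the empty labels yield only one, so pandas fills the missing third level with NaN. You can instead build a list of equal-length tuples: take the length of the longest tuple in the list, then pad the shorter tuples with empty strings ''.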

n = len(max(remove_nan, key=len))
remove_nan=[t+('',)*(n-len(t)) for t in remove_nan]
df.columns= pd.MultiIndex.from_tuples(remove_nan)

Output:

   tech_one                      ... tech_four       etc       mtc
        ch0       ch1       ch2  ...       ch5         _         _
         b0        b0        b0  ...                              
0 -0.969233  0.746873  0.253076  ...  0.087689  0.874305  0.380449
1  0.387685 -0.382714 -1.043338  ... -1.684973  1.346454 -0.437792
2 -1.300301  0.164648 -0.032736  ...  1.198207  1.608662 -0.818090
[3 rows x 13 columns]
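The padding step can also be wrapped in a small helper and checked end to end. The following is only a minimal sketch: the helper name `pad_tuples` and its `fill` parameter are hypothetical, and the frame uses a reduced set of columns for illustration rather than the full example from the question.

```python
import numpy as np
import pandas as pd


def pad_tuples(tuples, fill=""):
    """Right-pad every tuple with `fill` up to the length of the longest one.

    `pad_tuples` and `fill` are illustrative names, not part of pandas.
    """
    width = max(len(t) for t in tuples)
    return [t + (fill,) * (width - len(t)) for t in tuples]


# A reduced version of the question's two-level columns.
arrays = [
    ["tech_one", "tech_one", "tech_four", "etc"],
    ["ch0_b0", "ch1_b0", "ch5", ""],
]
df = pd.DataFrame(np.zeros((2, 4)), columns=pd.MultiIndex.from_arrays(arrays))

# Split the second level on "_" and replace an empty label with "_".
split = [
    (first, *(part or "_" for part in second.split("_")))
    for first, second in df.columns
]

# Equalize the tuple lengths before building the new MultiIndex,
# so that no level has to be filled in with NaN.
padded = pad_tuples(split)
df.columns = pd.MultiIndex.from_tuples(padded)

print(df.columns.tolist())
```

Padding with '' keeps the third level visually blank for columns that have no sub-label; passing a different `fill` (for example "_") would make the placeholder explicit instead.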