How to remove NaN when expanding multi-level columns in Pandas
I want to expand the following multi-level columns
first tech_one ... tech_four etc mtc
second ch0_b0 ch1_b0 ch2_b0 ... ch5
0 1.764052 0.400157 0.978738 ... 0.144044 1.454274 0.761038
1 0.121675 0.443863 0.333674 ... -0.742165 2.269755 -1.454366
2 0.045759 -0.187184 1.532779 ... 1.230291 1.202380 -0.387327
into
tech_one ... tech_four etc mtc
ch0 ch1 ch2 ... ch5 _ _
b0 b0 b0 ...
0 1.764052 0.400157 0.978738 ... 0.144044 1.454274 0.761038
1 0.121675 0.443863 0.333674 ... -0.742165 2.269755 -1.454366
2 0.045759 -0.187184 1.532779 ... 1.230291 1.202380 -0.387327
I drafted the following code.
import pandas as pd
import numpy as np
import re
np.random.seed(0)
arrays = [["tech_one", "tech_one", "tech_one", "tech_one", "tech_two", "tech_two", "tech_two",
"tech_two",'tech_three','tech_three','tech_four','etc','mtc'],
["ch0_b0", "ch1_b0", "ch2_b0", "ch3_b0", "ch0", "ch1", "ch2", "ch3","ch1",'ch3','ch5','','']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, len(arrays[0])), columns=index)
tup=[(e[0],*re.split('_',e[1])) for e in df.columns]
remove_nan=[tuple('_' if x == '' else x for x in x) for x in tup]
df.columns= pd.MultiIndex.from_tuples(remove_nan)
It produces the following result:
tech_one ... tech_four etc mtc
ch0 ch1 ch2 ... ch5 _ _
b0 b0 b0 ... NaN NaN NaN
0 1.764052 0.400157 0.978738 ... 0.144044 1.454274 0.761038
1 0.121675 0.443863 0.333674 ... -0.742165 2.269755 -1.454366
2 0.045759 -0.187184 1.532779 ... 1.230291 1.202380 -0.387327
As shown above, NaN still appears in the third level even though I tried to remove it with this line:
remove_nan=[tuple('_' if x == '' else x for x in x) for x in tup]
May I know how to handle this issue?
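As I mentioned in the comments, the NaN appears because the tuples have different lengths. You can instead build a list of equal-length tuples: take the length of the longest tuple in the list, then pad each shorter tuple with the empty string '' up to that length.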
n = len(max(remove_nan, key=len))
remove_nan=[t+('',)*(n-len(t)) for t in remove_nan]
df.columns= pd.MultiIndex.from_tuples(remove_nan)
Output:
tech_one ... tech_four etc mtc
ch0 ch1 ch2 ... ch5 _ _
b0 b0 b0 ...
0 -0.969233 0.746873 0.253076 ... 0.087689 0.874305 0.380449
1 0.387685 -0.382714 -1.043338 ... -1.684973 1.346454 -0.437792
2 -1.300301 0.164648 -0.032736 ... 1.198207 1.608662 -0.818090
[3 rows x 13 columns]
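For reference, the padding step can be wrapped in a small helper and checked on a reduced example. This is only a sketch: the name pad_tuples and the three sample tuples are made up for illustration and are not part of the code above.

```python
import pandas as pd


def pad_tuples(tuples, fill=""):
    """Right-pad each tuple with `fill` up to the length of the longest tuple.

    Hypothetical helper, not part of pandas.
    """
    n = max(len(t) for t in tuples)
    return [t + (fill,) * (n - len(t)) for t in tuples]


# Illustrative ragged tuples: "ch0_b0" split into two parts, "ch5" and "_" did not.
ragged = [("tech_one", "ch0", "b0"), ("tech_four", "ch5"), ("etc", "_")]

padded = pad_tuples(ragged)
columns = pd.MultiIndex.from_tuples(padded)

print(padded)
# Check whether any label in the third level is missing after padding.
print(columns.get_level_values(2).isna().any())
```

Because every tuple now has three entries, the third level holds empty strings where there was nothing to split, so no label in that level is missing.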