Removing columns selectively from a multilevel index dataframe
Say we have a dataframe like this and want to remove columns when certain conditions are met.
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(2, 14).reshape(-1, 4),
    index=list('ABC'),
    columns=pd.MultiIndex.from_arrays([
        ['data1', 'data2', 'data1', 'data2'],
        ['F', 'K', 'R', 'X'],
        ['C', 'D', 'E', 'E']
    ], names=['meter', 'Sleeper', 'sweeper'])
)
df
Then say we want to remove columns only when meter == data1 and sweeper == E.
So I tried:
df = df.drop(('data1','E'),axis = 1)
KeyError: 'E'
Second attempt:
df.drop(('data1','E'), axis = 1, level = 2)
KeyError: "labels [('data1', 'E')] not found in level"
Pandas: drop a level from a multi-level column index?
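You have to drop them separately, because they are on different levels. Note that this removes every column where meter == data1 or sweeper == E, not only the columns that match both: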
df.drop('data1', axis=1, level='meter').drop('E', axis = 1, level='sweeper')
Out[833]:
meter   data2
Sleeper     K
sweeper     D
A           3
B           7
C          11
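To make the difference between the chained drops and the condition in the question concrete, here is a minimal sketch that rebuilds the example frame and lists which columns the chained version loses. The names `chained` and `dropped` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(2, 14).reshape(-1, 4),
    index=list('ABC'),
    columns=pd.MultiIndex.from_arrays([
        ['data1', 'data2', 'data1', 'data2'],
        ['F', 'K', 'R', 'X'],
        ['C', 'D', 'E', 'E'],
    ], names=['meter', 'Sleeper', 'sweeper']),
)

# Two independent drops: first everything under meter == 'data1',
# then everything under sweeper == 'E'.
chained = (df.drop('data1', axis=1, level='meter')
             .drop('E', axis=1, level='sweeper'))

# Columns present in the original frame but missing after the chained drops.
dropped = [col for col in df.columns if col not in chained.columns]
print(dropped)
```

Three of the four columns are gone, including ('data1', 'F', 'C') and ('data2', 'X', 'E'), which satisfy only one of the two conditions. If the goal is to drop only columns matching both conditions, a combined test over both levels is needed.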
It seems that drop doesn't support selection over split levels ([0, 2] here). We can instead build a boolean mask from the conditions using get_level_values:
# keep where not ((level0 is 'data1') and (level2 is 'E'))
col_mask = ~((df.columns.get_level_values(0) == 'data1')
             & (df.columns.get_level_values(2) == 'E'))
df = df.loc[:, col_mask]
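The same mask idea can be wrapped in a small helper that takes level names instead of positions. This is only a sketch: `drop_columns_where` is a hypothetical function written for this example, not part of pandas, and it assumes every column level has a name that is a valid Python identifier.

```python
import numpy as np
import pandas as pd


def drop_columns_where(frame, **conditions):
    """Hypothetical helper: drop columns matching ALL level=value pairs."""
    match = np.ones(len(frame.columns), dtype=bool)
    for level, value in conditions.items():
        # get_level_values accepts a level name as well as a position.
        match &= np.asarray(frame.columns.get_level_values(level) == value)
    return frame.loc[:, ~match]


df = pd.DataFrame(
    np.arange(2, 14).reshape(-1, 4),
    index=list('ABC'),
    columns=pd.MultiIndex.from_arrays([
        ['data1', 'data2', 'data1', 'data2'],
        ['F', 'K', 'R', 'X'],
        ['C', 'D', 'E', 'E'],
    ], names=['meter', 'Sleeper', 'sweeper']),
)

result = drop_columns_where(df, meter='data1', sweeper='E')
print(result)
```

Referring to levels by name keeps the call readable and avoids breaking if the level order changes later.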
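We can also do this by integer position, excluding the locations that fall within a particular index slice, though this is generally less clear and less flexible: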
idx = pd.IndexSlice['data1', :, 'E']
cols = [i for i in range(len(df.columns))
        if i not in df.columns.get_locs(idx)]
df = df.iloc[:, cols]
Either approach produces df:
meter   data1 data2
Sleeper     F     K   X
sweeper     C     D   E
A           2     3   5
B           6     7   9
C          10    11  13
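The positional variant calls get_locs once per column inside the list comprehension. As a sketch of one possible tidy-up, the positions can be computed once and turned into a boolean keep-mask; the names `locs`, `keep` and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(2, 14).reshape(-1, 4),
    index=list('ABC'),
    columns=pd.MultiIndex.from_arrays([
        ['data1', 'data2', 'data1', 'data2'],
        ['F', 'K', 'R', 'X'],
        ['C', 'D', 'E', 'E'],
    ], names=['meter', 'Sleeper', 'sweeper']),
)

# Integer positions of the columns matching meter == 'data1' and sweeper == 'E'.
locs = df.columns.get_locs(pd.IndexSlice['data1', :, 'E'])

# Start by keeping everything, then switch off the matched positions.
keep = np.ones(len(df.columns), dtype=bool)
keep[locs] = False

result = df.iloc[:, keep]
print(result)
```

The result should match the frame shown above; the boolean mask built from get_level_values remains the more direct route when the conditions are simple equality tests.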