pandas 根据一个 header 值删除一列
pandas drop a column according to one header value
我有这个数据框
name,01100MS,02200MS,02500MS,03100MS,06400MS
lat,626323,616720,616288,611860,622375
long,5188431,5181393,5173583,5165895,5152605
alt,915,1499,1310,1235,190
1920-01-01,1,4.1,4.41,4.441,4.4441
1920-01-02,2,4.2,4.42,4.442,4.4442
1920-01-03,3,4.3,4.43,4.443,4.4443
1920-01-04,4,4.4,4.44,4.444,4.4444
1920-01-05,5,4.5,4.45,4.445,4.4445
1920-01-06,6,4.6,4.46,4.446,4.4446
1920-01-07,7,4.7,4.47,4.447,4.4447
1920-01-08,8,4.8,4.48,4.448,4.4448
1920-01-09,9,4.9,4.49,4.449,4.4449
1920-01-10,10,5,4.5,4.45,4.445
1920-01-11,11,5.1,4.51,4.451,4.4451
我读作:
dfr = pd.read_csv(f_name,
parse_dates = True,
index_col = 0,
header = [0,1,2,3],
infer_datetime_format = True,
cache_dates=True)
我想根据第 4 行的阈值删除一些列,因为我使用了多个索引,所以它是其中的一个。
我想做这样的事情:
for column in dfr:
if dfr[column][2] <= 1300.:
dfr = dfr.drop(column,axis=1)
问题是我无法 select 多头中的正确“头”。我也想以一种聪明的方式来做,换句话说,避免循环。
你可以select第四级 Index.get_level_values
and select columns with invert mask - greater like 1300
in DataFrame.loc
:
df = df.loc[:,df.columns.get_level_values(3).astype(int) > 1300]
或者,如果不需要总是转换为整数,则可以在求解前设置值:
df = df.rename(columns=int, level=3)
print (df.columns)
MultiIndex([('01100MS', '626323', '5188431', 915),
('02200MS', '616720', '5181393', 1499),
('02500MS', '616288', '5173583', 1310),
('03100MS', '611860', '5165895', 1235),
('06400MS', '622375', '5152605', 190)],
names=['name', 'lat', 'long', 'alt'])
df = df.loc[:,df.columns.get_level_values(3) > 1300]
print (df)
name 02200MS 02500MS
lat 616720 616288
long 5181393 5173583
alt 1499 1310
1920-01-01 4.1 4.41
1920-01-02 4.2 4.42
1920-01-03 4.3 4.43
1920-01-04 4.4 4.44
1920-01-05 4.5 4.45
1920-01-06 4.6 4.46
1920-01-07 4.7 4.47
1920-01-08 4.8 4.48
1920-01-09 4.9 4.49
1920-01-10 5.0 4.50
1920-01-11 5.1 4.51
我有这个数据框
name,01100MS,02200MS,02500MS,03100MS,06400MS
lat,626323,616720,616288,611860,622375
long,5188431,5181393,5173583,5165895,5152605
alt,915,1499,1310,1235,190
1920-01-01,1,4.1,4.41,4.441,4.4441
1920-01-02,2,4.2,4.42,4.442,4.4442
1920-01-03,3,4.3,4.43,4.443,4.4443
1920-01-04,4,4.4,4.44,4.444,4.4444
1920-01-05,5,4.5,4.45,4.445,4.4445
1920-01-06,6,4.6,4.46,4.446,4.4446
1920-01-07,7,4.7,4.47,4.447,4.4447
1920-01-08,8,4.8,4.48,4.448,4.4448
1920-01-09,9,4.9,4.49,4.449,4.4449
1920-01-10,10,5,4.5,4.45,4.445
1920-01-11,11,5.1,4.51,4.451,4.4451
我读作:
dfr = pd.read_csv(f_name,
parse_dates = True,
index_col = 0,
header = [0,1,2,3],
infer_datetime_format = True,
cache_dates=True)
我想根据第 4 行的阈值删除一些列,因为我使用了多个索引,所以它是其中的一个。
我想做这样的事情:
for column in dfr:
if dfr[column][2] <= 1300.:
dfr = dfr.drop(column,axis=1)
问题是我无法 select 多头中的正确“头”。我也想以一种聪明的方式来做,换句话说,避免循环。
你可以select第四级 Index.get_level_values
and select columns with invert mask - greater like 1300
in DataFrame.loc
:
df = df.loc[:,df.columns.get_level_values(3).astype(int) > 1300]
或者,如果不需要总是转换为整数,则可以在求解前设置值:
df = df.rename(columns=int, level=3)
print (df.columns)
MultiIndex([('01100MS', '626323', '5188431', 915),
('02200MS', '616720', '5181393', 1499),
('02500MS', '616288', '5173583', 1310),
('03100MS', '611860', '5165895', 1235),
('06400MS', '622375', '5152605', 190)],
names=['name', 'lat', 'long', 'alt'])
df = df.loc[:,df.columns.get_level_values(3) > 1300]
print (df)
name 02200MS 02500MS
lat 616720 616288
long 5181393 5173583
alt 1499 1310
1920-01-01 4.1 4.41
1920-01-02 4.2 4.42
1920-01-03 4.3 4.43
1920-01-04 4.4 4.44
1920-01-05 4.5 4.45
1920-01-06 4.6 4.46
1920-01-07 4.7 4.47
1920-01-08 4.8 4.48
1920-01-09 4.9 4.49
1920-01-10 5.0 4.50
1920-01-11 5.1 4.51