Mask dataframe column based on datetime index
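Very similar to this question, except that I need to take both the date and the time into account; indexer_between_time
does not seem to support any datetime format that I could find.
I have a dask dataframe that looks like this: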
logger_volt lat lon
time
2017-01-01 00:01:20 12.0112 37.150902 -98.362
2017-01-01 00:01:40 12.0113 37.150902 -98.362
2017-01-01 00:02:00 12.0057 37.150902 -98.362
2017-01-01 00:02:20 12.0113 37.150902 -98.362
2017-01-01 00:02:40 12.0058 37.150902 -98.362
2017-01-01 00:03:00 12.0113 37.150902 -98.362
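along with a list of columns to mask over specific time ranges (data within these ranges is considered "bad" and should return None
instead), in the form of a list of Python tuples: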
[ # var start of mask end of mask
('lat', '2017-01-01 00:01:40', '2017-01-01 00:02:00'),
('lon', '2017-01-01 00:02:40', '2017-01-01 00:03:00'),
]
Desired result:
logger_volt lat lon
time
2017-01-01 00:01:20 12.0112 37.150902 -98.362
2017-01-01 00:01:40 12.0113 None -98.362
2017-01-01 00:02:00 12.0057 None -98.362
2017-01-01 00:02:20 12.0113 37.150902 -98.362
2017-01-01 00:02:40 12.0058 37.150902 None
2017-01-01 00:03:00 12.0113 37.150902 None
Non-working code:
dqrs = [ # var start of mask end of mask
('lat', '2017-01-01 00:01:40', '2017-01-01 00:02:00'),
('lon', '2017-01-01 00:02:40', '2017-01-01 00:03:00'),
]
df = xarray.open_dataset('filename.cdf').to_dask_dataframe()
dqr_mask = (df == df) | df.isnull() # create a dummy mask that's all True
for var, start, end in dqrs:
    dqr_mask |= ((df.columns == var) & (df.index >= start) & (df.index >= end))
df = df.mask(dqr_mask).compute()
Problems with other approaches:
- Dask dataframes do not implement slice assignment yet, so
df[start:end] = None
and the like will not work here
You only need to select the column var of dqr_mask that you want to modify inside the for loop. Here is one way:
dqr_mask = df != df  # start from a mask that is False wherever there is a value
for var, start, end in dqrs:
    # set column var to True where the index is between start and end (inclusive)
    dqr_mask[var] |= (df.index >= start) & (df.index <= end)
# where dqr_mask is False the value of df is kept; otherwise it is set to None
df = df.mask(dqr_mask, other=None)
print(df)
logger_volt lat lon
time
2017-01-01 00:01:20 12.0112 37.1509 -98.362
2017-01-01 00:01:40 12.0113 None -98.362
2017-01-01 00:02:00 12.0057 None -98.362
2017-01-01 00:02:20 12.0113 37.1509 -98.362
2017-01-01 00:02:40 12.0058 37.1509 None
2017-01-01 00:03:00 12.0113 37.1509 None
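For anyone who wants to try the idea without the original file, here is a minimal self-contained sketch of the same per-column masking. It makes a few assumptions that are not in the question: it uses plain pandas rather than dask, it rebuilds the six sample rows by hand instead of reading 'filename.cdf', it creates the all-False mask explicitly rather than with df != df, and it lets mask() fill with its default NaN instead of None so that the float columns keep their dtype. The names in_range and masked are illustrative only.

```python
import pandas as pd

# Rebuild the sample frame from the question (hand-typed, not read from a file).
index = pd.to_datetime([
    "2017-01-01 00:01:20", "2017-01-01 00:01:40", "2017-01-01 00:02:00",
    "2017-01-01 00:02:20", "2017-01-01 00:02:40", "2017-01-01 00:03:00",
])
index.name = "time"
df = pd.DataFrame(
    {
        "logger_volt": [12.0112, 12.0113, 12.0057, 12.0113, 12.0058, 12.0113],
        "lat": [37.150902] * 6,
        "lon": [-98.362] * 6,
    },
    index=index,
)

dqrs = [  # var, start of mask, end of mask
    ("lat", "2017-01-01 00:01:40", "2017-01-01 00:02:00"),
    ("lon", "2017-01-01 00:02:40", "2017-01-01 00:03:00"),
]

# Start with a mask that is False everywhere (nothing is masked yet).
dqr_mask = pd.DataFrame(False, index=df.index, columns=df.columns)

for var, start, end in dqrs:
    # Rows whose timestamp lies in the inclusive range [start, end].
    in_range = (df.index >= pd.Timestamp(start)) & (df.index <= pd.Timestamp(end))
    # Only the named column is flagged; all other columns stay untouched.
    dqr_mask[var] |= in_range

# Values are kept where the mask is False and replaced by NaN where it is True.
masked = df.mask(dqr_mask)
print(masked)
```

Both range ends are inclusive here, which matches the desired result in the question where the rows at the start and end timestamps are both masked.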