Manipulate the DataFrame to start from the nearest midnight timestamp
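My goal:
I have a dataset that is generated daily at a random time, so its first row starts at a random time. I want the dataset to start from the nearest midnight. For example, if the date of the first row is 2022-05-09 15:00:00, I have to slice the data so that it starts from the nearest midnight, in this case 2022-05-10 00:00:00.
This is what the dataset looks like:
What I tried:
I wanted to find the index of the first occurrence of the timestamp I am after, and then apply iloc to create the desired dataset.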
match_timestamp = "00:00:00"
[df[df.index.strftime("%H:%M:%S") == match_timestamp].first_valid_index()]
results: [Timestamp('2022-05-10 00:00:00')]
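However, this only extracts the timestamp of the first occurrence, and I cannot apply iloc to that row value. At this point I am stuck and cannot think of a more elegant solution, although I am sure one exists.
I would appreciate it if you could recommend a better approach.
Thanks in advance!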
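A minimal sketch of how the timestamp found above could be used for slicing, assuming the DataFrame has a DatetimeIndex. The hourly sample data is hypothetical and only stands in for the real dataset. A timestamp is an index label, so it works with loc directly; iloc needs an integer position, which `get_loc` can supply when the index is unique.

```python
import pandas as pd

# Hypothetical hourly sample standing in for the real dataset.
idx = pd.date_range("2022-05-09 15:00:00", periods=12, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"close": range(12)}, index=idx)

match_timestamp = "00:00:00"
first_midnight = df[df.index.strftime("%H:%M:%S") == match_timestamp].first_valid_index()

# Label-based slice: keep everything from the first midnight onwards.
from_midnight = df.loc[first_midnight:]

# Position-based equivalent: translate the label into an integer position first.
pos = df.index.get_loc(first_midnight)
same_rows = df.iloc[pos:]
```

Both slices select the same rows; the loc form avoids the extra lookup.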
Below is the full code used to extract df:
# Run once in a shell: pip install ccxt
import pandas as pd
import requests  # needed for the requests.get call below
import ccxt
exchange = ccxt.okx({'options': {'defaultType': 'futures', 'enableRateLimit': True}})
markets = exchange.load_markets()
url = 'https://www.okex.com'
tickers = pd.DataFrame((requests.get(url+'/api/v5/market/tickers?instType=FUTURES').json())['data'])
tickers = tickers.drop('instType', axis=1)
futures_tickers = list(tickers['instId'])
symbol = 'LINK-USD-220930'
candlestick_chart= exchange.fetch_ohlcv(symbol, '1h', limit=500)
candlestick_df = pd.DataFrame(candlestick_chart)
candlestick_df.columns = ['date', 'open', 'high', 'low', 'close', 'volume']
candlestick_df['date'] = pd.to_datetime(candlestick_df['date'], unit='ms')
candlestick_df['date'] = candlestick_df['date'] + pd.Timedelta(hours=8)
df = candlestick_df
df
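The attempt shown earlier calls `df.index.strftime`, which needs a DatetimeIndex, while the code above leaves date as an ordinary column. A minimal sketch of one way to reconcile the two, assuming the date column is meant to become the index. The rows below are hypothetical placeholders in the `[timestamp_ms, open, high, low, close, volume]` layout, not real market data.

```python
import pandas as pd

# Hypothetical OHLCV rows: [timestamp in ms, open, high, low, close, volume].
start_ms = pd.Timestamp("2022-05-09 07:00:00").value // 10**6  # nanoseconds -> milliseconds
hour_ms = 3_600_000
rows = [[start_ms + i * hour_ms, 9.7, 9.8, 9.6, 9.7, 1000.0] for i in range(12)]

candlestick_df = pd.DataFrame(rows, columns=["date", "open", "high", "low", "close", "volume"])
candlestick_df["date"] = pd.to_datetime(candlestick_df["date"], unit="ms") + pd.Timedelta(hours=8)

# Move the timestamps into the index so that df.index is a DatetimeIndex.
df = candlestick_df.set_index("date")
```

With the index set this way, index-based filters such as `df.index.strftime(...)` apply directly.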
Dictionary format (as suggested):
{'open': {Timestamp('2022-05-09 15:00:00'): 9.742, Timestamp('2022-05-09 16:00:00'): 9.731, Timestamp('2022-05-09 17:00:00'): 9.743, Timestamp('2022-05-09 18:00:00'): 9.684, Timestamp('2022-05-09 19:00:00'): 9.206, Timestamp('2022-05-09 20:00:00'): 9.43, Timestamp('2022-05-09 21:00:00'): 9.316, Timestamp('2022-05-09 22:00:00'): 9.403, Timestamp('2022-05-09 23:00:00'): 9.215, Timestamp('2022-05-10 00:00:00'): 9.141}, 'high': {Timestamp('2022-05-09 15:00:00'): 9.835, Timestamp('2022-05-09 16:00:00'): 9.75, Timestamp('2022-05-09 17:00:00'): 9.788, Timestamp('2022-05-09 18:00:00'): 9.697, Timestamp('2022-05-09 19:00:00'): 9.465, Timestamp('2022-05-09 20:00:00'): 9.469, Timestamp('2022-05-09 21:00:00'): 9.515, Timestamp('2022-05-09 22:00:00'): 9.413, Timestamp('2022-05-09 23:00:00'): 9.308, Timestamp('2022-05-10 00:00:00'): 9.223}, 'low': {Timestamp('2022-05-09 15:00:00'): 9.699, Timestamp('2022-05-09 16:00:00'): 9.596, Timestamp('2022-05-09 17:00:00'): 9.674, Timestamp('2022-05-09 18:00:00'): 8.739, Timestamp('2022-05-09 19:00:00'): 9.11, Timestamp('2022-05-09 20:00:00'): 9.3, Timestamp('2022-05-09 21:00:00'): 9.208, Timestamp('2022-05-09 22:00:00'): 9.174, Timestamp('2022-05-09 23:00:00'): 9.035, Timestamp('2022-05-10 00:00:00'): 8.724}, 'close': {Timestamp('2022-05-09 15:00:00'): 9.725, Timestamp('2022-05-09 16:00:00'): 9.745, Timestamp('2022-05-09 17:00:00'): 9.682, Timestamp('2022-05-09 18:00:00'): 9.18, Timestamp('2022-05-09 19:00:00'): 9.426, Timestamp('2022-05-09 20:00:00'): 9.32, Timestamp('2022-05-09 21:00:00'): 9.397, Timestamp('2022-05-09 22:00:00'): 9.229, Timestamp('2022-05-09 23:00:00'): 9.152, Timestamp('2022-05-10 00:00:00'): 8.82}, 'volume': {Timestamp('2022-05-09 15:00:00'): 3663.0, Timestamp('2022-05-09 16:00:00'): 6603.0, Timestamp('2022-05-09 17:00:00'): 2855.0, Timestamp('2022-05-09 18:00:00'): 20084.0, Timestamp('2022-05-09 19:00:00'): 8972.0, Timestamp('2022-05-09 20:00:00'): 5551.0, Timestamp('2022-05-09 21:00:00'): 8218.0, 
Timestamp('2022-05-09 22:00:00'): 7651.0, Timestamp('2022-05-09 23:00:00'): 6935.0, Timestamp('2022-05-10 00:00:00'): 10409.0}}
A minimal approach from me, a pandaNewstarter; you only need to apply it to your candlestick_df:
import pandas as pd
import datetime
df = pd.read_csv("data.csv")
df.dtypes
# convert date column to dtype timestamp
df.date = pd.to_datetime(df.date)
# get the minimum value from the date column
min_date = df.date.min()
# from min get next day midnight timestamp value
NextDay_Date = (min_date + datetime.timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
NextDay_Date
# create a new DataFrame by slicing the original
df2 = df[df.date >= NextDay_Date].copy()
Output:
print(NextDay_Date)
2022-05-10 00:00:00
print(df2)
                  date   open   high    low  close   volume
9  2022-05-10 00:00:00  9.141  9.223  8.724  8.820  10409.0
10 2022-05-10 01:00:00  8.755  8.979  8.558  8.832  11522.0
11 2022-05-10 02:00:00  8.815  8.880  8.304  8.593  20969.0
12 2022-05-10 03:00:00  8.618  8.720  8.370  8.610  15794.0
13 2022-05-10 04:00:00  8.610  8.929  8.610  8.736   9410.0
..                 ...    ...    ...    ...    ...      ...
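One edge case may be worth noting: if the first row already falls exactly on midnight, adding one day before resetting the time skips a full day of data. A possible variant, sketched here under the assumption that such a first row should be kept, rounds the earliest timestamp up with `Timestamp.ceil("D")` instead. The helper name `slice_from_midnight` and the sample frame are hypothetical.

```python
import pandas as pd

def slice_from_midnight(df, date_col="date"):
    """Return rows from the first midnight at or after the earliest timestamp."""
    dates = pd.to_datetime(df[date_col])
    # ceil("D") rounds up to the next midnight and leaves an exact midnight unchanged.
    start = dates.min().ceil("D")
    return df[dates >= start].copy()

# Hypothetical hourly sample starting at 15:00, as in the question.
sample = pd.DataFrame({
    "date": pd.date_range("2022-05-09 15:00:00", periods=12, freq=pd.Timedelta(hours=1)),
    "close": range(12),
})
df2 = slice_from_midnight(sample)

# A frame that already starts at midnight is returned unchanged.
already_aligned = slice_from_midnight(df2)
```

Because the comparison is made on the date column, this works whether or not the index has been reset.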