Check if a value in a DataFrame will increase or decrease by a certain percentage
I have an OHLC dataframe, for example:
index | open | close | high | low |
---|---|---|---|---|
2021-03-23 10:00:00+00:00 | 1421.100 | 1424.500 | 1427.720 | 1422.650 |
2021-03-23 11:00:00+00:00 | 1424.500 | 1421.480 | 1422.400 | 1411.890 |
2021-03-23 12:00:00+00:00 | 1421.480 | 1435.170 | 1443.980 | 1433.780 |
2021-03-23 13:00:00+00:00 | 1435.170 | 1440.860 | 1443.190 | 1437.590 |
2021-03-23 14:00:00+00:00 | 1440.860 | 1438.920 | 1443.570 | 1435.200 |
2021-03-23 15:00:00+00:00 | 1438.920 | 1435.990 | 1444.840 | 1435.060 |
2021-03-23 16:00:00+00:00 | 1435.990 | 1441.920 | 1446.610 | 1441.450 |
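Now, for each row, I want to know whether the price first rises or first falls by 1% relative to that row's close. What I have so far is the following working code: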
def check(x):
    # All rows strictly after the current one
    later = ohlc[ohlc.index > x.name]
    price = ohlc.at[x.name, 'close']
    high_thr = price * 1.01
    low_thr = price * 0.99
    # Later rows whose high/low crosses the +1% / -1% threshold
    high_indexes = later[later['high'] > high_thr]
    low_indexes = later[later['low'] < low_thr]
    if high_indexes.shape[0] > 0 and low_indexes.shape[0] > 0:
        high = high_indexes.index[0]
        low = low_indexes.index[0]
        if high < low:
            return 1
        elif high > low:
            return -1
        else:
            return 0
    else:
        return 0

ohlc['check'] = ohlc.apply(check, axis=1)
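This is very slow on larger datasets. Is there a better way than iterating over every row, slicing the frame, and collecting all matching indexes just to get the nearest one?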
I think the best approach is not very different from what you are already doing:
from datetime import timedelta

def check(x, change=0.01):
    # Note: the scan starts at the current bar itself; use
    # x.name + timedelta(hours=1) to look only at later bars, as in the question.
    # Stepping by one hour assumes an hourly index without gaps.
    time = x.name
    price = ohlc.loc[time, 'close']
    while True:
        if time not in ohlc.index:  # If we reach the end
            return 0
        high = ohlc.loc[time, 'high']
        low = ohlc.loc[time, 'low']
        if high > (1.0 + change) * price:  # Upper thresh broken
            return 1
        elif low < (1.0 - change) * price:  # Lower thresh broken
            return -1
        time = time + timedelta(hours=1)  # Time update

ohlc['check'] = ohlc.apply(check, axis=1)
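If efficiency is your concern, using apply this way is slightly more efficient, because each row only looks ahead as far as it takes for a threshold to be broken. Alternatively, you can cap the number of checks per row at 100 by modifying the while loop: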
endtime = time + timedelta(hours=100)
while time < endtime:
    # etc
# return 0 after the loop if no threshold was broken within the window
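On larger frames, a good share of the remaining cost is likely the label-based lookups (`ohlc.loc[...]`) inside the loop. As a minimal sketch, the same early-exit scan can be run on plain NumPy arrays using row positions instead of timestamps, which also removes the assumption of a gap-free hourly index. The names `first_breakout` and `max_bars` are made up for illustration, the sample frame is rebuilt from the table in the question, the scan looks only at later rows, and returning 0 when a single bar breaks both thresholds is an assumption borrowed from the question's code.

```python
import numpy as np
import pandas as pd


def first_breakout(ohlc, change=0.01, max_bars=100):
    """Return 1 / -1 / 0 per row: which threshold is broken first
    within the next `max_bars` rows (0 if neither, or if ambiguous)."""
    close = ohlc["close"].to_numpy()
    high = ohlc["high"].to_numpy()
    low = ohlc["low"].to_numpy()
    n = len(ohlc)
    result = np.zeros(n, dtype=int)

    for i in range(n):
        upper = close[i] * (1.0 + change)
        lower = close[i] * (1.0 - change)
        # Look only at later rows, and stop after max_bars of them
        for j in range(i + 1, min(i + 1 + max_bars, n)):
            up = high[j] > upper
            down = low[j] < lower
            if up and down:
                break  # both broken in one bar: order unknown, keep 0
            if up:
                result[i] = 1
                break
            if down:
                result[i] = -1
                break
    return pd.Series(result, index=ohlc.index)


# Sample data copied from the question
index = pd.date_range(
    "2021-03-23 10:00", periods=7, freq=pd.Timedelta(hours=1), tz="UTC"
)
ohlc = pd.DataFrame(
    {
        "open": [1421.10, 1424.50, 1421.48, 1435.17, 1440.86, 1438.92, 1435.99],
        "close": [1424.50, 1421.48, 1435.17, 1440.86, 1438.92, 1435.99, 1441.92],
        "high": [1427.72, 1422.40, 1443.98, 1443.19, 1443.57, 1444.84, 1446.61],
        "low": [1422.65, 1411.89, 1433.78, 1437.59, 1435.20, 1435.06, 1441.45],
    },
    index=index,
)

ohlc["check"] = first_breakout(ohlc)
print(ohlc["check"].tolist())
```

On the sample rows only the first two closes see a +1% break (both at the 12:00 bar), so the remaining rows stay 0. The inner loop is still pure Python, so this is a sketch of the idea rather than a tuned implementation.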