How can I get the next row value in a Python dataframe?

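I'm a new Python user, and I'm trying to learn it so I can complete a research project on cryptocurrencies. What I want to do is retrieve the value in the row right after a condition is found, and then retrieve the value 7 rows later in another variable.

I'm working with an Excel spreadsheet that has 2250 rows and 25 columns. Adding the 4 columns detailed below gives me 29 columns. It contains a lot of 0s (no pattern found) and a few 100s (pattern found). I want my program to take the row right after the one where a 100 appears and return its closing price. That way, I can see the difference between the day of the pattern and the day after. I also want to do the same for the 7th day after, to find out how the pattern performs over a week.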

Here's a screenshot of the spreadsheet to illustrate this

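You can also see cells containing -100; these are bearish pattern recognitions. For now I only want to use the "100" cells, so that I can at least get this working.

This is what I want to happen: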

import pandas as pd
import talib
import csv
import numpy as np

my_data = pd.read_excel('candlesticks-patterns-excel.xlsx')
df = pd.DataFrame(my_data)

df['Next Close'] = np.nan_to_num(0) #adding these next four columns to my dataframe so I can fill them up with the later variables#
df['Variation2'] = np.nan_to_num(0)
df['Next Week Close'] = np.nan_to_num(0)
df['Next Week Variation'] = np.nan_to_num(0)
df['Close'].astype(float)

for row in df.itertuples(index=True):
    str(row[7:23])
    if ((row[7:23]) == 100):
        nextclose = np.where(row[7:23] == row[7:23]+1)[0] #(I Want this to be the next row after having found the condition)#
    if (row.Index + 7 < len(df)):
        nextweekclose = np.where(row[7:23] == row[7:23]+7)[0] #(I want this to be the 7th row after having found the condition)#
    else:
        nextweekclose = 0

The reason I want these values is to compare them later using these variables:

    variation2 = (nextclose - row.Close) / row.Close * 100
    nextweekvariation = (nextweekclose - row.Close) / row.Close * 100
    df.append({'Next Close': nextclose, 'Variation2': variation2, 'Next Week Close': nextweekclose, 'Next Week Variation': nextweekvariation}, ignore_index = true)

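My error comes from not knowing how to retrieve the row+1 value and the row+7 value. I have been searching high and low online all day without finding a concrete way to do this. Whatever idea I come up with gives me either a "can only concatenate tuple (not "int") to tuple" error or "AttributeError: 'Series' object has no attribute 'close'". The second one is what I get when I try this: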

for row in df.itertuples(index=True):
    str(row[7:23])
    if ((row[7:23]) == 100):
        nextclose = df.iloc[row.Index + 1,:].close
    if (row.Index + 7 < len(df)):
        nextweekclose = df.iloc[row.Index + 7,:].close
    else:
        nextweekclose = 0

I would really appreciate some help with this. I'm using Jupyter Notebook.

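EDIT: FIXED

I finally made it work! As often seems to be the case with programming (yes, I'm new here...), the error came from my not being able to think outside the box. I was convinced that one particular part of my code was the problem, when the problem actually ran deeper than that.

Thanks to BenB and Michael Gardner, I have fixed my code, and it now returns what I want. Here it is.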

import pandas as pd
import talib
import csv
import numpy as np
        
my_data = pd.read_excel('candlesticks-patterns-excel.xlsx')
df = pd.DataFrame(my_data)
        
        
#Creating my four new columns. In my first message I thought I needed to fill them up
#with 0s (or NaNs) and then fill them up with their respective content later. 
#It is actually much simpler to make the operations right now, keeping in mind 
#that I need to reference df['Column Of Interest'] every time.
    
df['Next Close'] = df['Close'].shift(-1)
df['Variation2'] = (((df['Next Close'] - df['Close']) / df['Close']) * 100)
df['Next Week Close'] = df['Close'].shift(-7)
df['Next Week Variation'] = (((df['Next Week Close'] - df['Close']) / df['Close']) * 100)
    
#The only use of this is for me to have a visual representation of my newly created columns#
print(df)
        
for row in df.itertuples(index=True):
    if 100 or -100 in row[7:23]:
        nextclose = df['Next Close']

    if (row.Index + 7 < len(df)) and 100 or -100 in row[7:23]:
        nextweekclose = df['Next Week Close']
    else:
        nextweekclose = 0

    variation2 = (nextclose - row.Close) / row.Close * 100
    nextweekvariation = (nextweekclose - row.Close) / row.Close * 100
    df.append({'Next Close': nextclose, 'Variation2': variation2, 'Next Week Close': nextweekclose, 'Next Week Variation': nextweekvariation}, ignore_index = True)
        
df.to_csv('gatherinmahdata3.csv')

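If I understand correctly, you should be able to use shift to move the rows by the amount you want and then do your conditional calculations.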

import pandas as pd
import numpy as np

df = pd.DataFrame({'Close': np.arange(8)})

df['Next Close'] = df['Close'].shift(-1)
df['Next Week Close'] = df['Close'].shift(-7)

df.head(10)

   Close  Next Close  Next Week Close
0      0         1.0              7.0
1      1         2.0              NaN
2      2         3.0              NaN
3      3         4.0              NaN
4      4         5.0              NaN
5      5         6.0              NaN
6      6         7.0              NaN
7      7         NaN              NaN

df['Conditional Calculation'] = np.where(df['Close'].mod(2).eq(0), df['Close'] * df['Next Close'], df['Close'])

df.head(10)

   Close  Next Close  Next Week Close  Conditional Calculation
0      0         1.0              7.0                      0.0
1      1         2.0              NaN                      1.0
2      2         3.0              NaN                      6.0
3      3         4.0              NaN                      3.0
4      4         5.0              NaN                     20.0
5      5         6.0              NaN                      5.0
6      6         7.0              NaN                     42.0
7      7         NaN              NaN                      7.0
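
Applied to the question's use case, the same idea might look like the sketch below. The column name `Pattern` and the sample prices are made up for illustration (the real spreadsheet has several pattern columns); only `Close` and the four new column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 'Pattern' stands in for one of the pattern columns
df = pd.DataFrame({
    'Close':   [10.0, 11.0, 12.0, 9.0, 10.0, 13.0, 14.0, 15.0, 16.0, 20.0],
    'Pattern': [0, 100, 0, -100, 0, 0, 0, 0, 100, 0],
})

# Close one row and seven rows ahead; rows without enough data ahead get NaN
df['Next Close'] = df['Close'].shift(-1)
df['Next Week Close'] = df['Close'].shift(-7)

# Percentage change relative to the close on the pattern day
df['Variation2'] = (df['Next Close'] - df['Close']) / df['Close'] * 100
df['Next Week Variation'] = (df['Next Week Close'] - df['Close']) / df['Close'] * 100

# Keep only the rows where the bullish pattern (100) was found
bullish = df[df['Pattern'].eq(100)]
print(bullish)
```

With several pattern columns, a mask such as `df[pattern_cols].eq(100).any(axis=1)` (where `pattern_cols` is a hypothetical list of those column names) should select the rows where any of them fired.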

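From your update it becomes clear that the first if statement is meant to check whether the value "100" is present in your row. You would do that with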

if 100 in row[7:23]:

This checks whether the integer 100 is one of the elements of the tuple containing columns 7 to 23 of the row (23 itself excluded).
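
One thing worth noting about the updated code in the question: `if 100 or -100 in row[7:23]` is parsed as `100 or (-100 in row[7:23])`, and since 100 is truthy the condition holds for every row. The small sketch below uses a made-up tuple in place of `row[7:23]` to show the difference.

```python
# A plain tuple standing in for row[7:23]; the values are made up
signals = (0, 0, -100, 0)

# Parsed as `100 or (-100 in ...)`; 100 is truthy, so the result is 100 for any tuple
always_truthy = 100 or -100 in (0, 0, 0)

# Each value needs its own membership test
has_bullish = 100 in signals
has_any_pattern = 100 in signals or -100 in signals

# Equivalent, and easier to extend to more values
has_any_pattern_alt = any(value in (100, -100) for value in signals)
```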

If you look closely at the error messages you get, you can see where the problems are:

TypeError: can only concatenate tuple (not "int") to tuple

comes from

nextclose = np.where(row[7:23] == row[7:23]+1)[0]

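row is a tuple, and slicing it just gives you a shorter tuple, to which you then try to add an integer, exactly as the error message says. It may be worth looking at the documentation of numpy.where to see how it works in general, but I don't think it is needed in this case. This brings us to your second error message: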

AttributeError: 'Series' object has no attribute 'close'

This one is a case-sensitivity issue: it works for me if I simply capitalize "Close" (for the same reason that Index has to be capitalized):

nextclose = df.iloc[row.Index + 1,:].Close
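
Both errors can be reproduced in isolation. The sketch below uses a made-up tuple and a one-element Series rather than the spreadsheet, so the values are purely illustrative.

```python
import numpy as np
import pandas as pd

row_slice = (0, 100, 0)  # made-up stand-in for row[7:23]

# 1) tuple + int is not defined, hence the TypeError from the question
try:
    row_slice + 1
except TypeError as exc:
    tuple_error = str(exc)

# np.where(condition) works on an array condition and returns the matching positions
positions = np.where(np.array(row_slice) == 100)[0]

# 2) attribute access on a Series is case-sensitive
series = pd.Series({'Close': 12.5})  # made-up value
try:
    series.close
except AttributeError as exc:
    attr_error = str(exc)

close_value = series.Close  # matches the label exactly, so this works
```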

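In principle you can use the shift method mentioned in the other reply, and I would suggest it for simplicity, but I want to point out some other approaches, because I think understanding them is important for working with dataframes: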

nextclose = df.iloc[row[0]+1]["Close"]
nextclose = df.iloc[row[0]+1].Close
nextclose = df.loc[row.Index + 1, "Close"]

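They all work, and there are probably even more possibilities. I can't really tell you which is fastest or whether there is any difference, but they are very commonly used when working with dataframes. So I would recommend taking a close look at the documentation of the methods you use, especially at the data types they return. Hope this helps you understand the topic a bit better.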
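
One difference that does matter: `iloc` looks rows up by position, while `loc` looks them up by index label, so the variants above agree only when the dataframe has the default 0..n-1 index. Here is a minimal sketch with made-up prices, which also guards against stepping past the last row.

```python
import pandas as pd

df = pd.DataFrame({'Close': [10.0, 11.0, 12.0]})  # made-up prices, default 0..n-1 index

results = []
for row in df.itertuples(index=True):
    nxt = row.Index + 1
    # Guard against running past the last row; iloc would raise IndexError there
    if nxt < len(df):
        by_position = df.iloc[nxt]['Close']  # positional lookup
        by_label = df.loc[nxt, 'Close']      # label lookup; same row only with the default index
        results.append((row.Index, by_position, by_label))
    else:
        results.append((row.Index, None, None))
```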