How can I get the next row value in a Python dataframe?
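I'm new to Python and am learning it to complete a research project on cryptocurrencies. What I want to do is retrieve the value from the row right after a condition is found, and then retrieve the value 7 rows later into another variable.
I'm working with an Excel spreadsheet that has 2250 rows and 25 columns. Adding the 4 columns detailed below brings it to 29 columns. It contains many 0s (pattern not found) and a few 100s (pattern found). I want my program to take the row immediately after a row where 100 appears and return its closing price. That way I can see the difference between the day of the pattern and the day after. I also want to do the same for 7 days later, to find out how the pattern performs over a week.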
Here's a screenshot of the spreadsheet to illustrate this
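You can also see -100 cells; those are bearish pattern recognitions. For now I only want to work with the "100" cells so I can at least get this working.
Here is what I would like to happen: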
import pandas as pd
import talib
import csv
import numpy as np

my_data = pd.read_excel('candlesticks-patterns-excel.xlsx')
df = pd.DataFrame(my_data)

df['Next Close'] = np.nan_to_num(0) #adding these next four columns to my dataframe so I can fill them up with the later variables#
df['Variation2'] = np.nan_to_num(0)
df['Next Week Close'] = np.nan_to_num(0)
df['Next Week Variation'] = np.nan_to_num(0)
df['Close'].astype(float)

for row in df.itertuples(index=True):
    str(row[7:23])
    if ((row[7:23]) == 100):
        nextclose = np.where(row[7:23] == row[7:23]+1)[0] #(I Want this to be the next row after having found the condition)#
        if (row.Index + 7 < len(df)):
            nextweekclose = np.where(row[7:23] == row[7:23]+7)[0] #(I want this to be the 7th row after having found the condition)#
        else:
            nextweekclose = 0
The reason I want these values is to compare them later using these variables:
variation2 = (nextclose - row.Close) / row.Close * 100
nextweekvariation = (nextweekclose - row.Close) / row.Close * 100
df.append({'Next Close': nextclose, 'Variation2': variation2, 'Next Week Close': nextweekclose, 'Next Week Variation': nextweekvariation}, ignore_index = true)
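My error comes from not knowing how to retrieve the row+1 and row+7 values. I've been searching the web high and low all day and haven't found a concrete way to do it. Every idea I come up with gives me either a "can only concatenate tuple (not "int") to tuple" error or an "AttributeError: 'Series' object has no attribute 'close'". The second one is what I get when I try this: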
for row in df.itertuples(index=True):
    str(row[7:23])
    if ((row[7:23]) == 100):
        nextclose = df.iloc[row.Index + 1,:].close
        if (row.Index + 7 < len(df)):
            nextweekclose = df.iloc[row.Index + 7,:].close
        else:
            nextweekclose = 0
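I would really appreciate some help with this.
I'm using Jupyter Notebook.
EDIT: Fixed
I finally got it working! As often seems to be the case with programming (yes, I'm new here...), the error came from my not being able to think outside the box. I was convinced that one particular part of my code was the problem, when the problem actually ran deeper than that.
Thanks to BenB and Michael Gardner, I've fixed my code and it now returns what I want. Here it is.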
import pandas as pd
import talib
import csv
import numpy as np

my_data = pd.read_excel('candlesticks-patterns-excel.xlsx')
df = pd.DataFrame(my_data)

#Creating my four new columns. In my first message I thought I needed to fill them up
#with 0s (or NaNs) and then fill them up with their respective content later.
#It is actually much simpler to make the operations right now, keeping in mind
#that I need to reference df['Column Of Interest'] every time.
df['Next Close'] = df['Close'].shift(-1)
df['Variation2'] = (((df['Next Close'] - df['Close']) / df['Close']) * 100)
df['Next Week Close'] = df['Close'].shift(-7)
df['Next Week Variation'] = (((df['Next Week Close'] - df['Close']) / df['Close']) * 100)

#The only use of this is for me to have a visual representation of my newly created columns#
print(df)

for row in df.itertuples(index=True):
    if 100 or -100 in row[7:23]:
        nextclose = df['Next Close']
        if (row.Index + 7 < len(df)) and 100 or -100 in row[7:23]:
            nextweekclose = df['Next Week Close']
        else:
            nextweekclose = 0
        variation2 = (nextclose - row.Close) / row.Close * 100
        nextweekvariation = (nextweekclose - row.Close) / row.Close * 100
        df.append({'Next Close': nextclose, 'Variation2': variation2, 'Next Week Close': nextweekclose, 'Next Week Variation': nextweekvariation}, ignore_index = True)

df.to_csv('gatherinmahdata3.csv')
If I understand correctly, you should be able to use shift to move the rows by the amount you need and then do your conditional calculation.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Close': np.arange(8)})
df['Next Close'] = df['Close'].shift(-1)
df['Next Week Close'] = df['Close'].shift(-7)
df.head(10)
   Close  Next Close  Next Week Close
0      0         1.0              7.0
1      1         2.0              NaN
2      2         3.0              NaN
3      3         4.0              NaN
4      4         5.0              NaN
5      5         6.0              NaN
6      6         7.0              NaN
7      7         NaN              NaN
df['Conditional Calculation'] = np.where(df['Close'].mod(2).eq(0), df['Close'] * df['Next Close'], df['Close'])
df.head(10)
   Close  Next Close  Next Week Close  Conditional Calculation
0      0         1.0              7.0                      0.0
1      1         2.0              NaN                      1.0
2      2         3.0              NaN                      6.0
3      3         4.0              NaN                      3.0
4      4         5.0              NaN                     20.0
5      5         6.0              NaN                      5.0
6      6         7.0              NaN                     42.0
7      7         NaN              NaN                      7.0
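Applied to the setup in the question, the same idea might look like the sketch below. The column names `Pattern A` and `Pattern B` and all sample values are made up for illustration; the real spreadsheet has its own pattern columns, so treat this as one possible approach under those assumptions.

```python
import pandas as pd

# Hypothetical sample data: two pattern columns holding 0, 100 or -100.
df = pd.DataFrame({
    "Close": [10.0, 11.0, 12.0, 9.0, 15.0, 18.0, 20.0, 21.0, 22.0, 24.0],
    "Pattern A": [0, 100, 0, 0, 0, 0, 0, 0, 0, 0],
    "Pattern B": [0, 0, -100, 100, 0, 0, 0, 0, 0, 0],
})
pattern_cols = ["Pattern A", "Pattern B"]  # assumed names

# True on rows where at least one pattern column equals 100.
found = df[pattern_cols].eq(100).any(axis=1)

# Close one row and seven rows later; NaN where the frame runs out.
next_close = df["Close"].shift(-1)
next_week_close = df["Close"].shift(-7)

# Percentage change, kept only on rows where a pattern was found.
df["Variation2"] = ((next_close - df["Close"]) / df["Close"] * 100).where(found)
df["Next Week Variation"] = (
    (next_week_close - df["Close"]) / df["Close"] * 100
).where(found)

hits = df[found]
print(hits)
```

Rows without a pattern, and rows too close to the end of the frame for a 7-row lookahead, end up as NaN rather than 0, which keeps them out of later averages.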
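From your update it is clear that the first if statement is meant to check whether the value "100" is present in your row. You would do that with
if 100 in row[7:23]:
This checks whether the integer 100 is one of the elements of the tuple containing columns 7 through 23 of the row (23 itself excluded).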
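A small sketch of that membership test on a plain tuple, with made-up values. It also shows why `100 or -100 in values` does not test for both numbers: Python reads it as `100 or (-100 in values)`, and a non-zero `100` is always truthy.

```python
# Made-up slice of a row; a real row from itertuples() is also a tuple.
values = (0, 0, -100, 0)

has_bullish = 100 in values    # no element equals 100
has_bearish = -100 in values   # one element equals -100

# Parsed as `100 or (-100 in values)`, so it evaluates to 100 (truthy)
# regardless of what the tuple contains.
always_truthy = 100 or -100 in values

# One way to test for either value.
has_either = any(v in (100, -100) for v in values)

print(has_bullish, has_bearish, always_truthy, has_either)
```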
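If you look closely at the error message you got, you will see where the problem is:
TypeError: can only concatenate tuple (not "int") to tuple
comes from
nextclose = np.where(row[7:23] == row[7:23]+1)[0]
row is a tuple, and slicing it just gives you a shorter tuple, to which you are trying to add an integer, as the error message says. It may be worth looking at the numpy.where documentation to see how it works in general, but I don't think it is needed in this case.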
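Both the error and the fact that the first `if` never fires can be reproduced on an ordinary tuple. The values here are made up and simply stand in for `row[7:23]`:

```python
row_slice = (0, 100, 0)  # stands in for row[7:23]

# A tuple never equals an int, so this condition is always False.
is_equal = row_slice == 100
print(is_equal)

# `+` on a tuple means concatenation, which needs another tuple.
message = ""
try:
    row_slice + 1
except TypeError as err:
    message = str(err)
    print(message)

# Concatenating with a tuple works, but it appends an element
# rather than moving to the next row.
extended = row_slice + (1,)
print(extended)
```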
Which brings us to your second error message:
AttributeError: 'Series' object has no attribute 'close'
Attribute names are case-sensitive. It works for me if I simply capitalize the "C" in "Close" (for the same reason that Index has to be capitalized):
nextclose = df.iloc[row.Index + 1,:].Close
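In principle you can use the shift method mentioned in the other reply, and for simplicity I would recommend it, but I want to point out some other ways, because I think understanding them is important for working with dataframes: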
nextclose = df.iloc[row[0]+1]["Close"]
nextclose = df.iloc[row[0]+1].Close
nextclose = df.loc[row.Index + 1, "Close"]
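All of them work, and there are probably more possibilities. I can't really tell you which is fastest or whether there is any difference, but they are very commonly used when working with dataframes. So I would recommend taking a close look at the documentation of the methods you use, especially the data types they return. I hope this helps you understand the topic a bit better.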
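One difference worth knowing: `iloc` selects by position while `loc` selects by label. With the default 0..n-1 index the two coincide, which is why the variants give the same result here. A sketch with a made-up frame and a non-default index, where the labels `d1`, `d2`, `d3` are purely illustrative:

```python
import pandas as pd

# Made-up closes with string labels instead of the default 0..n-1 index.
df = pd.DataFrame({"Close": [10.0, 11.0, 12.0]}, index=["d1", "d2", "d3"])

pos = df.index.get_loc("d1")  # integer position of label "d1"

# Position-based: the row right after position `pos`.
next_close = df.iloc[pos + 1]["Close"]

# Label-based: needs the label of the next row, not a number.
same_value = df.loc[df.index[pos + 1], "Close"]

# iloc raises IndexError past the last row, so guard the offset.
last = len(df) - 1
after_last = df.iloc[last + 1]["Close"] if last + 1 < len(df) else None

print(next_close, same_value, after_last)
```

The bounds check plays the same role as the `row.Index + 7 < len(df)` test in the question.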