Difference between two values in a pandas dataframe that are variable lengths apart

Question

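I am trying to automate the calculation of profit/loss from my trades. Currently my pandas dataframe is set up with a Hold column, which contains 1 while a buy is active and -1 once I have sold. The Price column records the price of the stock, while the Hold_Time and count columns track how long a trade has been held in two different ways.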

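What I am struggling with is calculating how much money I have made/lost. I need to calculate (as a percentage) the difference between the buy price (the first non-zero value) and the sell price (the last non-zero value in a series). The challenge is that the trades are of variable length, so df.shift with a fixed offset does not work.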

Below is an example dataset:

Thanks, and please ask if anything is unclear.

         Date   Hold  Price  Hold_Time   count
148  20190801     0   0.00          0       0
149  20190802     0   0.00          0       0
150  20190805     0   0.00          0       0
151  20190806     1  21.50          1       1
152  20190807     1  22.48          1       2
153  20190808     1  22.78          1       3
154  20190809     1  24.17          1       4
155  20190812     1  23.72          1       5
156  20190813    -1  23.39          0       0
157  20190814     0   0.00          0       0
158  20190815     0   0.00          0       0
159  20190816     0   0.00          0       0
160  20190819     0   0.00          0       0
161  20190820     0   0.00          0       0
162  20190821     0   0.00          0       0
163  20190822     0   0.00          0       0
164  20190823     1  24.80          1       1
165  20190826     1  24.00          1       2
166  20190827    -1  24.65          0       0
167  20190828     0      0          0       0
168  20190829     0      0          0       0

Answer 1

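groupby is your friend here, though in a slightly roundabout way. You can use it to put each separate "holding" run into its own group by comparing each value to 0 and to the previous value. The runs of 0 also form groups, which we have to drop afterwards.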

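# Label each consecutive run of zero / non-zero prices with its own group id:
# a new id starts whenever the "is non-zero" flag differs from the previous row.
nonzero = df["Price"] != 0
blocks = df["Price"].groupby((nonzero != nonzero.shift()).cumsum())
# First and last price of every run; drop the runs of zeros.
buy_values = blocks.first()
buy_values = buy_values[buy_values != 0]
sell_values = blocks.last()
sell_values = sell_values[sell_values != 0]
# Both series are indexed by group id, so they line up trade by trade.
difference = sell_values - buy_values
percent_difference = difference / buy_values * 100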

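This uses only the "Price" column of the dataset. Using the other columns could make the solution simpler/cleaner, but this should do what you asked for!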
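As a rough idea of what using another column could look like, here is a minimal sketch based on the "Hold" column. It assumes that buys and sells strictly alternate (every run of 1s is closed by exactly one -1), and it rebuilds a shortened copy of the sample data by hand, so the frame below is illustrative rather than the asker's real data.

```python
import pandas as pd

# Shortened, hand-typed copy of the sample data (only Hold and Price are needed here).
df = pd.DataFrame({
    "Hold":  [0, 0, 1, 1, 1, 1, 1, -1, 0, 0, 1, 1, -1, 0],
    "Price": [0.0, 0.0, 21.50, 22.48, 22.78, 24.17, 23.72, 23.39,
              0.0, 0.0, 24.80, 24.00, 24.65, 0.0],
})

# A buy row is the first row of a run of Hold == 1; a sell row has Hold == -1.
is_buy = (df["Hold"] == 1) & (df["Hold"].shift() != 1)
is_sell = df["Hold"] == -1

# Assumption: the i-th buy pairs with the i-th sell, so a positional index is enough.
buy_prices = df.loc[is_buy, "Price"].reset_index(drop=True)
sell_prices = df.loc[is_sell, "Price"].reset_index(drop=True)

# Percentage gain or loss per trade.
percent_return = (sell_prices - buy_prices) / buy_prices * 100
print(percent_return.round(2))
```

If the data ends while a trade is still open, the last buy has no matching sell and its entry comes out as NaN, which you may want to drop or handle separately.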

Answer 2

Thanks for providing an easy-to-use dataset. Assuming it is named 'data', I propose the following solution:

import pandas as pd
import numpy as np

data = pd.read_clipboard()  # assumes the sample table has been copied to the clipboard

df = data.copy()  # work on a copy so the original data stays untouched

# keep only the rows where you bought or sold:
# Hold_Time changes (difference != 0) exactly on those rows
df['transaction_id'] = df.Hold_Time - df.Hold_Time.shift()
df = df.query('transaction_id!=0').dropna()

# calculate profit/loss for each sale (sell price minus the preceding buy price)
df['profit'] = np.where(df.Hold == -1, df.Price - df.Price.shift(), 0)

# calculate total profit (or anything else you want, I hope it will be easy at this point)
TOTAL_PROFIT = df.profit.sum()
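Since the question asks for a percentage rather than an absolute difference, the same filtered frame could probably be extended along these lines. This is only a sketch: the small `trades` frame is a hypothetical stand-in for the filtered `df` above (one buy row followed by one sell row per trade), and the `pct_profit` column name is made up for illustration.

```python
import pandas as pd
import numpy as np

# Hypothetical stand-in for the filtered frame: alternating buy (1) and sell (-1) rows.
trades = pd.DataFrame({
    "Hold":  [1, -1, 1, -1],
    "Price": [21.50, 23.39, 24.80, 24.65],
})

# pct_change() computes (current - previous) / previous;
# keep it on sell rows only, where "previous" is the matching buy price.
trades["pct_profit"] = np.where(
    trades.Hold == -1, trades.Price.pct_change() * 100, 0.0
)

sell_returns = trades.loc[trades.Hold == -1, "pct_profit"]
print(sell_returns.round(2).tolist())
```

On the sample trades this gives roughly +8.79% for the first trade and -0.60% for the second.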
