Pandas calculate result dataframe from a dataframe of multiple trades at same timestamp

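I have a dataframe containing trades with duplicated timestamps, where buy and sell orders are split over several rows. In my example, the total order amount is the sum over the same timestamp for that particular stock. I have created a simplified dataframe to show what the data looks like. I would like to end up with a dataframe holding the results of the trades and a trade ID for each trade. All trades are long positions, i.e. buy and try to sell at a higher price. The ID column for the desired output df2 was answered in this thread: Create ID column in a pandas dataframe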

import pandas as pd
from datetime import datetime
import numpy as np
string_date =['2018-01-01 01:00:00',
             '2018-01-01 01:00:00',
             '2018-01-01 01:00:00',
             '2018-01-01 01:00:00',
             '2018-01-01 02:00:00',
             '2018-01-01 03:00:00',
             '2018-01-01 03:00:00',
             '2018-01-01 03:00:00',
             '2018-01-01 04:00:00',
             '2018-01-01 04:00:00',
             '2018-01-01 04:00:00',
             '2018-01-01 07:00:00',
             '2018-01-01 07:00:00',
             '2018-01-01 07:00:00',
             '2018-01-01 08:00:00',
             '2018-01-01 08:00:00',
             '2018-01-01 08:00:00',
             '2018-02-01 12:00:00',
            ]



data ={'stock': ['A','A','A','A','B','A','A','A','C','C','C','B','B','B','C','C','C','B'],
                    'deal': ['buy', 'buy', 'buy','buy','buy','sell','sell','sell','buy','buy','buy','sell','sell','sell','sell','sell','sell','buy'],
                    'amount':[1,2,3,4,10,8,1,1,3,2,5,2,2,6,3,3,4,5],
                    'price':[10,10,10,10,2,20,20,20,3,3,3,1,1,1,2,2,2,11]}

df = pd.DataFrame(data, index =string_date)
df
Out[245]: 
                    stock  deal  amount  price
2018-01-01 01:00:00     A   buy       1     10
2018-01-01 01:00:00     A   buy       2     10
2018-01-01 01:00:00     A   buy       3     10
2018-01-01 01:00:00     A   buy       4     10
2018-01-01 02:00:00     B   buy      10      2
2018-01-01 03:00:00     A  sell       8     20
2018-01-01 03:00:00     A  sell       1     20
2018-01-01 03:00:00     A  sell       1     20
2018-01-01 04:00:00     C   buy       3      3
2018-01-01 04:00:00     C   buy       2      3
2018-01-01 04:00:00     C   buy       5      3
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:00:00     B  sell       6      1
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:00:00     C  sell       4      2
2018-02-01 12:00:00     B   buy       5     11

A desired output:

string_date2 =['2018-01-01 01:00:00',
               '2018-01-01 02:00:00',
               '2018-01-01 03:00:00',
               '2018-01-01 04:00:00',
               '2018-01-01 07:00:00',
               '2018-01-01 08:00:00',
               '2018-01-02 12:00:00',
               ]

data2 ={'stock': ['A','B', 'A', 'C', 'B','C','B'],
                    'deal': ['buy', 'buy','sell','buy','sell','sell','buy'],
                    'amount':[10,10,10,10,10,10,5],
                    'price':[10,2,20,3,1,2,11],
                    'ID': ['1', '2','1','3','2','3','4']
                    }

df2 = pd.DataFrame(data2, index =string_date2) 

df2
Out[226]: 
                    stock  deal  amount  price ID
2018-01-01 01:00:00     A   buy      10     10  1
2018-01-01 02:00:00     B   buy      10      2  2
2018-01-01 03:00:00     A  sell      10     20  1
2018-01-01 04:00:00     C   buy      10      3  3
2018-01-01 07:00:00     B  sell      10      1  2
2018-01-01 08:00:00     C  sell      10      2  3
2018-01-02 12:00:00     B   buy       5     11  4

Any ideas?

Change your string_date to:

In [2295]: string_date =['2018-01-01 01:00:00',
      ...:              '2018-01-01 01:00:00',
      ...:              '2018-01-01 01:00:00',
      ...:              '2018-01-01 01:00:00',
      ...:              '2018-01-01 02:00:00',
      ...:              '2018-01-01 03:00:00',
      ...:              '2018-01-01 03:00:00',
      ...:              '2018-01-01 03:00:00',
      ...:              '2018-01-01 04:00:00',
      ...:              '2018-01-01 04:00:00',
      ...:              '2018-01-01 04:00:00',
      ...:              '2018-01-01 07:00:00',
      ...:              '2018-01-01 07:00:00',
      ...:              '2018-01-01 07:00:00',
      ...:              '2018-01-01 08:00:00',
      ...:              '2018-01-01 08:00:00',
      ...:              '2018-01-01 08:00:00',
      ...:              '2018-02-01 12:00:00',
      ...:             ]
      ...: 

So df is now:

In [2297]: df
Out[2297]: 
                    stock  deal  amount  price
2018-01-01 01:00:00     A   buy       1     10
2018-01-01 01:00:00     A   buy       2     10
2018-01-01 01:00:00     A   buy       3     10
2018-01-01 01:00:00     A   buy       4     10
2018-01-01 02:00:00     B   buy      10      2
2018-01-01 03:00:00     A  sell       8     20
2018-01-01 03:00:00     A  sell       1     20
2018-01-01 03:00:00     A  sell       1     20
2018-01-01 04:00:00     C   buy       3      3
2018-01-01 04:00:00     C   buy       2      3
2018-01-01 04:00:00     C   buy       5      3
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:00:00     B  sell       6      1
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:00:00     C  sell       4      2
2018-02-01 12:00:00     B   buy       5     11

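You can use Groupby.agg to collapse the rows that share a timestamp, stock and deal, then number the buys and forward-fill each ID onto the matching sells: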

In [2302]: x = df.reset_index().groupby(['index', 'stock', 'deal'], as_index=False).agg({'amount': 'sum', 'price': 'max'}).set_index('index')

In [2303]: m = x['deal'] == 'buy'

In [2305]: x['ID'] = m.cumsum().where(m)

In [2307]: x['ID'] = x.groupby('stock')['ID'].ffill()

In [2308]: x
Out[2308]: 
                     stock  deal  amount  price   ID
index                                              
2018-01-01 01:00:00     A   buy      10     10  1.0
2018-01-01 02:00:00     B   buy      10      2  2.0
2018-01-01 03:00:00     A  sell      10     20  1.0
2018-01-01 04:00:00     C   buy      10      3  3.0
2018-01-01 07:00:00     B  sell      10      1  2.0
2018-01-01 08:00:00     C  sell      10      2  3.0
2018-02-01 12:00:00     B   buy       5     11  4.0
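
The same steps can be wrapped in a function that returns integer IDs instead of floats. This is only a sketch: the helper name aggregate_trades and the small sample frame are hypothetical, and the final cast to int assumes every sell is preceded by a buy of the same stock (otherwise the forward-fill leaves NaN and the cast fails).

```python
import pandas as pd


def aggregate_trades(trades: pd.DataFrame) -> pd.DataFrame:
    """Collapse rows sharing a timestamp, stock and deal, then label trades."""
    out = (
        trades.rename_axis("index")
        .reset_index()
        .groupby(["index", "stock", "deal"], as_index=False)
        .agg({"amount": "sum", "price": "max"})
        .set_index("index")
    )
    is_buy = out["deal"].eq("buy")
    # Every aggregated buy row gets the next ID; sell rows are left as NaN ...
    out["ID"] = is_buy.cumsum().where(is_buy)
    # ... and then inherit the most recent ID seen for the same stock.
    out["ID"] = out.groupby("stock")["ID"].ffill().astype(int)
    return out


# Hypothetical sample: two partial buys of A at the same timestamp.
sample = pd.DataFrame(
    {
        "stock": ["A", "A", "B", "A", "B", "B"],
        "deal": ["buy", "buy", "buy", "sell", "sell", "buy"],
        "amount": [1, 2, 10, 3, 10, 5],
        "price": [10, 10, 2, 20, 1, 11],
    },
    index=[
        "2018-01-01 01:00:00",
        "2018-01-01 01:00:00",
        "2018-01-01 02:00:00",
        "2018-01-01 03:00:00",
        "2018-01-01 07:00:00",
        "2018-02-01 12:00:00",
    ],
)
result = aggregate_trades(sample)
print(result)
```

Note that this numbering treats every aggregated buy row as the start of a new trade, so a stock bought at two different timestamps before being sold would receive two IDs.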

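This solution assumes a 'Long Only' portfolio in which short sales are not allowed. Once a position is opened in a given stock, the trade is assigned a new trade ID. Increasing the position in that stock results in the same trade ID, as does any sell trade that reduces the size of the position (including the final sale that brings the position quantity to zero). A subsequent buy trade in that same stock results in a new trade ID.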
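
For a log that is already complete, these rules could also be sketched without any helper object. This is a vectorized formulation for illustration only; the function name assign_trade_ids and the sample data are made up, and it assumes rows are in chronological order and that no short selling occurs.

```python
import numpy as np
import pandas as pd


def assign_trade_ids(trades: pd.DataFrame) -> pd.DataFrame:
    """Label each row with a trade ID that runs from opening buy to closing sell."""
    out = trades.copy()
    # Buys add to the position, sells subtract from it.
    out["signed"] = np.where(out["deal"].eq("buy"), out["amount"], -out["amount"])
    out["position"] = out.groupby("stock")["signed"].cumsum()
    # A row opens a new trade when the stock's position before it was flat.
    previous = out.groupby("stock")["position"].shift(fill_value=0)
    opens = previous.eq(0)
    out["trade_id"] = opens.cumsum().where(opens)
    # Remaining rows inherit the ID of the trade currently open for their stock.
    out["trade_id"] = out.groupby("stock")["trade_id"].ffill().astype(int)
    return out.drop(columns="signed")


# Hypothetical sample log.
log = pd.DataFrame(
    {
        "stock": ["A", "A", "B", "A", "B", "B", "A"],
        "deal": ["buy", "buy", "buy", "sell", "sell", "buy", "buy"],
        "amount": [1, 2, 10, 3, 10, 5, 4],
    },
    index=pd.date_range("2018-01-01", periods=7, freq="D"),
)
labelled = assign_trade_ids(log)
print(labelled)
```

Because the whole log is relabelled on every call, this form is only convenient when all trades are available at once.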

To keep trade identifiers consistent as the log of trades grows, I created a class TradeTracker that tracks and assigns a trade identifier to each trade.

import numpy as np
import pandas as pd

# Create sample dataframe.    
dates = [
    '2018-01-01 01:00:00',
    '2018-01-01 01:01:00',
    '2018-01-01 01:02:00',
    '2018-01-01 01:03:00',
    '2018-01-01 02:00:00',
    '2018-01-01 03:00:00',
    '2018-01-01 03:01:00',
    '2018-01-01 03:03:00',
    '2018-01-01 04:00:00',
    '2018-01-01 04:01:00',
    '2018-01-01 04:02:00',
    '2018-01-01 07:00:00',
    '2018-01-01 07:01:00',
    '2018-01-01 07:02:00',
    '2018-01-01 08:00:00',
    '2018-01-01 08:01:00',
    '2018-01-01 08:02:00',
    '2018-02-01 12:00:00',
    '2018-03-01 12:00:00',
]
data = {
    'stock': ['A','A','A','A','B','A','A','A','C','C','C','B','B','B','C','C','C','B','A'],
    'deal': ['buy', 'buy', 'buy', 'buy', 'buy', 'sell', 'sell', 'sell', 'buy', 'buy', 'buy',
             'sell', 'sell', 'sell', 'sell', 'sell', 'sell', 'buy', 'buy'],
    'amount': [1, 2, 3, 4, 10, 8, 1, 1, 3, 2, 5, 2, 2, 6, 3, 3, 4, 5, 10],
    'price': [10, 10, 10, 10, 2, 20, 20, 20, 3, 3, 3, 1, 1, 1, 2, 2, 2, 11, 15]
}
df = pd.DataFrame(data, index=pd.to_datetime(dates))
>>> df
                    stock  deal  amount  price
2018-01-01 01:00:00     A   buy       1     10
2018-01-01 01:01:00     A   buy       2     10
2018-01-01 01:02:00     A   buy       3     10
2018-01-01 01:03:00     A   buy       4     10
2018-01-01 02:00:00     B   buy      10      2
2018-01-01 03:00:00     A  sell       8     20
2018-01-01 03:01:00     A  sell       1     20
2018-01-01 03:03:00     A  sell       1     20
2018-01-01 04:00:00     C   buy       3      3
2018-01-01 04:01:00     C   buy       2      3
2018-01-01 04:02:00     C   buy       5      3
2018-01-01 07:00:00     B  sell       2      1
2018-01-01 07:01:00     B  sell       2      1
2018-01-01 07:02:00     B  sell       6      1
2018-01-01 08:00:00     C  sell       3      2
2018-01-01 08:01:00     C  sell       3      2
2018-01-01 08:02:00     C  sell       4      2
2018-02-01 12:00:00     B   buy       5     11
2018-03-01 12:00:00     A   buy      10     15

# Add `position` column representing the cumulative buys and sells for a given stock.
df['position'] = (
    df
    .assign(temp_amount=np.where(df['deal'].eq('buy'), df['amount'], -df['amount']))
    .groupby(['stock'])['temp_amount']
    .cumsum()
)

# Create a class to track trade identifiers and instantiate it.
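class TradeTracker:
    """Assign one trade ID per stock, from the opening buy to the closing sell."""

    def __init__(self):
        self.trade_counter = 0  # last trade ID handed out
        self.trade_ids = {}     # stock -> ID of its currently open trade

    def get_trade_id(self, stock, position):
        if position == 0:
            # Position closed: return the open trade's ID and forget it.
            trade_id = self.trade_ids.pop(stock)
        elif stock not in self.trade_ids:
            # No open trade for this stock: start a new one.
            self.trade_counter += 1
            self.trade_ids[stock] = trade_id = self.trade_counter
        else:
            # Position changed but is still open: reuse the current ID.
            trade_id = self.trade_ids[stock]
        return trade_id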

trade_tracker = TradeTracker()

# Add a `trade_id` column using our custom class in a list comprehension.
df['trade_id'] = [trade_tracker.get_trade_id(stock, position) 
                  for stock, position in df[['stock', 'position']].to_numpy()]

>>> df
                    stock  deal  amount  price  position  trade_id
2018-01-01 01:00:00     A   buy       1     10         1         1
2018-01-01 01:01:00     A   buy       2     10         3         1
2018-01-01 01:02:00     A   buy       3     10         6         1
2018-01-01 01:03:00     A   buy       4     10        10         1
2018-01-01 02:00:00     B   buy      10      2        10         2
2018-01-01 03:00:00     A  sell       8     20         2         1
2018-01-01 03:01:00     A  sell       1     20         1         1
2018-01-01 03:03:00     A  sell       1     20         0         1
2018-01-01 04:00:00     C   buy       3      3         3         3
2018-01-01 04:01:00     C   buy       2      3         5         3
2018-01-01 04:02:00     C   buy       5      3        10         3
2018-01-01 07:00:00     B  sell       2      1         8         2
2018-01-01 07:01:00     B  sell       2      1         6         2
2018-01-01 07:02:00     B  sell       6      1         0         2
2018-01-01 08:00:00     C  sell       3      2         7         3
2018-01-01 08:01:00     C  sell       3      2         4         3
2018-01-01 08:02:00     C  sell       4      2         0         3
2018-02-01 12:00:00     B   buy       5     11         5         4
2018-03-01 12:00:00     A   buy      10     15        10         5
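
The code above computes position over the whole dataframe, so labelling a later batch of trades needs the running positions as well as the open trade IDs. One possible way to carry both between batches is sketched below. PositionTradeTracker and its label method are hypothetical names, not part of the code above, and the sketch assumes long-only trades that arrive in chronological order.

```python
import pandas as pd


class PositionTradeTracker:
    """Hypothetical tracker that remembers open trade IDs and positions."""

    def __init__(self):
        self.trade_counter = 0
        self.trade_ids = {}  # stock -> ID of its currently open trade
        self.positions = {}  # stock -> current position of that open trade

    def label(self, batch: pd.DataFrame) -> pd.DataFrame:
        """Return a copy of `batch` with `position` and `trade_id` columns."""
        ids, positions = [], []
        rows = batch[["stock", "deal", "amount"]].itertuples(index=False)
        for stock, deal, amount in rows:
            if stock not in self.trade_ids:
                # No open trade for this stock: start a new one.
                self.trade_counter += 1
                self.trade_ids[stock] = self.trade_counter
            signed = amount if deal == "buy" else -amount
            position = self.positions.get(stock, 0) + signed
            ids.append(self.trade_ids[stock])
            positions.append(position)
            if position == 0:
                # Flat again: close the trade so the next buy gets a new ID.
                del self.trade_ids[stock]
                self.positions.pop(stock, None)
            else:
                self.positions[stock] = position
        return batch.assign(position=positions, trade_id=ids)


tracker = PositionTradeTracker()

# Two made-up batches; the second continues the positions left by the first.
batch1 = pd.DataFrame(
    {"stock": ["A", "B", "A"], "deal": ["buy", "buy", "sell"], "amount": [10, 10, 4]}
)
batch2 = pd.DataFrame(
    {"stock": ["A", "A", "B"], "deal": ["sell", "buy", "sell"], "amount": [6, 3, 10]}
)
first = tracker.label(batch1)
second = tracker.label(batch2)
print(pd.concat([first, second], ignore_index=True))
```

Keeping the state on the tracker instance means IDs already written to earlier rows never change when new rows are appended.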