Running total per category for each day in pandas dataframe
I have a pandas dataframe of stock trades. Trades do not happen every day, nor for every stock:
The goal is to get the daily weight of each stock, for every day.
Starting table and expected result
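This means:
- creating a full calendar of dates
- repeating each stock's cumulative shares on every date
- finally, calculating each stock's weight on that date
Can anyone help me with this? I have searched several threads but could not find a workable solution.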
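Thanks for the question. I tried writing this code because I am building a dataframe for my own investments, so it was good practice. Try this; I think it does what you are asking for.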
import pandas as pd
# create df
trades = pd.DataFrame(index=['2011-02-16', '2011-02-16', '2011-02-17', '2014-03-20','2014-03-20', '2018-01-04'])
# build data
trades['stock'] = ['A', 'B', 'A', 'B', 'C', 'B']
trades['shares_Tr'] = [5,10,5,10,15,-20]
# create a range of dates for the balance dataframe
index_of_dates = pd.date_range(('2011-02-10'), ('2018-01-05')).tolist()
# create a balance dataframe with columns for each stock.
bal = pd.DataFrame(data = 0, index=index_of_dates, columns=['A', 'A_sum', 'A_weight', 'B', 'B_sum', 'B_weight', 'C', 'C_sum', 'C_weight', 'Total' ])
# populate the trades from trades df to the balance df.
for index, row in trades.iterrows():
    # += so that several trades in the same stock on the same day add up
    bal.loc[index, row['stock']] += row['shares_Tr']
# track totals
bal['A_sum'] = bal['A'].cumsum()
bal['B_sum'] = bal['B'].cumsum()
bal['C_sum'] = bal['C'].cumsum()
bal['Total'] = bal[['A_sum', 'B_sum', 'C_sum']].sum(axis=1)
bal['A_weight'] = bal['A_sum'] / bal['Total']
bal['B_weight'] = bal['B_sum'] / bal['Total']
bal['C_weight'] = bal['C_sum'] / bal['Total']
You will end up with two dataframes: trades, and bal, which contains your results.
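One detail worth noting: on days before the first trade the total is 0, so the weight columns come out as NaN (0/0). If a weight of 0 is preferred on those days, something along these lines might work. This is only a minimal sketch on a three-day excerpt of the sample data; the names `cum`, `total` and `weights` are chosen here for illustration and are not part of the code above.

```python
import pandas as pd

# Cumulative holdings for three days of the sample data;
# nothing is held yet on the first day.
cum = pd.DataFrame(
    {"A_sum": [0, 5, 10], "B_sum": [0, 10, 10]},
    index=pd.date_range("2011-02-15", periods=3),
)
total = cum.sum(axis=1)

# Mask zero totals so the division yields NaN on those days,
# then report "no position" as a weight of 0.
weights = cum.div(total.where(total != 0), axis=0).fillna(0)
print(weights)
```

Whether 0 or NaN is the better value for "nothing held" depends on how the weights are used afterwards; dropping the `fillna(0)` keeps NaN.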
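Great! This inspired me to find a way to solve my problem. The issue with your solution is that it stops working if a stock D appears in the initial dataset (I have added one to the set below).
I was able to solve it like this: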
import pandas as pd
# create df // build data // adding date as column
trades = pd.DataFrame()
trades['Date'] = pd.to_datetime(['2011-02-16', '2011-02-16', '2011-02-17', '2014-03-20','2014-03-20', '2018-01-04', '2011-02-18'])
trades['stock'] = ['A', 'B', 'A', 'B', 'C', 'B', 'D']
trades['shares_Tr'] = [5,10,5,10,15,-20,5]
# create a range of dates for the merged dataframe
# (pd.datetime was removed in pandas 2.0, so pd.Timestamp is used for today's date)
index_of_dates = pd.DataFrame({'Date': pd.date_range('2011-02-10', pd.Timestamp.today().normalize())})
# create a merged dataframe with columns date / stock / stock_Tr.
merged = pd.merge(index_of_dates,trades,how='left', on='Date')
# create a pivottable showing the shares_TR of each stock for each date
shares_tr = merged.pivot(index='Date', columns='stock', values='shares_Tr').dropna(axis=1, how='all').fillna(0)
# calculate individual pivottables for the cumsum and weights
cumShares = shares_tr.cumsum()
weights = cumShares.div(cumShares.sum(axis=1), axis=0).mul(100).round(2)
# finally combine all data into one dataframe
all_data = pd.concat([shares_tr, cumShares, weights], axis=1, keys=['Shares','cumShares', 'Weights'])
all_data
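As a possible variation on the same idea, the merge and `pivot` can be replaced by `pivot_table` followed by `reindex` onto the calendar. This is a sketch, not a tested drop-in replacement: it assumes that several trades in the same stock on the same day should be summed (plain `pivot` raises an error on such duplicates), and it uses a fixed end date of 2018-01-05 instead of today so that the result is reproducible. The names `calendar`, `cum_shares` and `total` are introduced here for illustration.

```python
import pandas as pd

# Same sample trades as above, including stock D.
trades = pd.DataFrame({
    "Date": pd.to_datetime(["2011-02-16", "2011-02-16", "2011-02-17", "2014-03-20",
                            "2014-03-20", "2018-01-04", "2011-02-18"]),
    "stock": ["A", "B", "A", "B", "C", "B", "D"],
    "shares_Tr": [5, 10, 5, 10, 15, -20, 5],
})

# Full daily calendar (fixed end date, an assumption made for this sketch).
calendar = pd.date_range("2011-02-10", "2018-01-05", freq="D")

# One column per stock; same-day trades in one stock are summed,
# and days without any trade are filled with 0.
shares_tr = (
    trades.pivot_table(index="Date", columns="stock", values="shares_Tr",
                       aggfunc="sum", fill_value=0)
    .reindex(calendar, fill_value=0)
    .rename_axis(index="Date")
)

# Running total of shares held per stock.
cum_shares = shares_tr.cumsum()

# Row-wise weights in percent; days with a zero total stay NaN.
total = cum_shares.sum(axis=1)
weights = cum_shares.div(total.where(total != 0), axis=0).mul(100).round(2)

all_data = pd.concat([shares_tr, cum_shares, weights], axis=1,
                     keys=["Shares", "cumShares", "Weights"])
```

Because the stock columns come from the data itself, a new ticker in `trades` shows up automatically, which is the same property the `pivot` version has.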