在迭代时间序列数据中的每一行时计算 pandas 中当前行的总和
Calculating sum of up to the current row in pandas while iterating on each row in a time series data
假设我有以下代码计算在给定预算的情况下我可以购买多少产品-
import math
import pandas as pd
data = [['2021-01-02', 5.5], ['2021-02-02', 10.5], ['2021-03-02', 15.0], ['2021-04-02', 20.0]]
df = pd.DataFrame(data, columns=['Date', 'Current_Price'])
df.Date = pd.to_datetime(df.Date)
mn = df.Date.min()
mx = df.Date.max()
dr = pd.date_range(mn - pd.tseries.offsets.MonthBegin(), mx + pd.tseries.offsets.MonthEnd(), name="Date")
df = df.set_index("Date").reindex(dr).reset_index()
df['Current_Price'] = df.groupby(
pd.Grouper(key='Date', freq='1M'))['Current_Price'].ffill().bfill()
# The dataframe below shows the current price of the product
# I'd like to buy at the specific date_range
print(df)
# Create 'Day' column to know which day of the month
df['Day'] = pd.to_datetime(df['Date']).dt.day
# Create 'Deposit' column to record how much money is
# deposited in, say, my bank account to buy the product.
# 'Withdrawal' column is to record how much I spent in
# buying product(s) at the current price on a specific date.
# 'Num_of_Products_Bought' shows how many items I bought
# on that specific date.
#
# Please note that the calculate below takes into account
# the left over money, which remains after I've purchased a
# product, for future purchase. For example, if you observe
# the resulting dataframe at the end of this code, you'll
# notice that I was able to purchase 7 products on March 1, 2021
# although my deposit on that day was 0. That is because
# on the days leading up to March 1, 2021, I have been saving
# the spare change from previous product purchases and that
# extra money allows me to buy an extra product on March 1, 2021
# despite my budget of 0 should only allow me to purchase
# 6 products.
df[['Deposit', 'Withdrawal', 'Num_of_Products_Bought']] = 0.0
# Suppose I save 0 at the beginning of every month in my bank account
df.loc[df['Day'] == 1, 'Deposit'] = 100.0
for index, row in df.iterrows():
if df.loc[index, 'Day'] == 1:
# num_prod_bought = (sum_of_deposit_so_far - sum_of_withdrawal)/current_price
df.loc[index, 'Num_of_Products_Bought'] = math.floor(
(sum(df.iloc[0:(index + 1)]['Deposit'])
- sum(df.iloc[0:(index + 1)]['Withdrawal']))
/ df.loc[index, 'Current_Price'])
# Record how much I spent buying the product on specific date
df.loc[index, 'Withdrawal'] = df.loc[index, 'Num_of_Products_Bought'] * df.loc[index, 'Current_Price']
print(df)
# This code above is working as intended,
# but how can I make it more efficient/pandas-like?
# In particular, I don't like to idea of having to
# iterate the rows and having to recalculate
# the running (sum of) deposit amount and
# the running (sum of) the withdrawal.
如代码中的注释中所述,我想知道如何完成相同的操作,而不必逐行迭代并在我的迭代中计算直到当前行的行总和(我阅读 Whosebug 并看到 cumsum()
函数,但我认为 cumsum
没有迭代中当前行的概念。
非常感谢您的 suggestions/answers!
使用.apply
的解决方案:
def fn():
leftover = 0
amount, deposit = yield
while True:
new_amount, new_deposit = yield (deposit + leftover) // amount
leftover = (deposit + leftover) % amount
amount, deposit = new_amount, new_deposit
df = df.set_index("Date")
s = fn()
next(s)
m = df.index.day == 1
df.loc[m, "Deposit"] = 100
df.loc[m, "Num_of_Products_Bought"] = df.loc[
m, ["Current_Price", "Deposit"]
].apply(lambda x: s.send((x["Current_Price"], x["Deposit"])), axis=1)
df.loc[m, "Withdrawal"] = (
df.loc[m, "Num_of_Products_Bought"] * df.loc[m, "Current_Price"]
)
print(df.fillna(0).reset_index())
打印:
Date Current_Price Deposit Num_of_Products_Bought Withdrawal
0 2021-01-01 5.5 100.0 18.0 99.0
1 2021-01-02 5.5 0.0 0.0 0.0
2 2021-01-03 5.5 0.0 0.0 0.0
3 2021-01-04 5.5 0.0 0.0 0.0
4 2021-01-05 5.5 0.0 0.0 0.0
5 2021-01-06 5.5 0.0 0.0 0.0
6 2021-01-07 5.5 0.0 0.0 0.0
7 2021-01-08 5.5 0.0 0.0 0.0
8 2021-01-09 5.5 0.0 0.0 0.0
9 2021-01-10 5.5 0.0 0.0 0.0
10 2021-01-11 5.5 0.0 0.0 0.0
11 2021-01-12 5.5 0.0 0.0 0.0
12 2021-01-13 5.5 0.0 0.0 0.0
13 2021-01-14 5.5 0.0 0.0 0.0
14 2021-01-15 5.5 0.0 0.0 0.0
15 2021-01-16 5.5 0.0 0.0 0.0
16 2021-01-17 5.5 0.0 0.0 0.0
17 2021-01-18 5.5 0.0 0.0 0.0
18 2021-01-19 5.5 0.0 0.0 0.0
19 2021-01-20 5.5 0.0 0.0 0.0
20 2021-01-21 5.5 0.0 0.0 0.0
21 2021-01-22 5.5 0.0 0.0 0.0
22 2021-01-23 5.5 0.0 0.0 0.0
23 2021-01-24 5.5 0.0 0.0 0.0
24 2021-01-25 5.5 0.0 0.0 0.0
25 2021-01-26 5.5 0.0 0.0 0.0
26 2021-01-27 5.5 0.0 0.0 0.0
27 2021-01-28 5.5 0.0 0.0 0.0
28 2021-01-29 5.5 0.0 0.0 0.0
29 2021-01-30 5.5 0.0 0.0 0.0
30 2021-01-31 5.5 0.0 0.0 0.0
31 2021-02-01 10.5 100.0 9.0 94.5
32 2021-02-02 10.5 0.0 0.0 0.0
33 2021-02-03 10.5 0.0 0.0 0.0
34 2021-02-04 10.5 0.0 0.0 0.0
35 2021-02-05 10.5 0.0 0.0 0.0
36 2021-02-06 10.5 0.0 0.0 0.0
37 2021-02-07 10.5 0.0 0.0 0.0
38 2021-02-08 10.5 0.0 0.0 0.0
39 2021-02-09 10.5 0.0 0.0 0.0
40 2021-02-10 10.5 0.0 0.0 0.0
41 2021-02-11 10.5 0.0 0.0 0.0
42 2021-02-12 10.5 0.0 0.0 0.0
43 2021-02-13 10.5 0.0 0.0 0.0
44 2021-02-14 10.5 0.0 0.0 0.0
45 2021-02-15 10.5 0.0 0.0 0.0
46 2021-02-16 10.5 0.0 0.0 0.0
47 2021-02-17 10.5 0.0 0.0 0.0
48 2021-02-18 10.5 0.0 0.0 0.0
49 2021-02-19 10.5 0.0 0.0 0.0
50 2021-02-20 10.5 0.0 0.0 0.0
51 2021-02-21 10.5 0.0 0.0 0.0
52 2021-02-22 10.5 0.0 0.0 0.0
53 2021-02-23 10.5 0.0 0.0 0.0
54 2021-02-24 10.5 0.0 0.0 0.0
55 2021-02-25 10.5 0.0 0.0 0.0
56 2021-02-26 10.5 0.0 0.0 0.0
57 2021-02-27 10.5 0.0 0.0 0.0
58 2021-02-28 10.5 0.0 0.0 0.0
59 2021-03-01 15.0 100.0 7.0 105.0
60 2021-03-02 15.0 0.0 0.0 0.0
61 2021-03-03 15.0 0.0 0.0 0.0
62 2021-03-04 15.0 0.0 0.0 0.0
63 2021-03-05 15.0 0.0 0.0 0.0
64 2021-03-06 15.0 0.0 0.0 0.0
65 2021-03-07 15.0 0.0 0.0 0.0
66 2021-03-08 15.0 0.0 0.0 0.0
67 2021-03-09 15.0 0.0 0.0 0.0
68 2021-03-10 15.0 0.0 0.0 0.0
69 2021-03-11 15.0 0.0 0.0 0.0
70 2021-03-12 15.0 0.0 0.0 0.0
71 2021-03-13 15.0 0.0 0.0 0.0
72 2021-03-14 15.0 0.0 0.0 0.0
73 2021-03-15 15.0 0.0 0.0 0.0
74 2021-03-16 15.0 0.0 0.0 0.0
75 2021-03-17 15.0 0.0 0.0 0.0
76 2021-03-18 15.0 0.0 0.0 0.0
77 2021-03-19 15.0 0.0 0.0 0.0
78 2021-03-20 15.0 0.0 0.0 0.0
79 2021-03-21 15.0 0.0 0.0 0.0
80 2021-03-22 15.0 0.0 0.0 0.0
81 2021-03-23 15.0 0.0 0.0 0.0
82 2021-03-24 15.0 0.0 0.0 0.0
83 2021-03-25 15.0 0.0 0.0 0.0
84 2021-03-26 15.0 0.0 0.0 0.0
85 2021-03-27 15.0 0.0 0.0 0.0
86 2021-03-28 15.0 0.0 0.0 0.0
87 2021-03-29 15.0 0.0 0.0 0.0
88 2021-03-30 15.0 0.0 0.0 0.0
89 2021-03-31 15.0 0.0 0.0 0.0
90 2021-04-01 20.0 100.0 5.0 100.0
91 2021-04-02 20.0 0.0 0.0 0.0
92 2021-04-03 20.0 0.0 0.0 0.0
93 2021-04-04 20.0 0.0 0.0 0.0
94 2021-04-05 20.0 0.0 0.0 0.0
95 2021-04-06 20.0 0.0 0.0 0.0
96 2021-04-07 20.0 0.0 0.0 0.0
97 2021-04-08 20.0 0.0 0.0 0.0
98 2021-04-09 20.0 0.0 0.0 0.0
99 2021-04-10 20.0 0.0 0.0 0.0
100 2021-04-11 20.0 0.0 0.0 0.0
101 2021-04-12 20.0 0.0 0.0 0.0
102 2021-04-13 20.0 0.0 0.0 0.0
103 2021-04-14 20.0 0.0 0.0 0.0
104 2021-04-15 20.0 0.0 0.0 0.0
105 2021-04-16 20.0 0.0 0.0 0.0
106 2021-04-17 20.0 0.0 0.0 0.0
107 2021-04-18 20.0 0.0 0.0 0.0
108 2021-04-19 20.0 0.0 0.0 0.0
109 2021-04-20 20.0 0.0 0.0 0.0
110 2021-04-21 20.0 0.0 0.0 0.0
111 2021-04-22 20.0 0.0 0.0 0.0
112 2021-04-23 20.0 0.0 0.0 0.0
113 2021-04-24 20.0 0.0 0.0 0.0
114 2021-04-25 20.0 0.0 0.0 0.0
115 2021-04-26 20.0 0.0 0.0 0.0
116 2021-04-27 20.0 0.0 0.0 0.0
117 2021-04-28 20.0 0.0 0.0 0.0
118 2021-04-29 20.0 0.0 0.0 0.0
119 2021-04-30 20.0 0.0 0.0 0.0
假设我有以下代码计算在给定预算的情况下我可以购买多少产品-
import math
import pandas as pd
data = [['2021-01-02', 5.5], ['2021-02-02', 10.5], ['2021-03-02', 15.0], ['2021-04-02', 20.0]]
df = pd.DataFrame(data, columns=['Date', 'Current_Price'])
df.Date = pd.to_datetime(df.Date)
mn = df.Date.min()
mx = df.Date.max()
dr = pd.date_range(mn - pd.tseries.offsets.MonthBegin(), mx + pd.tseries.offsets.MonthEnd(), name="Date")
df = df.set_index("Date").reindex(dr).reset_index()
df['Current_Price'] = df.groupby(
pd.Grouper(key='Date', freq='1M'))['Current_Price'].ffill().bfill()
# The dataframe below shows the current price of the product
# I'd like to buy at the specific date_range
print(df)
# Create 'Day' column to know which day of the month
df['Day'] = pd.to_datetime(df['Date']).dt.day
# Create 'Deposit' column to record how much money is
# deposited in, say, my bank account to buy the product.
# 'Withdrawal' column is to record how much I spent in
# buying product(s) at the current price on a specific date.
# 'Num_of_Products_Bought' shows how many items I bought
# on that specific date.
#
# Please note that the calculate below takes into account
# the left over money, which remains after I've purchased a
# product, for future purchase. For example, if you observe
# the resulting dataframe at the end of this code, you'll
# notice that I was able to purchase 7 products on March 1, 2021
# although my deposit on that day was 0. That is because
# on the days leading up to March 1, 2021, I have been saving
# the spare change from previous product purchases and that
# extra money allows me to buy an extra product on March 1, 2021
# despite my budget of 0 should only allow me to purchase
# 6 products.
df[['Deposit', 'Withdrawal', 'Num_of_Products_Bought']] = 0.0
# Suppose I save 0 at the beginning of every month in my bank account
df.loc[df['Day'] == 1, 'Deposit'] = 100.0
for index, row in df.iterrows():
if df.loc[index, 'Day'] == 1:
# num_prod_bought = (sum_of_deposit_so_far - sum_of_withdrawal)/current_price
df.loc[index, 'Num_of_Products_Bought'] = math.floor(
(sum(df.iloc[0:(index + 1)]['Deposit'])
- sum(df.iloc[0:(index + 1)]['Withdrawal']))
/ df.loc[index, 'Current_Price'])
# Record how much I spent buying the product on specific date
df.loc[index, 'Withdrawal'] = df.loc[index, 'Num_of_Products_Bought'] * df.loc[index, 'Current_Price']
print(df)
# This code above is working as intended,
# but how can I make it more efficient/pandas-like?
# In particular, I don't like to idea of having to
# iterate the rows and having to recalculate
# the running (sum of) deposit amount and
# the running (sum of) the withdrawal.
如代码中的注释中所述,我想知道如何完成相同的操作,而不必逐行迭代并在我的迭代中计算直到当前行的行总和(我阅读 Whosebug 并看到 cumsum()
函数,但我认为 cumsum
没有迭代中当前行的概念。
非常感谢您的 suggestions/answers!
使用.apply
的解决方案:
def fn():
leftover = 0
amount, deposit = yield
while True:
new_amount, new_deposit = yield (deposit + leftover) // amount
leftover = (deposit + leftover) % amount
amount, deposit = new_amount, new_deposit
df = df.set_index("Date")
s = fn()
next(s)
m = df.index.day == 1
df.loc[m, "Deposit"] = 100
df.loc[m, "Num_of_Products_Bought"] = df.loc[
m, ["Current_Price", "Deposit"]
].apply(lambda x: s.send((x["Current_Price"], x["Deposit"])), axis=1)
df.loc[m, "Withdrawal"] = (
df.loc[m, "Num_of_Products_Bought"] * df.loc[m, "Current_Price"]
)
print(df.fillna(0).reset_index())
打印:
Date Current_Price Deposit Num_of_Products_Bought Withdrawal
0 2021-01-01 5.5 100.0 18.0 99.0
1 2021-01-02 5.5 0.0 0.0 0.0
2 2021-01-03 5.5 0.0 0.0 0.0
3 2021-01-04 5.5 0.0 0.0 0.0
4 2021-01-05 5.5 0.0 0.0 0.0
5 2021-01-06 5.5 0.0 0.0 0.0
6 2021-01-07 5.5 0.0 0.0 0.0
7 2021-01-08 5.5 0.0 0.0 0.0
8 2021-01-09 5.5 0.0 0.0 0.0
9 2021-01-10 5.5 0.0 0.0 0.0
10 2021-01-11 5.5 0.0 0.0 0.0
11 2021-01-12 5.5 0.0 0.0 0.0
12 2021-01-13 5.5 0.0 0.0 0.0
13 2021-01-14 5.5 0.0 0.0 0.0
14 2021-01-15 5.5 0.0 0.0 0.0
15 2021-01-16 5.5 0.0 0.0 0.0
16 2021-01-17 5.5 0.0 0.0 0.0
17 2021-01-18 5.5 0.0 0.0 0.0
18 2021-01-19 5.5 0.0 0.0 0.0
19 2021-01-20 5.5 0.0 0.0 0.0
20 2021-01-21 5.5 0.0 0.0 0.0
21 2021-01-22 5.5 0.0 0.0 0.0
22 2021-01-23 5.5 0.0 0.0 0.0
23 2021-01-24 5.5 0.0 0.0 0.0
24 2021-01-25 5.5 0.0 0.0 0.0
25 2021-01-26 5.5 0.0 0.0 0.0
26 2021-01-27 5.5 0.0 0.0 0.0
27 2021-01-28 5.5 0.0 0.0 0.0
28 2021-01-29 5.5 0.0 0.0 0.0
29 2021-01-30 5.5 0.0 0.0 0.0
30 2021-01-31 5.5 0.0 0.0 0.0
31 2021-02-01 10.5 100.0 9.0 94.5
32 2021-02-02 10.5 0.0 0.0 0.0
33 2021-02-03 10.5 0.0 0.0 0.0
34 2021-02-04 10.5 0.0 0.0 0.0
35 2021-02-05 10.5 0.0 0.0 0.0
36 2021-02-06 10.5 0.0 0.0 0.0
37 2021-02-07 10.5 0.0 0.0 0.0
38 2021-02-08 10.5 0.0 0.0 0.0
39 2021-02-09 10.5 0.0 0.0 0.0
40 2021-02-10 10.5 0.0 0.0 0.0
41 2021-02-11 10.5 0.0 0.0 0.0
42 2021-02-12 10.5 0.0 0.0 0.0
43 2021-02-13 10.5 0.0 0.0 0.0
44 2021-02-14 10.5 0.0 0.0 0.0
45 2021-02-15 10.5 0.0 0.0 0.0
46 2021-02-16 10.5 0.0 0.0 0.0
47 2021-02-17 10.5 0.0 0.0 0.0
48 2021-02-18 10.5 0.0 0.0 0.0
49 2021-02-19 10.5 0.0 0.0 0.0
50 2021-02-20 10.5 0.0 0.0 0.0
51 2021-02-21 10.5 0.0 0.0 0.0
52 2021-02-22 10.5 0.0 0.0 0.0
53 2021-02-23 10.5 0.0 0.0 0.0
54 2021-02-24 10.5 0.0 0.0 0.0
55 2021-02-25 10.5 0.0 0.0 0.0
56 2021-02-26 10.5 0.0 0.0 0.0
57 2021-02-27 10.5 0.0 0.0 0.0
58 2021-02-28 10.5 0.0 0.0 0.0
59 2021-03-01 15.0 100.0 7.0 105.0
60 2021-03-02 15.0 0.0 0.0 0.0
61 2021-03-03 15.0 0.0 0.0 0.0
62 2021-03-04 15.0 0.0 0.0 0.0
63 2021-03-05 15.0 0.0 0.0 0.0
64 2021-03-06 15.0 0.0 0.0 0.0
65 2021-03-07 15.0 0.0 0.0 0.0
66 2021-03-08 15.0 0.0 0.0 0.0
67 2021-03-09 15.0 0.0 0.0 0.0
68 2021-03-10 15.0 0.0 0.0 0.0
69 2021-03-11 15.0 0.0 0.0 0.0
70 2021-03-12 15.0 0.0 0.0 0.0
71 2021-03-13 15.0 0.0 0.0 0.0
72 2021-03-14 15.0 0.0 0.0 0.0
73 2021-03-15 15.0 0.0 0.0 0.0
74 2021-03-16 15.0 0.0 0.0 0.0
75 2021-03-17 15.0 0.0 0.0 0.0
76 2021-03-18 15.0 0.0 0.0 0.0
77 2021-03-19 15.0 0.0 0.0 0.0
78 2021-03-20 15.0 0.0 0.0 0.0
79 2021-03-21 15.0 0.0 0.0 0.0
80 2021-03-22 15.0 0.0 0.0 0.0
81 2021-03-23 15.0 0.0 0.0 0.0
82 2021-03-24 15.0 0.0 0.0 0.0
83 2021-03-25 15.0 0.0 0.0 0.0
84 2021-03-26 15.0 0.0 0.0 0.0
85 2021-03-27 15.0 0.0 0.0 0.0
86 2021-03-28 15.0 0.0 0.0 0.0
87 2021-03-29 15.0 0.0 0.0 0.0
88 2021-03-30 15.0 0.0 0.0 0.0
89 2021-03-31 15.0 0.0 0.0 0.0
90 2021-04-01 20.0 100.0 5.0 100.0
91 2021-04-02 20.0 0.0 0.0 0.0
92 2021-04-03 20.0 0.0 0.0 0.0
93 2021-04-04 20.0 0.0 0.0 0.0
94 2021-04-05 20.0 0.0 0.0 0.0
95 2021-04-06 20.0 0.0 0.0 0.0
96 2021-04-07 20.0 0.0 0.0 0.0
97 2021-04-08 20.0 0.0 0.0 0.0
98 2021-04-09 20.0 0.0 0.0 0.0
99 2021-04-10 20.0 0.0 0.0 0.0
100 2021-04-11 20.0 0.0 0.0 0.0
101 2021-04-12 20.0 0.0 0.0 0.0
102 2021-04-13 20.0 0.0 0.0 0.0
103 2021-04-14 20.0 0.0 0.0 0.0
104 2021-04-15 20.0 0.0 0.0 0.0
105 2021-04-16 20.0 0.0 0.0 0.0
106 2021-04-17 20.0 0.0 0.0 0.0
107 2021-04-18 20.0 0.0 0.0 0.0
108 2021-04-19 20.0 0.0 0.0 0.0
109 2021-04-20 20.0 0.0 0.0 0.0
110 2021-04-21 20.0 0.0 0.0 0.0
111 2021-04-22 20.0 0.0 0.0 0.0
112 2021-04-23 20.0 0.0 0.0 0.0
113 2021-04-24 20.0 0.0 0.0 0.0
114 2021-04-25 20.0 0.0 0.0 0.0
115 2021-04-26 20.0 0.0 0.0 0.0
116 2021-04-27 20.0 0.0 0.0 0.0
117 2021-04-28 20.0 0.0 0.0 0.0
118 2021-04-29 20.0 0.0 0.0 0.0
119 2021-04-30 20.0 0.0 0.0 0.0