Reporting in pandas - cumulative sum of open issues
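I have a list of tickets with the following data: ticket name, created date, status, and closed date.
A new column will be calculated from the created/closed dates. If a new ticket with status open is created in a given month, the new column's value increases by 1. In the month a ticket moves to closed status, the value decreases by 1.
How do I set up the DataFrame with such an index, and how do I do the cumulative calculation in pandas? I am particularly struggling with setting the index to a time series of dates and getting the issues to show up on the correct rows.
Starting data: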
ID Created Date Closed Date
0 FND-1974 2021-10-18 00:00:00 2022-03-31
1 FND-10310 2021-10-18 00:00:00 2022-03-31
2 FND-10310 2021-10-18 00:00:00 2022-03-31
3 FND-10310 2021-07-21 00:00:00 NaT
4 FND-9862 2021-07-20 00:00:00 2022-02-28
.. ... ... ...
100 41 2020-04-13 13:34:39 NaT
101 40 2020-04-13 13:32:14 NaT
102 35 2020-04-01 17:48:23 NaT
103 18 2020-01-21 16:08:54 NaT
104 4 2020-02-25 14:56:37 NaT
Current approach:
df = pd.DataFrame(index= pd.Series(pd.date_range('2021-7-1', dt.date.today(),freq="D")))
df['ID'] = df_agg['Exception_ID']
df['Created Date'] = df_agg['Created_On_Date']
df['Closed Date'] = df_agg['Closed_Date']
df['count'] = 0
for index, row in df.iterrows():
    if index >= row['Created Date']:
        row['count'] += 1
    if index >= row['Closed Date']:
        row['count'] -= 1
print(df.head)
Output:
ID Created Date Closed Date count
2021-07-01 NaN NaT NaT 0
2021-07-02 NaN NaT NaT 0
2021-07-03 NaN NaT NaT 0
2021-07-04 NaN NaT NaT 0
2021-07-05 NaN NaT NaT 0
... ... ... ...
2022-03-20 NaN NaT NaT 0
2022-03-21 NaN NaT NaT 0
2022-03-22 NaN NaT NaT 0
2022-03-23 NaN NaT NaT 0
2022-03-24 NaN NaT NaT 0
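Obviously, I want the rows populated with ID, Created Date, and Closed Date, and the count incremented by 1 when an issue is open on a given date. I am trying to figure out how to do this.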
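For context on why the output above is all NaN and 0, here is a minimal sketch of the two pandas behaviours that appear to be involved. The small df_agg below is a hypothetical stand-in with made-up values, not the real data: column assignment aligns on index labels, and iterrows() hands back a copy of each row.

```python
import pandas as pd

# Hypothetical stand-in for df_agg, with the default integer index (0, 1, 2).
df_agg = pd.DataFrame({
    "Exception_ID": ["FND-1", "FND-2", "FND-3"],
    "Created_On_Date": pd.to_datetime(["2021-07-01", "2021-07-02", "2021-07-02"]),
})

# A frame indexed by calendar days, as in the attempt above.
df = pd.DataFrame(index=pd.date_range("2021-07-01", "2021-07-03", freq="D"))

# Column assignment aligns on index labels. The integer labels 0..2 never
# match the date labels, so every value becomes NaN.
df["ID"] = df_agg["Exception_ID"]
all_missing = df["ID"].isna().all()

# iterrows() yields a copy of each row here, so in-place edits on `row`
# are not written back to df.
df["count"] = 0
for _, row in df.iterrows():
    row["count"] += 1
count_unchanged = (df["count"] == 0).all()

print(all_missing, count_unchanged)
```

Both flags come out true, which matches the NaN/NaT/0 rows shown in the output. Counting events per date and then accumulating avoids both problems.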
Count the number of events per month as described here:
Then use cumsum to calculate the cumulative sum.
import numpy as np
import pandas as pd
# Prepare some data
dates = np.random.choice(pd.date_range('2020-01-01', '2021-10-31'), size=100)
data = {'ID': [f"FND-{i}" for i in range(100)],
        'Created Date': dates,
        'Closed Date': dates + pd.to_timedelta(np.random.poisson(60, size=100), unit='D')}
df_agg = pd.DataFrame(data)
# Add some NaT values
df_agg.loc[df_agg['Closed Date'] > '2021-10-31', 'Closed Date'] = None
# Make a dataframe of monthly stats
index = pd.period_range('2021-01', '2021-10', freq='M', name='Month')
monthly_summary = pd.DataFrame(index=index)
monthly_summary['Opened'] = df_agg['ID'].groupby(df_agg['Created Date'].dt.to_period('M')).count()
monthly_summary['Closed'] = df_agg['ID'].groupby(df_agg['Closed Date'].dt.to_period('M')).count()
monthly_summary = monthly_summary.fillna(0).astype(int)
monthly_summary['Net Change'] = monthly_summary['Opened'] - monthly_summary['Closed']
# Calculate cumulative sum of open issues
start_count = 50
monthly_summary['Month-end Count'] = start_count + monthly_summary['Net Change'].cumsum()
print(monthly_summary)
Opened Closed Net Change Month-end Count
Month
2021-01 2 7 -5 45
2021-02 2 6 -4 41
2021-03 11 2 9 50
2021-04 11 3 8 58
2021-05 6 11 -5 53
2021-06 3 10 -7 46
2021-07 5 5 0 46
2021-08 1 4 -3 43
2021-09 6 5 1 44
2021-10 4 1 3 47
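Since the question asks for a daily date index, the same idea can be sketched at daily granularity: count +1 on each creation day and -1 on each closing day, then accumulate. The tickets frame, its values, and the date range below are made up for illustration. The sketch also assumes every ticket was created inside the range, so no start_count is needed, and that a ticket closed on a given day no longer counts as open at the end of that day.

```python
import pandas as pd

# Hypothetical tickets; column names follow the question, values are made up.
tickets = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "Created Date": pd.to_datetime(["2021-07-01", "2021-07-01", "2021-07-03", "2021-07-04"]),
    "Closed Date": pd.to_datetime(["2021-07-03", None, "2021-07-05", None]),
})

# Number of tickets opened / closed per calendar day (NaT rows are dropped).
opened = tickets["Created Date"].dt.normalize().value_counts()
closed = tickets["Closed Date"].dropna().dt.normalize().value_counts()

# Align both on a daily index, fill days without events with 0, then accumulate.
days = pd.date_range("2021-07-01", "2021-07-06", freq="D")
daily = pd.DataFrame({
    "Opened": opened.reindex(days, fill_value=0),
    "Closed": closed.reindex(days, fill_value=0),
})
daily["Open Count"] = (daily["Opened"] - daily["Closed"]).cumsum()
print(daily)
```

If the date range starts after some tickets were already open, a starting offset like start_count above would still have to be added to the cumulative sum.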
Bill's answer provides a nice tabular view of the data, and I recommend using it.
I was able to get by with the below:
df = pd.DataFrame()
df['ID'] = df_agg['Exception_ID']
df['Created Date'] = df_agg['Created_On_Date']
df['Closed Date'] = df_agg['Closed_Date']
df = df['Created Date'].groupby([df['Created Date'].dt.month, df['Created Date'].dt.year]).agg('count')
df = df.to_frame()
df['date'] = df.index
dates = df['date']
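# month_list is not defined in the post; it is assumed to map a month number (1-12)
# to its name, e.g. month_list = list(calendar.month_name) after `import calendar`.
date_format = []
for i in dates:
    # i is a (month, year) tuple taken from the MultiIndex
    value = month_list[i[0]] + ', ' + str(i[1])
    date_format.append(value)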
df['dates1'] = date_format
Output:
Created Date Created Date
1 2020 1 (1, 2020)
2021 2 (1, 2021)
2022 3 (1, 2022)
2 2020 1 (2, 2020)
2021 6 (2, 2021)
2022 1 (2, 2022)
3 2021 7 (3, 2021)
4 2020 9 (4, 2020)
2021 3 (4, 2021)
5 2020 2 (5, 2020)
2021 6 (5, 2021)
6 2020 4 (6, 2020)
2021 2 (6, 2021)
7 2020 3 (7, 2020)
2021 8 (7, 2021)
8 2020 2 (8, 2020)
2021 5 (8, 2021)
9 2020 4 (9, 2020)
10 2020 6 (10, 2020)
2021 8 (10, 2021)
11 2020 5 (11, 2020)
2021 10 (11, 2021)
12 2018 1 (12, 2018)
2020 3 (12, 2020)
2021 3 (12, 2021)
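As a possible alternative to the loop over month_list, the month labels can be sketched with a monthly Period grouping and strftime. The dates below are made up and only the column name follows the post. Note that %B yields the month name in the current locale.

```python
import pandas as pd

# Hypothetical creation dates; only the column name follows the post.
created = pd.Series(
    pd.to_datetime(["2020-01-21", "2020-02-25", "2020-04-01", "2020-04-13", "2021-07-20"]),
    name="Created Date",
)

# Group by calendar month as a Period so the result sorts chronologically
# instead of by month number first.
monthly = created.groupby(created.dt.to_period("M")).count().to_frame("Created")

# Build a "Month, Year" label directly from the PeriodIndex.
monthly["Label"] = monthly.index.strftime("%B, %Y")
print(monthly)
```

Grouping by a single Period key also keeps each month of each year on its own row in date order, which makes a later cumsum straightforward.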