Reporting in pandas - cumulative sum of open issues

I have a list of tickets with the following data: ticket name, created date, status, closed date.

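A new column will be calculated from the created/closed dates. If a new ticket with status open is created in a given month, the new column's value increases by 1. In the month a ticket moves to closed, the value decreases by 1.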

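How do I set up the DataFrame with the above index, and how do I do the cumulative calculation in pandas? I am particularly struggling with setting the index to a time series of dates and getting the issues to show up on the correct rows.

Starting data: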

          ID        Created Date Closed Date
0     FND-1974 2021-10-18 00:00:00  2022-03-31
1    FND-10310 2021-10-18 00:00:00  2022-03-31
2    FND-10310 2021-10-18 00:00:00  2022-03-31
3    FND-10310 2021-07-21 00:00:00         NaT
4     FND-9862 2021-07-20 00:00:00  2022-02-28
..         ...                 ...         ...
100         41 2020-04-13 13:34:39         NaT
101         40 2020-04-13 13:32:14         NaT
102         35 2020-04-01 17:48:23         NaT
103         18 2020-01-21 16:08:54         NaT
104          4 2020-02-25 14:56:37         NaT

Current approach:

    df = pd.DataFrame(index= pd.Series(pd.date_range('2021-7-1', dt.date.today(),freq="D")))
    df['ID'] = df_agg['Exception_ID']
    df['Created Date'] = df_agg['Created_On_Date']
    df['Closed Date'] = df_agg['Closed_Date']
    df['count'] = 0
    for index, row in df.iterrows():
        if index >= row['Created Date']:
            row['count'] += 1
        if index >= row['Closed Date']:
            row['count'] -= 1
    print(df.head)

Output:

             ID Created Date Closed Date  count
2021-07-01  NaN          NaT         NaT      0
2021-07-02  NaN          NaT         NaT      0
2021-07-03  NaN          NaT         NaT      0
2021-07-04  NaN          NaT         NaT      0
2021-07-05  NaN          NaT         NaT      0
        ...          ...         ...    ...
2022-03-20  NaN          NaT         NaT      0
2022-03-21  NaN          NaT         NaT      0
2022-03-22  NaN          NaT         NaT      0
2022-03-23  NaN          NaT         NaT      0
2022-03-24  NaN          NaT         NaT      0

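Obviously, I want each row to be populated with the ID, Created Date and Closed Date, and the count to go up by 1 when an issue is open on a given date. I am still trying to figure out how to do this.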

Count the number of events per month as described here:

Then use cumsum to calculate the cumulative sum.

import numpy as np
import pandas as pd

# Prepare some data
dates = np.random.choice(pd.date_range('2020-01-01', '2021-10-31'), size=100)
data = {'ID': [f"FND-{i}" for i in range(100)], 
        'Created Date': dates,
        'Closed Date': dates + pd.to_timedelta(np.random.poisson(60, size=100), unit='D')}
df_agg = pd.DataFrame(data)
# Add some NaT values
df_agg.loc[df_agg['Closed Date'] > '2021-10-31', 'Closed Date'] = None

# Make a dataframe of monthly stats
index = pd.period_range('2021-01', '2021-10', freq='M', name='Month')
monthly_summary = pd.DataFrame(index=index)
monthly_summary['Opened'] = df_agg['ID'].groupby(df_agg['Created Date'].dt.to_period('M')).count()
monthly_summary['Closed'] = df_agg['ID'].groupby(df_agg['Closed Date'].dt.to_period('M')).count()
monthly_summary = monthly_summary.fillna(0).astype(int)
monthly_summary['Net Change'] = monthly_summary['Opened'] - monthly_summary['Closed']

# Calculate cumulative sum of open issues
start_count = 50
monthly_summary['Month-end Count'] = start_count + monthly_summary['Net Change'].cumsum()
print(monthly_summary)
         Opened  Closed  Net Change  Month-end Count
Month                                               
2021-01       2       7          -5               45
2021-02       2       6          -4               41
2021-03      11       2           9               50
2021-04      11       3           8               58
2021-05       6      11          -5               53
2021-06       3      10          -7               46
2021-07       5       5           0               46
2021-08       1       4          -3               43
2021-09       6       5           1               44
2021-10       4       1           3               47
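
If a daily series is wanted instead (as in the question's date-indexed frame), one option is to treat each ticket as a +1 event on its creation day and a -1 event on its closing day, then take a cumulative sum. The following is only a minimal sketch under assumptions: the small `tickets` frame and the helper name `open_count_by_day` are made up for illustration, and a ticket is counted as open from its creation day up to the day before it is closed. Because the running total covers the whole history, no `start_count` guess is needed.

```python
import pandas as pd

# Hypothetical sample data; NaT means the ticket is still open
tickets = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D'],
    'Created Date': pd.to_datetime(['2021-07-01', '2021-07-02', '2021-07-02', '2021-07-04']),
    'Closed Date': pd.to_datetime(['2021-07-03', None, '2021-07-05', None]),
})

def open_count_by_day(df, start, end):
    """Return the number of open tickets at the end of each day in [start, end]."""
    # Count events per day; normalize() drops any time-of-day component
    opened = df['Created Date'].dt.normalize().value_counts()
    closed = df['Closed Date'].dropna().dt.normalize().value_counts()
    # Net change per day: +1 per creation, -1 per closure
    net = opened.sub(closed, fill_value=0).sort_index()
    # Running total over all history, then aligned to the requested calendar
    running = net.cumsum()
    days = pd.date_range(start, end, freq='D')
    return running.reindex(days, method='ffill').fillna(0).astype(int)

daily_open = open_count_by_day(tickets, '2021-07-01', '2021-07-06')
print(daily_open)
```

Days without any event carry the previous day's count forward, and days before the first ticket are filled with 0.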

Bill's answer provides a nice tabular view of the data, and I suggest using it.

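I was able to get by with the below:

    import calendar

    # Assumption: month_list maps a month number (1-12) to its name.
    # It was not defined in the original snippet.
    month_list = list(calendar.month_name)

    df = pd.DataFrame()
    df['ID'] = df_agg['Exception_ID']
    df['Created Date'] = df_agg['Created_On_Date']
    df['Closed Date'] = df_agg['Closed_Date']
    # Count created tickets per (month, year) pair
    df = df['Created Date'].groupby([df['Created Date'].dt.month, df['Created Date'].dt.year]).agg('count')
    df = df.to_frame()
    # Keep the (month, year) index tuples as a column
    df['date'] = df.index
    dates = df['date']
    # Build labels such as "January, 2020"
    date_format = []
    for i in dates:
        value = month_list[i[0]] + ', ' + str(i[1])
        date_format.append(value)
    df['dates1'] = date_format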

Output:

                           Created Date        date
Created Date Created Date                          
1            2020                     1   (1, 2020)
             2021                     2   (1, 2021)
             2022                     3   (1, 2022)
2            2020                     1   (2, 2020)
             2021                     6   (2, 2021)
             2022                     1   (2, 2022)
3            2021                     7   (3, 2021)
4            2020                     9   (4, 2020)
             2021                     3   (4, 2021)
5            2020                     2   (5, 2020)
             2021                     6   (5, 2021)
6            2020                     4   (6, 2020)
             2021                     2   (6, 2021)
7            2020                     3   (7, 2020)
             2021                     8   (7, 2021)
8            2020                     2   (8, 2020)
             2021                     5   (8, 2021)
9            2020                     4   (9, 2020)
10           2020                     6  (10, 2020)
             2021                     8  (10, 2021)
11           2020                     5  (11, 2020)
             2021                    10  (11, 2021)
12           2018                     1  (12, 2018)
             2020                     3  (12, 2020)
             2021                     3  (12, 2021)
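
The (month, year) tuples can also be avoided by grouping on a monthly period, which keeps the counts in chronological order and reduces the label formatting to a single call. A small sketch, assuming a datetime `Created Date` column as above; the sample frame, the `Created` and `Label` column names, and the label format are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Created Date': pd.to_datetime(
        ['2020-01-21', '2020-04-01', '2020-04-13', '2021-07-20', '2021-07-21']
    ),
})

# Group by calendar month (a Period such as 2020-04) and count the rows
created_per_month = (
    df.groupby(df['Created Date'].dt.to_period('M'))
      .size()
      .rename('Created')
      .to_frame()
)

# Human-readable label built from the period index,
# e.g. "April, 2020" in an English locale
created_per_month['Label'] = created_per_month.index.strftime('%B, %Y')
print(created_per_month)
```

Since the index is a `PeriodIndex`, calling `.cumsum()` on the `Created` column would then accumulate in calendar order rather than month-number order.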