Conditionals and techniques other than iteration for Python 3 pandas DataFrames

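I am continuing to improve my pandas skills and have run into a problem. It involves two DataFrames, df1 and df2. df1 contains event times and the corresponding details for each event. df2 contains time periods, each defined by a start time and a stop time.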

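Goals:

  1. Group all events by time period.
  2. Check whether Count is increasing within the period and whether every Code within the period is the same.
  3. Create a new column in df2 that is True if both checks in step 2 pass, and False if either fails or if there are no events in the period.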

Code:

import pandas as pd
    
df1 = {'Event':  ['2020-12-01 00:10:22', '2020-12-01 00:15:11','2020-12-01 00:18:00',
                  '2020-12-01 00:31:00', '2020-12-01 00:54:00' , '2020-12-01 01:01:00' ,
                  '2020-12-01 01:19:00' , '2020-12-01 01:23:00' , '2020-12-01 01:24:00' ,
                  '2020-12-01 01:56:00' , '2020-12-01 21:02:00', '2020-12-01 02:41:00', 
                  '2020-12-01 02:44:00' , '2020-12-01 03:19:00' ,'2020-12-01 03:22:00' , 
                  '2020-12-01 03:49:00' , '2020-12-01 05:24:00' ,'2020-12-01 05:56:00' , 
                  '2020-12-01 08:02:00'
                   ] , 
       'Count' : [1 , 2 , 4 , 2 , 5 , 
                  3 , 7 , 9 , 10 , 1 , 
                  2 , 5 , 6 , 10 , 5 , 
                  6 ,7 , 8 , 3] ,
      'Code' : ['A' , 'A' , 'A' , 'A' , 'B' , 
                'B' , 'B' , 'B' , 'B' , 'B' , 
                'C' , 'C' , 'C' , 'C' , 'C' , 
                'D' , 'D' , 'D' , 'D']
        }

df1 = pd.DataFrame(df1 , columns = ['Event' , 'Count' , 'Code'])

df1['Event'] = pd.to_datetime(df1['Event'])

df1

    Event   Count   Code
0   2020-12-01 00:10:22     1   A
1   2020-12-01 00:15:11     2   A
2   2020-12-01 00:18:00     4   A
3   2020-12-01 00:31:00     2   A
4   2020-12-01 00:54:00     5   B
5   2020-12-01 01:01:00     3   B
6   2020-12-01 01:19:00     7   B
7   2020-12-01 01:23:00     9   B
8   2020-12-01 01:24:00     10  B
9   2020-12-01 01:56:00     1   B
10  2020-12-01 21:02:00     2   C
11  2020-12-01 02:41:00     5   C
12  2020-12-01 02:44:00     6   C
13  2020-12-01 03:19:00     10  C
14  2020-12-01 03:22:00     5   C
15  2020-12-01 03:49:00     6   D
16  2020-12-01 05:24:00     7   D
17  2020-12-01 05:56:00     8   D
18  2020-12-01 08:02:00     3   D

Code to create df2:

df2 = {'Start Time' : ['2020-12-01 00:00:00', '2020-12-01 00:30:00','2020-12-01 01:30:00',
                    '2020-12-01 02:30:00', '2020-12-01 03:30:00' , '2020-12-01 04:30:00' ,
                    '2020-12-01 05:30:00' , '2020-12-01 07:30:00' , '2020-12-01 10:30:00' ,
                    '2020-12-01 15:00:00' , '2020-12-02 21:00:00'] ,
       'End Time' : ['2020-12-01 00:30:00', '2020-12-01 01:30:00','2020-12-01 02:30:00',
                    '2020-12-01 03:30:00', '2020-12-01 04:30:00' , '2020-12-01 05:30:00' ,
                    '2020-12-01 07:30:00' , '2020-12-01 10:30:00' , '2020-12-01 15:00:00' ,
                    '2020-12-01 21:00:00' , '2020-12-02 00:00:00']
       
        }

df2 = pd.DataFrame(df2 , columns = ['Start Time' , 'End Time'])

df2['Start Time'] = pd.to_datetime(df2['Start Time'])
df2['End Time'] = pd.to_datetime(df2['End Time'])

df2

    Start Time  End Time
0   2020-12-01 00:00:00     2020-12-01 00:30:00
1   2020-12-01 00:30:00     2020-12-01 01:30:00
2   2020-12-01 01:30:00     2020-12-01 02:30:00
3   2020-12-01 02:30:00     2020-12-01 03:30:00
4   2020-12-01 03:30:00     2020-12-01 04:30:00
5   2020-12-01 04:30:00     2020-12-01 05:30:00
6   2020-12-01 05:30:00     2020-12-01 07:30:00
7   2020-12-01 07:30:00     2020-12-01 10:30:00
8   2020-12-01 10:30:00     2020-12-01 15:00:00
9   2020-12-01 15:00:00     2020-12-01 21:00:00
10  2020-12-02 21:00:00     2020-12-02 00:00:00

Strategy:

My strategy was to use pd.DataFrame.between_time and then a lambda function for the conditional checks, but I can't seem to get it to work.
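
One possible reason this approach stalls, offered as a hedged note rather than a diagnosis of the original attempt: DataFrame.between_time filters a DatetimeIndex by time of day and ignores the date, whereas Series.between compares full timestamps. The sketch below uses a small made-up frame (the names events, by_clock, and by_range are illustrative and do not come from the question) to show the difference.

```python
import pandas as pd

# Hypothetical three-row frame indexed by timestamp (not the question's data).
events = pd.DataFrame(
    {'Count': [1, 2, 3]},
    index=pd.to_datetime(['2020-12-01 00:10', '2020-12-02 00:20', '2020-12-02 05:00']),
)

# between_time looks only at the clock time, so it matches rows on both days.
by_clock = events.between_time('00:00', '00:30')

# Series.between compares whole timestamps, so only the 2020-12-01 row matches.
stamps = events.index.to_series()
in_range = stamps.between(pd.Timestamp('2020-12-01 00:00'),
                          pd.Timestamp('2020-12-01 00:30'))
by_range = events[in_range]

print(len(by_clock), len(by_range))
```

For periods that carry a date, as the ones in df2 do, the full-timestamp comparison is usually the one that matches the intent.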

Here is the desired output:

    Start Time  End Time    Test
0   2020-12-01 00:00:00     2020-12-01 00:30:00     True
1   2020-12-01 00:30:00     2020-12-01 01:30:00     False
2   2020-12-01 01:30:00     2020-12-01 02:30:00     True
3   2020-12-01 02:30:00     2020-12-01 03:30:00     False
4   2020-12-01 03:30:00     2020-12-01 04:30:00     True
5   2020-12-01 04:30:00     2020-12-01 05:30:00     True
6   2020-12-01 05:30:00     2020-12-01 07:30:00     True
7   2020-12-01 07:30:00     2020-12-01 10:30:00     True
8   2020-12-01 10:30:00     2020-12-01 15:00:00     False
9   2020-12-01 15:00:00     2020-12-01 21:00:00     False
10  2020-12-02 21:00:00     2020-12-02 00:00:00     False

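You can use a custom generator function that yields a boolean for each period according to the specified conditions. Here we use Series.is_monotonic_increasing to check whether Count is increasing and Series.nunique to check whether all codes in the given time period are the same: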

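def f():
    # Each row of df2 is one (start, end) pair.
    for x, y in df2.to_numpy():
        # Events inside the period; between() includes both endpoints.
        s = df1[df1['Event'].between(x, y)]
        # `and` rather than `&`: `a & b == 1` would parse as `(a & b) == 1`.
        yield s['Count'].is_monotonic_increasing and s['Code'].nunique() == 1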

df2['Test'] = list(f())
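
As an aside, the two checks have a few edge cases worth knowing, sketched here on throwaway Series (the variable names are illustrative). An empty selection counts as monotonic but has zero distinct codes, which is why periods without events come out False. The monotonic check is non-strict, so repeated counts still pass; combining it with Series.is_unique is one way to demand a strict increase, assuming that is what "increasing" should mean. Finally, in Python `&` binds more tightly than `==`, so combining the two checks with `and` (or parenthesizing the comparison) avoids a surprise when the number of codes is odd and greater than one.

```python
import pandas as pd

# Empty selection: trivially monotonic, but it has zero distinct values.
empty = pd.Series([], dtype='int64')
empty_is_monotonic = empty.is_monotonic_increasing
empty_distinct = empty.nunique()

# The monotonic check is non-strict: repeated values still pass.
flat = pd.Series([2, 2, 3])
flat_is_monotonic = flat.is_monotonic_increasing
# One way to require a strict increase (an assumption about the intent).
strictly_increasing = flat.is_monotonic_increasing and flat.is_unique

# Precedence: `&` binds tighter than `==`.
monotonic, n_codes = True, 3
with_ampersand = monotonic & n_codes == 1   # parsed as (True & 3) == 1
with_and = monotonic and n_codes == 1       # parsed as True and (3 == 1)

print(empty_is_monotonic, empty_distinct)
print(flat_is_monotonic, strictly_increasing)
print(with_ampersand, with_and)
```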

Result (df2 with the new Test column):

            Start Time            End Time   Test
0  2020-12-01 00:00:00 2020-12-01 00:30:00   True
1  2020-12-01 00:30:00 2020-12-01 01:30:00  False
2  2020-12-01 01:30:00 2020-12-01 02:30:00   True
3  2020-12-01 02:30:00 2020-12-01 03:30:00  False
4  2020-12-01 03:30:00 2020-12-01 04:30:00   True
5  2020-12-01 04:30:00 2020-12-01 05:30:00   True
6  2020-12-01 05:30:00 2020-12-01 07:30:00   True
7  2020-12-01 07:30:00 2020-12-01 10:30:00   True
8  2020-12-01 10:30:00 2020-12-01 15:00:00  False
9  2020-12-01 15:00:00 2020-12-01 21:00:00  False
10 2020-12-02 21:00:00 2020-12-02 00:00:00  False
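
Since the question asks about techniques other than iteration, here is a minimal sketch of one loop-free alternative, not the method used above: cross-join every period with every event, keep the pairs where the event falls inside the period, and aggregate per period. It assumes pandas 1.2 or later for merge(how='cross'), and it runs on a small made-up data set; the names events, periods, pairs, inside, and ok are hypothetical.

```python
import pandas as pd

# Small hypothetical data set, not the one from the question.
events = pd.DataFrame({
    'Event': pd.to_datetime(['2020-12-01 00:10', '2020-12-01 00:20',
                             '2020-12-01 00:40', '2020-12-01 00:50']),
    'Count': [1, 2, 5, 3],
    'Code': ['A', 'A', 'B', 'B'],
})
periods = pd.DataFrame({
    'Start Time': pd.to_datetime(['2020-12-01 00:00', '2020-12-01 00:30',
                                  '2020-12-01 01:00']),
    'End Time': pd.to_datetime(['2020-12-01 00:30', '2020-12-01 01:00',
                                '2020-12-01 02:00']),
})

# Pair every period with every event; reset_index keeps the period's row
# label in a column named 'index' so results can be mapped back later.
pairs = periods.reset_index().merge(events, how='cross')

# Keep only the pairs where the event lies inside the period (inclusive).
inside = pairs[pairs['Event'].between(pairs['Start Time'], pairs['End Time'])]

# Per period: is Count non-decreasing, and is there exactly one Code?
grouped = inside.groupby('index')
ok = (grouped['Count'].apply(lambda c: c.is_monotonic_increasing)
      & (grouped['Code'].nunique() == 1))

# Periods with no events are missing from `ok`; fill them with False.
periods['Test'] = ok.reindex(periods.index, fill_value=False)
print(periods)
```

The cross join builds len(periods) * len(events) rows, so this trades memory for the absence of an explicit Python loop; for large inputs the generator version may well be the more practical choice.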