How to loop through rows in dataframe from one specific value until another in python?

I have the following dataframe:

Time                 Tab        User             Description
27.10.2021 15:58:00  Tab Alpha  UserA@gmail.com  Tab Alpha of type PARTSTUDIO opened by User A
27.10.2021 15:59:00  Tab Alpha  UserA@gmail.com  Start edit of part studio feature
27.10.2021 15:59:00  Tab Alpha  UserA@gmail.com  Cancel Operation
27.10.2021 15:59:00  Tab Alpha  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
27.10.2021 15:59:00  Tab Alpha  UserB@gmail.com  Start edit of part studio feature
27.10.2021 16:03:00  Tab Alpha  UserB@gmail.com  Cancel Operation
27.10.2021 16:03:00  Tab Alpha  UserA@gmail.com  Add assembly feature
27.10.2021 16:03:00  Tab Alpha  UserA@gmail.com  Tab Alpha of type PARTSTUDIO closed by User A
27.10.2021 16:03:00  Tab Beta   UserA@gmail.com  Tab Beta of type PARTSTUDIO opened by User A
27.10.2021 16:15:00  Tab Beta   UserA@gmail.com  Start edit of part studio feature
27.10.2021 16:15:00  Tab Alpha  UserB@gmail.com  Start edit of part studio feature
27.10.2021 16:15:00  Tab Alpha  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
27.10.2021 16:17:00  Tab Beta   UserA@gmail.com  Add assembly feature
27.10.2021 16:17:00  Tab Beta   UserC@gmail.com  Tab Beta of type ASSEMBLY opened by User C
27.10.2021 16:17:00  Tab Beta   UserC@gmail.com  Add assembly feature
27.10.2021 16:17:00  Tab Delta  UserB@gmail.com  Tab Delta of type PARTSTUDIO opened by User B
27.10.2021 16:54:00  Tab Delta  UserB@gmail.com  Add assembly feature
27.10.2021 16:55:00  Tab Beta   UserA@gmail.com  Tab Beta of type PARTSTUDIO closed by User A
27.10.2021 16:55:00  Tab Delta  UserB@gmail.com  Start edit of part studio feature
27.10.2021 16:55:00  Tab Delta  UserB@gmail.com  Tab Delta of type PARTSTUDIO closed by User B
27.10.2021 16:59:00  Tab Delta  UserC@gmail.com  Add assembly feature
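To make the sample reproducible, the table above can be built directly as a DataFrame. This is a minimal sketch that copies the rows verbatim and keeps Time as plain strings (no datetime parsing), since the solutions only look at the Description and Tab columns.

```python
import pandas as pd

# Rows copied from the sample table: (Time, Tab, User, Description)
rows = [
    ("27.10.2021 15:58:00", "Tab Alpha", "UserA@gmail.com", "Tab Alpha of type PARTSTUDIO opened by User A"),
    ("27.10.2021 15:59:00", "Tab Alpha", "UserA@gmail.com", "Start edit of part studio feature"),
    ("27.10.2021 15:59:00", "Tab Alpha", "UserA@gmail.com", "Cancel Operation"),
    ("27.10.2021 15:59:00", "Tab Alpha", "UserB@gmail.com", "Tab Alpha of type PARTSTUDIO opened by User B"),
    ("27.10.2021 15:59:00", "Tab Alpha", "UserB@gmail.com", "Start edit of part studio feature"),
    ("27.10.2021 16:03:00", "Tab Alpha", "UserB@gmail.com", "Cancel Operation"),
    ("27.10.2021 16:03:00", "Tab Alpha", "UserA@gmail.com", "Add assembly feature"),
    ("27.10.2021 16:03:00", "Tab Alpha", "UserA@gmail.com", "Tab Alpha of type PARTSTUDIO closed by User A"),
    ("27.10.2021 16:03:00", "Tab Beta", "UserA@gmail.com", "Tab Beta of type PARTSTUDIO opened by User A"),
    ("27.10.2021 16:15:00", "Tab Beta", "UserA@gmail.com", "Start edit of part studio feature"),
    ("27.10.2021 16:15:00", "Tab Alpha", "UserB@gmail.com", "Start edit of part studio feature"),
    ("27.10.2021 16:15:00", "Tab Alpha", "UserB@gmail.com", "Tab Alpha of type PARTSTUDIO closed by User B"),
    ("27.10.2021 16:17:00", "Tab Beta", "UserA@gmail.com", "Add assembly feature"),
    ("27.10.2021 16:17:00", "Tab Beta", "UserC@gmail.com", "Tab Beta of type ASSEMBLY opened by User C"),
    ("27.10.2021 16:17:00", "Tab Beta", "UserC@gmail.com", "Add assembly feature"),
    ("27.10.2021 16:17:00", "Tab Delta", "UserB@gmail.com", "Tab Delta of type PARTSTUDIO opened by User B"),
    ("27.10.2021 16:54:00", "Tab Delta", "UserB@gmail.com", "Add assembly feature"),
    ("27.10.2021 16:55:00", "Tab Beta", "UserA@gmail.com", "Tab Beta of type PARTSTUDIO closed by User A"),
    ("27.10.2021 16:55:00", "Tab Delta", "UserB@gmail.com", "Start edit of part studio feature"),
    ("27.10.2021 16:55:00", "Tab Delta", "UserB@gmail.com", "Tab Delta of type PARTSTUDIO closed by User B"),
    ("27.10.2021 16:59:00", "Tab Delta", "UserC@gmail.com", "Add assembly feature"),
]
df = pd.DataFrame(rows, columns=["Time", "Tab", "User", "Description"])
print(df.shape)  # (21, 4)
```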

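How can I loop through the rows of the dataframe based on a condition on the "Description" column? The condition: for the rows between "Tab 'Tab_name' of type ... opened by User B" and "Tab 'Tab_name' of type ... closed by User B", extract that tab name into the column "User B". The opening and closing rows themselves stay empty, and every row in between gets the tab that User B opened, even when that row's own Tab value is different.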

Expected output:

Time                 Tab        User             Description                                    UserB
27.10.2021 15:58:00  Tab Alpha  UserA@gmail.com  Tab Alpha of type PARTSTUDIO opened by User A
27.10.2021 15:59:00  Tab Alpha  UserA@gmail.com  Start edit of part studio feature
27.10.2021 15:59:00  Tab Alpha  UserA@gmail.com  Cancel Operation
27.10.2021 15:59:00  Tab Alpha  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
27.10.2021 15:59:00  Tab Alpha  UserB@gmail.com  Start edit of part studio feature              Tab Alpha
27.10.2021 16:03:00  Tab Alpha  UserB@gmail.com  Cancel Operation                               Tab Alpha
27.10.2021 16:03:00  Tab Alpha  UserA@gmail.com  Add assembly feature                           Tab Alpha
27.10.2021 16:03:00  Tab Alpha  UserA@gmail.com  Tab Alpha of type PARTSTUDIO closed by User A  Tab Alpha
27.10.2021 16:03:00  Tab Beta   UserA@gmail.com  Tab Beta of type PARTSTUDIO opened by User A   Tab Alpha
27.10.2021 16:15:00  Tab Beta   UserA@gmail.com  Start edit of part studio feature              Tab Alpha
27.10.2021 16:15:00  Tab Alpha  UserB@gmail.com  Start edit of part studio feature              Tab Alpha
27.10.2021 16:15:00  Tab Alpha  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
27.10.2021 16:17:00  Tab Beta   UserA@gmail.com  Add assembly feature
27.10.2021 16:17:00  Tab Beta   UserC@gmail.com  Tab Beta of type ASSEMBLY opened by User C
27.10.2021 16:17:00  Tab Beta   UserC@gmail.com  Add assembly feature
27.10.2021 16:17:00  Tab Delta  UserB@gmail.com  Tab Delta of type PARTSTUDIO opened by User B
27.10.2021 16:54:00  Tab Delta  UserB@gmail.com  Add assembly feature                           Tab Delta
27.10.2021 16:55:00  Tab Beta   UserA@gmail.com  Tab Beta of type PARTSTUDIO closed by User A   Tab Delta
27.10.2021 16:55:00  Tab Delta  UserB@gmail.com  Start edit of part studio feature              Tab Delta
27.10.2021 16:55:00  Tab Delta  UserB@gmail.com  Tab Delta of type PARTSTUDIO closed by User B
27.10.2021 16:59:00  Tab Delta  UserC@gmail.com  Add assembly feature

Here is my attempt:

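df.insert(4, 'User B', '0')

tab = ''  # last tab opened by User B; initialized so rows before the first opening don't raise NameError
for index, row in df.iterrows():
    # remember the tab name whenever User B opens a tab
    if row['Description'].find('opened by User B') != -1:
        tab = row['Description'].rpartition(' of type')[0]

    # write the remembered tab into every row that is not a "closed by User B" row
    if row['Description'].find('closed by User B') == -1:
        df.at[index, 'User B'] = tab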

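The current code writes the tab name into every row whose "Description" does not contain "closed by User B". It should only fill the rows in the range specified earlier in the question, i.e. the rows between the one whose "Description" is "... opened by User B" and the one whose "Description" is "... closed by User B". I know df.iterrows is not recommended, but I can't seem to find another way.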
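If a loop is acceptable, the missing piece is state: remember whether User B currently has a tab open and only write the name while that is true, resetting it on the closing row. A minimal sketch, assuming the descriptions use exactly the wording shown in the sample; `user_b_tabs_loop` is a hypothetical name. It iterates over the Description column directly rather than using `iterrows`, which builds a Series for every row.

```python
import pandas as pd


def user_b_tabs_loop(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: the tab User B has open, for rows strictly between
    an 'opened by User B' row and the following 'closed by User B' row."""
    values = []
    current = ""  # tab User B currently has open; "" when none
    for desc in df["Description"]:
        if "opened by User B" in desc:
            values.append("")  # the opening row itself stays empty
            current = desc.rpartition(" of type")[0]  # e.g. "Tab Alpha"
        elif "closed by User B" in desc:
            current = ""
            values.append("")  # the closing row itself stays empty
        else:
            values.append(current)
    return pd.Series(values, index=df.index)


sample = pd.DataFrame({"Description": [
    "Tab Alpha of type PARTSTUDIO opened by User A",
    "Tab Alpha of type PARTSTUDIO opened by User B",
    "Start edit of part studio feature",
    "Tab Beta of type PARTSTUDIO opened by User A",
    "Tab Alpha of type PARTSTUDIO closed by User B",
    "Add assembly feature",
]})
sample["User B"] = user_b_tabs_loop(sample)
print(sample["User B"].tolist())
# ['', '', 'Tab Alpha', 'Tab Alpha', '', '']
```

Note that the row "Tab Beta ... opened by User A" still gets Tab Alpha, because only User B's open/close rows change the state.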

Try finding all the "opened" and "closed" descriptions and selecting the slices between them:

opens = df["Description"].str.contains(".*Tab.*of type.*opened by User B")
closes = df["Description"].str.contains(".*Tab.*of type.*closed by User B")

df["UserB"] = df["Tab"].where((opens.cumsum().shift().fillna(0)-closes.cumsum()).astype(bool))

>>> df
                   Time  ...      UserB
0   27.10.2021 15:58:00  ...        NaN
1   27.10.2021 15:59:00  ...        NaN
2   27.10.2021 15:59:00  ...        NaN
3   27.10.2021 15:59:00  ...        NaN
4   27.10.2021 15:59:00  ...  Tab Alpha
5   27.10.2021 16:03:00  ...  Tab Alpha
6   27.10.2021 16:03:00  ...  Tab Alpha
7   27.10.2021 16:03:00  ...  Tab Alpha
8   27.10.2021 16:03:00  ...   Tab Beta
9   27.10.2021 16:15:00  ...   Tab Beta
10  27.10.2021 16:15:00  ...  Tab Alpha
11  27.10.2021 16:15:00  ...        NaN
12  27.10.2021 16:17:00  ...        NaN
13  27.10.2021 16:17:00  ...        NaN
14  27.10.2021 16:17:00  ...        NaN
15  27.10.2021 16:17:00  ...        NaN
16  27.10.2021 16:54:00  ...  Tab Delta
17  27.10.2021 16:55:00  ...   Tab Beta
18  27.10.2021 16:55:00  ...  Tab Delta
19  27.10.2021 16:55:00  ...        NaN
20  27.10.2021 16:59:00  ...        NaN

[21 rows x 5 columns]
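In the output above, rows 8, 9 and 17 come out as Tab Beta because `where` takes its values from the `Tab` column, whereas the expected output carries the name of the tab User B opened (Tab Alpha and Tab Delta). A minimal sketch that keeps the same cumsum mask but takes the name from the "opened by User B" description and carries it forward with `ffill`; `user_b_tabs` is a hypothetical name, and the regex assumes the "<tab> of type ... opened by User B" wording from the sample.

```python
import pandas as pd


def user_b_tabs(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: vectorized version of the opened/closed slicing."""
    desc = df["Description"]
    opens = desc.str.contains("opened by User B", regex=False).astype(int)
    closes = desc.str.contains("closed by User B", regex=False).astype(int)
    # Non-zero strictly between an opening row and its closing row:
    # shift() excludes the opening row, the closes cumsum zeroes the closing row.
    inside = opens.cumsum().shift(fill_value=0) - closes.cumsum()
    # Name of the most recently opened User B tab, carried forward
    name = desc.str.extract(r"^(.*?) of type .* opened by User B$", expand=False).ffill()
    return name.where(inside.astype(bool))  # NaN outside the spans


sample = pd.DataFrame({"Description": [
    "Tab Alpha of type PARTSTUDIO opened by User A",
    "Tab Alpha of type PARTSTUDIO opened by User B",
    "Start edit of part studio feature",
    "Tab Beta of type PARTSTUDIO opened by User A",
    "Tab Alpha of type PARTSTUDIO closed by User B",
    "Add assembly feature",
]})
result = user_b_tabs(sample)
print(result.fillna("").tolist())
# ['', '', 'Tab Alpha', 'Tab Alpha', '', '']
```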

Another solution:

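import numpy as np

# +1 on the row right after an "opened by User B" row (shift() excludes the opening row itself)
df["User B"] = df["Description"].str.contains("opened by User B").shift()
# -1 on each "closed by User B" row
df["User B"] += df["Description"].str.contains("closed by User B") * -1

# tab name taken from the most recent "opened by User B" row
df["tmp"] = (
    df["Description"]
    .str.extract("(Tab .*?) of type .* opened by User B")
    .ffill()
)

# a non-zero running sum means User B has a tab open
df["User B"] = np.where(df["User B"].fillna(0).cumsum(), df["tmp"], "")
df = df.drop(columns="tmp")

print(df.to_markdown())  # to_markdown() needs the optional tabulate package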

Prints:

|    | Time                | Tab       | User            | Description                                   | User B    |
|---:|:--------------------|:----------|:----------------|:----------------------------------------------|:----------|
|  0 | 27.10.2021 15:58:00 | Tab Alpha | UserA@gmail.com | Tab Alpha of type PARTSTUDIO opened by User A |           |
|  1 | 27.10.2021 15:59:00 | Tab Alpha | UserA@gmail.com | Start edit of part studio feature             |           |
|  2 | 27.10.2021 15:59:00 | Tab Alpha | UserA@gmail.com | Cancel Operation                              |           |
|  3 | 27.10.2021 15:59:00 | Tab Alpha | UserB@gmail.com | Tab Alpha of type PARTSTUDIO opened by User B |           |
|  4 | 27.10.2021 15:59:00 | Tab Alpha | UserB@gmail.com | Start edit of part studio feature             | Tab Alpha |
|  5 | 27.10.2021 16:03:00 | Tab Alpha | UserB@gmail.com | Cancel Operation                              | Tab Alpha |
|  6 | 27.10.2021 16:03:00 | Tab Alpha | UserA@gmail.com | Add assembly feature                          | Tab Alpha |
|  7 | 27.10.2021 16:03:00 | Tab Alpha | UserA@gmail.com | Tab Alpha of type PARTSTUDIO closed by User A | Tab Alpha |
|  8 | 27.10.2021 16:03:00 | Tab Beta  | UserA@gmail.com | Tab Beta of type PARTSTUDIO opened by User A  | Tab Alpha |
|  9 | 27.10.2021 16:15:00 | Tab Beta  | UserA@gmail.com | Start edit of part studio feature             | Tab Alpha |
| 10 | 27.10.2021 16:15:00 | Tab Alpha | UserB@gmail.com | Start edit of part studio feature             | Tab Alpha |
| 11 | 27.10.2021 16:15:00 | Tab Alpha | UserB@gmail.com | Tab Alpha of type PARTSTUDIO closed by User B |           |
| 12 | 27.10.2021 16:17:00 | Tab Beta  | UserA@gmail.com | Add assembly feature                          |           |
| 13 | 27.10.2021 16:17:00 | Tab Beta  | UserC@gmail.com | Tab Beta of type ASSEMBLY opened by User C    |           |
| 14 | 27.10.2021 16:17:00 | Tab Beta  | UserC@gmail.com | Add assembly feature                          |           |
| 15 | 27.10.2021 16:17:00 | Tab Delta | UserB@gmail.com | Tab Delta of type PARTSTUDIO opened by User B |           |
| 16 | 27.10.2021 16:54:00 | Tab Delta | UserB@gmail.com | Add assembly feature                          | Tab Delta |
| 17 | 27.10.2021 16:55:00 | Tab Beta  | UserA@gmail.com | Tab Beta of type PARTSTUDIO closed by User A  | Tab Delta |
| 18 | 27.10.2021 16:55:00 | Tab Delta | UserB@gmail.com | Start edit of part studio feature             | Tab Delta |
| 19 | 27.10.2021 16:55:00 | Tab Delta | UserB@gmail.com | Tab Delta of type PARTSTUDIO closed by User B |           |
| 20 | 27.10.2021 16:59:00 | Tab Delta | UserC@gmail.com | Add assembly feature                          |           |
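The same counter idea can be wrapped in a function that works for any user label, not only User B. A minimal sketch, assuming every boundary description ends with exactly "opened by <user>" or "closed by <user>" as in the sample; `tabs_open_by` is a hypothetical name. It keeps the counter as integers (`shift(fill_value=False)` plus `astype(int)`) instead of adding booleans to a shifted object column.

```python
import re

import numpy as np
import pandas as pd


def tabs_open_by(df: pd.DataFrame, user: str) -> pd.Series:
    """Hypothetical helper: tab `user` has open, strictly between their
    'opened by <user>' and 'closed by <user>' rows; "" elsewhere."""
    desc = df["Description"]
    u = re.escape(user)
    # "$" anchors the label so that e.g. "User B" does not match "User Bob"
    opened = desc.str.contains(f"opened by {u}$")
    closed = desc.str.contains(f"closed by {u}$")
    # +1 on the row after an opening row, -1 on a closing row
    delta = opened.shift(fill_value=False).astype(int) - closed.astype(int)
    # Name from the most recent opening row, carried forward
    name = desc.str.extract(f"^(.*?) of type .* opened by {u}$", expand=False).ffill()
    return pd.Series(np.where(delta.cumsum() > 0, name, ""), index=df.index)


sample = pd.DataFrame({"Description": [
    "Tab Alpha of type PARTSTUDIO opened by User B",
    "Tab Beta of type ASSEMBLY opened by User C",
    "Add assembly feature",
    "Tab Alpha of type PARTSTUDIO closed by User B",
    "Start edit of part studio feature",
    "Tab Beta of type ASSEMBLY closed by User C",
]})
print(tabs_open_by(sample, "User B").tolist())
# ['', 'Tab Alpha', 'Tab Alpha', '', '', '']
print(tabs_open_by(sample, "User C").tolist())
# ['', '', 'Tab Beta', 'Tab Beta', 'Tab Beta', '']
```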