How to loop through rows in a DataFrame from one specific value until another in Python?
I have the following DataFrame:
Time | Tab | User | Description |
---|---|---|---|
27.10.2021 15:58:00 | Tab Alpha | UserA@gmail.com | Tab Alpha of type PARTSTUDIO opened by User A |
27.10.2021 15:59:00 | Tab Alpha | UserA@gmail.com | Start edit of part studio feature |
27.10.2021 15:59:00 | Tab Alpha | UserA@gmail.com | Cancel Operation |
27.10.2021 15:59:00 | Tab Alpha | UserB@gmail.com | Tab Alpha of type PARTSTUDIO opened by User B |
27.10.2021 15:59:00 | Tab Alpha | UserB@gmail.com | Start edit of part studio feature |
27.10.2021 16:03:00 | Tab Alpha | UserB@gmail.com | Cancel Operation |
27.10.2021 16:03:00 | Tab Alpha | UserA@gmail.com | Add assembly feature |
27.10.2021 16:03:00 | Tab Alpha | UserA@gmail.com | Tab Alpha of type PARTSTUDIO closed by User A |
27.10.2021 16:03:00 | Tab Beta | UserA@gmail.com | Tab Beta of type PARTSTUDIO opened by User A |
27.10.2021 16:15:00 | Tab Beta | UserA@gmail.com | Start edit of part studio feature |
27.10.2021 16:15:00 | Tab Alpha | UserB@gmail.com | Start edit of part studio feature |
27.10.2021 16:15:00 | Tab Alpha | UserB@gmail.com | Tab Alpha of type PARTSTUDIO closed by User B |
27.10.2021 16:17:00 | Tab Beta | UserA@gmail.com | Add assembly feature |
27.10.2021 16:17:00 | Tab Beta | UserC@gmail.com | Tab Beta of type ASSEMBLY opened by User C |
27.10.2021 16:17:00 | Tab Beta | UserC@gmail.com | Add assembly feature |
27.10.2021 16:17:00 | Tab Delta | UserB@gmail.com | Tab Delta of type PARTSTUDIO opened by User B |
27.10.2021 16:54:00 | Tab Delta | UserB@gmail.com | Add assembly feature |
27.10.2021 16:55:00 | Tab Beta | UserA@gmail.com | Tab Beta of type PARTSTUDIO closed by User A |
27.10.2021 16:55:00 | Tab Delta | UserB@gmail.com | Start edit of part studio feature |
27.10.2021 16:55:00 | Tab Delta | UserB@gmail.com | Tab Delta of type PARTSTUDIO closed by User B |
27.10.2021 16:59:00 | Tab Delta | UserC@gmail.com | Add assembly feature |
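How can I iterate over the rows of the DataFrame based on a condition in the "Description" column? The condition is to extract the tab name, into a "User B" column, for the rows between the value "Tab 'Tab_name' of type ... opened by User B" and the value "Tab 'Tab_name' of type ... closed by User B".
Expected output: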
Time | Tab | User | Description | UserB |
---|---|---|---|---|
27.10.2021 15:58:00 | Tab Alpha | UserA@gmail.com | Tab Alpha of type PARTSTUDIO opened by User A | |
27.10.2021 15:59:00 | Tab Alpha | UserA@gmail.com | Start edit of part studio feature | |
27.10.2021 15:59:00 | Tab Alpha | UserA@gmail.com | Cancel Operation | |
27.10.2021 15:59:00 | Tab Alpha | UserB@gmail.com | Tab Alpha of type PARTSTUDIO opened by User B | |
27.10.2021 15:59:00 | Tab Alpha | UserB@gmail.com | Start edit of part studio feature | Tab Alpha |
27.10.2021 16:03:00 | Tab Alpha | UserB@gmail.com | Cancel Operation | Tab Alpha |
27.10.2021 16:03:00 | Tab Alpha | UserA@gmail.com | Add assembly feature | Tab Alpha |
27.10.2021 16:03:00 | Tab Alpha | UserA@gmail.com | Tab Alpha of type PARTSTUDIO closed by User A | Tab Alpha |
27.10.2021 16:03:00 | Tab Beta | UserA@gmail.com | Tab Beta of type PARTSTUDIO opened by User A | Tab Alpha |
27.10.2021 16:15:00 | Tab Beta | UserA@gmail.com | Start edit of part studio feature | Tab Alpha |
27.10.2021 16:15:00 | Tab Alpha | UserB@gmail.com | Start edit of part studio feature | Tab Alpha |
27.10.2021 16:15:00 | Tab Alpha | UserB@gmail.com | Tab Alpha of type PARTSTUDIO closed by User B | |
27.10.2021 16:17:00 | Tab Beta | UserA@gmail.com | Add assembly feature | |
27.10.2021 16:17:00 | Tab Beta | UserC@gmail.com | Tab Beta of type ASSEMBLY opened by User C | |
27.10.2021 16:17:00 | Tab Beta | UserC@gmail.com | Add assembly feature | |
27.10.2021 16:17:00 | Tab Delta | UserB@gmail.com | Tab Delta of type PARTSTUDIO opened by User B | |
27.10.2021 16:54:00 | Tab Delta | UserB@gmail.com | Add assembly feature | Tab Delta |
27.10.2021 16:55:00 | Tab Beta | UserA@gmail.com | Tab Beta of type PARTSTUDIO closed by User A | Tab Delta |
27.10.2021 16:55:00 | Tab Delta | UserB@gmail.com | Start edit of part studio feature | Tab Delta |
27.10.2021 16:55:00 | Tab Delta | UserB@gmail.com | Tab Delta of type PARTSTUDIO closed by User B | |
27.10.2021 16:59:00 | Tab Beta | UserC@gmail.com | Add assembly feature | |
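Here is my attempt:

    df.insert(4, 'User B', '0')
    for index, row in df.iterrows():
        # Remember the tab name whenever User B opens a tab
        if row['Description'].find('opened by User B') != -1:
            tab = row['Description'].rpartition(' of type')[0]
        # Write the tab name on every row that is not a "closed by User B" row
        if row['Description'].find('closed by User B') == -1:
            df.at[index, 'User B'] = tab

The current code extracts the tab name from every row whose "Description" column does not contain "closed by User B". It should only extract from rows in the range described above: between the row whose "Description" value is "... opened by User B" and the row whose value is "... closed by User B".
I know df.iterrows is not recommended, but I can't seem to find another way.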
Try finding all the "opened" and "closed" descriptions and selecting the slices between them:

    opens = df["Description"].str.contains(".*Tab.*of type.*opened by User B")
    closes = df["Description"].str.contains(".*Tab.*of type.*closed by User B")
    # Opens seen before each row, minus closes seen up to and including it
    inside = (opens.cumsum().shift().fillna(0) - closes.cumsum()).astype(bool)
    df["UserB"] = df["Tab"].where(inside)

    >>> df
                       Time  ...      UserB
    0   27.10.2021 15:58:00  ...        NaN
    1   27.10.2021 15:59:00  ...        NaN
    2   27.10.2021 15:59:00  ...        NaN
    3   27.10.2021 15:59:00  ...        NaN
    4   27.10.2021 15:59:00  ...  Tab Alpha
    5   27.10.2021 16:03:00  ...  Tab Alpha
    6   27.10.2021 16:03:00  ...  Tab Alpha
    7   27.10.2021 16:03:00  ...  Tab Alpha
    8   27.10.2021 16:03:00  ...   Tab Beta
    9   27.10.2021 16:15:00  ...   Tab Beta
    10  27.10.2021 16:15:00  ...  Tab Alpha
    11  27.10.2021 16:15:00  ...        NaN
    12  27.10.2021 16:17:00  ...        NaN
    13  27.10.2021 16:17:00  ...        NaN
    14  27.10.2021 16:17:00  ...        NaN
    15  27.10.2021 16:17:00  ...        NaN
    16  27.10.2021 16:54:00  ...  Tab Delta
    17  27.10.2021 16:55:00  ...   Tab Beta
    18  27.10.2021 16:55:00  ...  Tab Delta
    19  27.10.2021 16:55:00  ...        NaN
    20  27.10.2021 16:59:00  ...        NaN

    [21 rows x 5 columns]
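Note that in the output above rows 8, 9 and 17 show "Tab Beta", because the value is taken from each row's own "Tab" column, whereas the expected output shows the tab that User B opened. A possible variation, sketched here on a small made-up frame rather than the full data, takes the name from the "opened by User B" description and forward-fills it. The sample rows and the use of `> 0` instead of `.astype(bool)` are assumptions of this sketch, not part of the answer above.

```python
import pandas as pd

# Small illustrative frame that mimics the question's layout (not the real data).
df = pd.DataFrame({
    "Tab": ["Tab Alpha", "Tab Alpha", "Tab Beta", "Tab Alpha", "Tab Beta"],
    "Description": [
        "Tab Alpha of type PARTSTUDIO opened by User B",
        "Start edit of part studio feature",
        "Tab Beta of type PARTSTUDIO opened by User A",
        "Tab Alpha of type PARTSTUDIO closed by User B",
        "Add assembly feature",
    ],
})

opens = df["Description"].str.contains("opened by User B", regex=False)
closes = df["Description"].str.contains("closed by User B", regex=False)

# Opens seen strictly before each row, minus closes seen up to and including it.
# A positive balance means the row lies inside an open/close range.
inside = (opens.cumsum().shift(fill_value=0) - closes.cumsum()) > 0

# Tab name taken from the most recent "opened by User B" description.
opened_tab = (
    df["Description"]
    .str.extract(r"^(.*?) of type .* opened by User B", expand=False)
    .ffill()
)

# Keep the forward-filled name only inside the range; other rows become NaN.
df["UserB"] = opened_tab.where(inside)
print(df)
```

With this sample, only the two rows between the open and the close receive "Tab Alpha", including the row whose own tab is "Tab Beta". Comparing with `> 0` also leaves rows unmarked if a "closed" event appears before any "opened" event, where `.astype(bool)` would treat the negative balance as true.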
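Another solution:

    import numpy as np

    # +1 on the row after each "opened" event, -1 on each "closed" row
    df["User B"] = df["Description"].str.contains("opened by User B").shift()
    df["User B"] += df["Description"].str.contains("closed by User B") * -1
    # Tab name from the most recent "opened by User B" description
    df["tmp"] = (
        df["Description"]
        .str.extract("(Tab .*?) of type .* opened by User B")
        .ffill()
    )
    # A non-zero running total means the row is inside an open/close range
    df["User B"] = np.where(df["User B"].fillna(0).cumsum(), df["tmp"], "")
    df = df.drop(columns="tmp")
    print(df.to_markdown())  # to_markdown requires the tabulate package

Prints: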
| | Time | Tab | User | Description | User B |
---|---|---|---|---|---|
0 | 27.10.2021 15:58:00 | Tab Alpha | UserA@gmail.com | Tab Alpha of type PARTSTUDIO opened by User A | |
1 | 27.10.2021 15:59:00 | Tab Alpha | UserA@gmail.com | Start edit of part studio feature | |
2 | 27.10.2021 15:59:00 | Tab Alpha | UserA@gmail.com | Cancel Operation | |
3 | 27.10.2021 15:59:00 | Tab Alpha | UserB@gmail.com | Tab Alpha of type PARTSTUDIO opened by User B | |
4 | 27.10.2021 15:59:00 | Tab Alpha | UserB@gmail.com | Start edit of part studio feature | Tab Alpha |
5 | 27.10.2021 16:03:00 | Tab Alpha | UserB@gmail.com | Cancel Operation | Tab Alpha |
6 | 27.10.2021 16:03:00 | Tab Alpha | UserA@gmail.com | Add assembly feature | Tab Alpha |
7 | 27.10.2021 16:03:00 | Tab Alpha | UserA@gmail.com | Tab Alpha of type PARTSTUDIO closed by User A | Tab Alpha |
8 | 27.10.2021 16:03:00 | Tab Beta | UserA@gmail.com | Tab Beta of type PARTSTUDIO opened by User A | Tab Alpha |
9 | 27.10.2021 16:15:00 | Tab Beta | UserA@gmail.com | Start edit of part studio feature | Tab Alpha |
10 | 27.10.2021 16:15:00 | Tab Alpha | UserB@gmail.com | Start edit of part studio feature | Tab Alpha |
11 | 27.10.2021 16:15:00 | Tab Alpha | UserB@gmail.com | Tab Alpha of type PARTSTUDIO closed by User B | |
12 | 27.10.2021 16:17:00 | Tab Beta | UserA@gmail.com | Add assembly feature | |
13 | 27.10.2021 16:17:00 | Tab Beta | UserC@gmail.com | Tab Beta of type ASSEMBLY opened by User C | |
14 | 27.10.2021 16:17:00 | Tab Beta | UserC@gmail.com | Add assembly feature | |
15 | 27.10.2021 16:17:00 | Tab Delta | UserB@gmail.com | Tab Delta of type PARTSTUDIO opened by User B | |
16 | 27.10.2021 16:54:00 | Tab Delta | UserB@gmail.com | Add assembly feature | Tab Delta |
17 | 27.10.2021 16:55:00 | Tab Beta | UserA@gmail.com | Tab Beta of type PARTSTUDIO closed by User A | Tab Delta |
18 | 27.10.2021 16:55:00 | Tab Delta | UserB@gmail.com | Start edit of part studio feature | Tab Delta |
19 | 27.10.2021 16:55:00 | Tab Delta | UserB@gmail.com | Tab Delta of type PARTSTUDIO closed by User B | |
20 | 27.10.2021 16:59:00 | Tab Delta | UserC@gmail.com | Add assembly feature | |
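For comparison, the same open/close range logic can be written as an explicit loop that carries a state variable, which may be easier to follow than the cumulative sums. This is only a sketch: the helper name `tabs_open_by_user` and the sample descriptions are hypothetical, and it iterates over the "Description" values directly instead of using df.iterrows.

```python
import pandas as pd

def tabs_open_by_user(descriptions, user="User B"):
    """Return, per row, the tab `user` has open at that point (None otherwise).

    Hypothetical helper for illustration; it assumes descriptions of the form
    "<tab> of type <type> opened by <user>" / "... closed by <user>".
    """
    current = None  # tab name while inside an open/close range
    result = []
    for text in descriptions:
        if text.endswith(f"closed by {user}"):
            current = None  # the closing row itself stays empty
        result.append(current)
        if text.endswith(f"opened by {user}"):
            # Takes effect from the next row on, so the opening row stays empty.
            current = text.partition(" of type")[0]
    return result

# Small made-up sample, not the question's full data.
df = pd.DataFrame({
    "Description": [
        "Tab Alpha of type PARTSTUDIO opened by User A",
        "Tab Alpha of type PARTSTUDIO opened by User B",
        "Cancel Operation",
        "Tab Beta of type PARTSTUDIO opened by User A",
        "Tab Alpha of type PARTSTUDIO closed by User B",
        "Add assembly feature",
    ]
})

result = tabs_open_by_user(df["Description"])
df["User B"] = result
print(df)
```

The loop is slower than the vectorized answers on large frames, but it makes the boundary behavior explicit: neither the opening row nor the closing row is marked.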