Identify all instances in a DataFrame based on a specific column value using a loop

Question

I have the following DataFrame:

     teamId  matchId  matchPeriod  eventSec  eventId  eventName
190  8516    5237840  1H           721.2     5        Interruption
191  8516    5237840  1H           723.4     3        Free Kick
192  8516    5237840  1H           725.7     8        Pass
193  8516    5237840  1H           727.2     8        Pass
194  8516    5237840  1H           728.5     10       Shot

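This goes on for about 1000 rows.

I want to identify every instance of 'Shot', then slice out that row together with the 4 rows before it, creating a sequence that I can work with.

Can anyone help?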

Answer 1

Try this code, where dta is your DataFrame:

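# Index labels of the rows where eventName is 'Shot'
index = dta[dta['eventName'] == 'Shot'].index

# Collect each 'Shot' label plus the 4 labels before it
# (this assumes a consecutive integer index)
result = set()
for i in range(5):
    result.update(index - i)

# Keep only the rows whose label was collected
sub = dta[dta.index.isin(result)]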

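First, it selects the index labels of the rows whose 'eventName' column equals 'Shot'. Then a loop builds a set holding those labels and the 4 labels before each of them. This relies on the index being consecutive integers, as in your example.

Finally, we select the rows whose index label is in that set. Note that the result is a single DataFrame containing all of those rows, not one separate sequence per 'Shot'.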
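If one sequence per 'Shot' is needed rather than a single combined DataFrame, the same idea can be sketched with positional slicing. This is only an illustration: the helper name shot_windows, its parameters, and the small sample frame are made up here and are not part of the answer above.

```python
import numpy as np
import pandas as pd


def shot_windows(df, column="eventName", value="Shot", n_before=4):
    """Return one DataFrame per matching row: that row plus up to n_before rows above it."""
    # Positions (not labels) of the matching rows, so any index type works
    positions = np.flatnonzero((df[column] == value).to_numpy())
    # Clamp the start at 0 so a match near the top yields a shorter window
    return [df.iloc[max(pos - n_before, 0): pos + 1] for pos in positions]


# Small made-up frame, for illustration only
dta = pd.DataFrame({
    "eventSec": [721.2, 723.4, 725.7, 727.2, 728.5, 730.0, 731.1],
    "eventName": ["Interruption", "Free Kick", "Pass", "Pass", "Shot", "Pass", "Shot"],
})

windows = shot_windows(dta)
for window in windows:
    print(window, end="\n\n")
```

Because the windows are taken by position, they may overlap when two shots are fewer than five rows apart, and a window can contain an earlier 'Shot'. Whether that is acceptable depends on how the sequences are used afterwards.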

Answer 2

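It looks like you want to slice out the four rows that precede each occurrence of "Shot". You can use the index values to find where "Shot" occurs, and then slice the DataFrame based on those index values.

Add the data to a DataFrame: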

import pandas as pd
from tabulate import tabulate  # third-party package: pip install tabulate

# Sample data; named `data` to avoid shadowing the built-in `dict`
data = {
    "teamid": [190,191,192,108,190,190,191,192,108,190,190,191,192,108,190,190,191,192,108,190],
    "eventId": [5,2,4,5,6,5,2,4,5,6,5,2,4,5,6,5,2,4,5,6],
    "eventname": ['hello','Free Kick','Pass','Pass','Shot','Interruption','Free Kick','Pass','Pass','Shot','Interruption','Free Kick','Pass','Pass','Shot','Interruption','Free Kick','Pass','Pass','Shot']
}
df = pd.DataFrame(data=data)
print(tabulate(df, headers='keys', tablefmt='psql'))

Then slice the data and perform your task.

# Find the index values where "Shot" appears.
index_values = df[df['eventname'] == 'Shot'].index
# Prepend -1 so that the first slice starts at row 0.
index_values = index_values.insert(0, -1)
# Slice the data: each slice holds the rows after the previous "Shot" up to,
# but not including, the current "Shot" (four rows each in this sample).
for i in range(len(index_values) - 1):
    # Perform your task here.
    print(tabulate(df[index_values[i] + 1:index_values[i + 1]], headers='keys', tablefmt='psql'))
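The loop above leaves the "Shot" row out of each slice and takes every row since the previous "Shot". A possible variant that keeps the "Shot" row and caps each sequence at five rows is sketched below with groupby. The names is_shot, chunk_id, and sequences are illustrative, the data is a shortened copy of the sample with one extra trailing row, and tabulate is left out to keep the sketch dependency-free.

```python
import pandas as pd

# Shortened version of the sample data, plus a trailing row with no "Shot" after it
df = pd.DataFrame({
    "eventId": [5, 2, 4, 5, 6, 5, 2, 4, 5, 6, 1],
    "eventname": ["hello", "Free Kick", "Pass", "Pass", "Shot",
                  "Interruption", "Free Kick", "Pass", "Pass", "Shot", "Pass"],
})

is_shot = df["eventname"] == "Shot"
# Chunk id increases on the row after every "Shot", so each chunk ends at a "Shot"
chunk_id = is_shot.shift(fill_value=False).cumsum()

sequences = [
    chunk.tail(5)  # the "Shot" row plus up to 4 rows before it
    for _, chunk in df.groupby(chunk_id)
    if chunk["eventname"].iloc[-1] == "Shot"  # skip trailing rows that have no "Shot"
]

for seq in sequences:
    print(seq, end="\n\n")
```

With this grouping a sequence never reaches back past the previous "Shot", so sequences cannot overlap; a chunk with fewer than four rows before its "Shot" simply yields a shorter sequence.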


python

iteration

sequence

slice

pandas