Identify all instances in a dataframe based on a specific column value using a loop
I have the following dataframe:
     teamId  matchId  matchPeriod  eventSec  eventId  eventName
190  8516    5237840  1H           721.2     5        Interruption
191  8516    5237840  1H           723.4     3        Free Kick
192  8516    5237840  1H           725.7     8        Pass
193  8516    5237840  1H           727.2     8        Pass
194  8516    5237840  1H           728.5     10       Shot
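This continues for about 1000 rows.
I want to identify every instance of 'Shot', then slice out that row and the 4 rows before it to create a sequence, so that I can work with the data.
Can anyone help?
Try this code: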
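# dta is your dataframe (assumed to have consecutive integer index labels)
index = dta[dta['eventName'] == 'Shot'].index
result = []
for i in range(5):
    # labels of the rows i steps before each 'Shot' (i = 0 is the 'Shot' row itself)
    result = result + list(index - i)
result = set(result)
sub = dta[dta.index.isin(result)]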
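First, it selects the index of every row whose 'eventName' column has the value 'Shot'. Then the loop collects those index labels together with the labels of the 4 rows before each selected row, and the result is converted to a set to drop duplicates.
Finally, we select the rows whose index is in the collected set.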
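The snippet above returns a single dataframe containing all selected rows together. If a separate sequence per shot is needed, a minimal sketch could look like the following. The sample data and the helper name `shot_windows` are illustrative assumptions, not part of the original question, and the sketch works on row positions rather than index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question's eventName column.
dta = pd.DataFrame({
    "eventSec": [721.2, 723.4, 725.7, 727.2, 728.5, 730.0, 731.1],
    "eventName": ["Interruption", "Free Kick", "Pass", "Pass", "Shot", "Pass", "Shot"],
})

def shot_windows(frame, n_before=4, column="eventName", value="Shot"):
    """Return a list of dataframes, one per matching row, each holding
    that row and up to n_before rows immediately before it."""
    # Integer positions (not labels) of the matching rows.
    positions = np.flatnonzero((frame[column] == value).to_numpy())
    windows = []
    for pos in positions:
        start = max(pos - n_before, 0)  # clamp at the top of the frame
        windows.append(frame.iloc[start:pos + 1])
    return windows

windows = shot_windows(dta)
for w in windows:
    print(w, end="\n\n")
```

Note that windows can overlap when two shots are fewer than 5 rows apart, as in the sample data here.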
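It looks like you want to slice out the four rows that precede each occurrence of "Shot". You can use the index to find where "Shot" occurs and then slice the DataFrame based on those index values.
Add the data to a dataframe: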
import pandas as pd
from tabulate import tabulate

# Sample data; named "data" to avoid shadowing the built-in dict
data = {
    "teamid": [190,191,192,108,190,190,191,192,108,190,190,191,192,108,190,190,191,192,108,190],
    "eventId": [5,2,4,5,6,5,2,4,5,6,5,2,4,5,6,5,2,4,5,6],
    "eventname": ['hello','Free Kick','Pass','Pass','Shot','Interruption','Free Kick','Pass','Pass','Shot','Interruption','Free Kick','Pass','Pass','Shot','Interruption','Free Kick','Pass','Pass','Shot']
}
df = pd.DataFrame(data=data)
print(tabulate(df, headers='keys', tablefmt='psql'))
Then slice the data and perform your task.
# Find the index values where "Shot" appears.
index_values = df[df['eventname'] == 'Shot'].index
# Prepend -1 so that the first slice starts at row 0.
index_values = index_values.insert(0, -1)
# Slice the data. Each slice holds the rows between two consecutive "Shot" rows;
# the end of a slice is exclusive, so the "Shot" row itself is not included.
for i in range(len(index_values) - 1):
    # perform your task here
    print(tabulate(df[index_values[i]+1:index_values[i+1]], headers='keys', tablefmt='psql'))
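The loop above yields the rows between consecutive shots and leaves out the "Shot" row itself. If the shot row should be kept and each sequence capped at 5 rows, one possible variant groups the rows instead of slicing by hand. This is only a sketch: the shortened sample data and the names `segment`, `tagged`, and `sequences` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical, shortened sample data; the final "Pass" is not followed by a shot.
df = pd.DataFrame({
    "eventId": [5, 5, 2, 4, 5, 6, 5, 2, 4, 5, 6, 4],
    "eventname": ["hello", "Interruption", "Free Kick", "Pass", "Pass", "Shot",
                  "Interruption", "Free Kick", "Pass", "Pass", "Shot", "Pass"],
})

is_shot = df["eventname"] == "Shot"
# Count the shots strictly before each row, so every row up to and
# including a shot shares one segment id with that shot.
segment = is_shot.shift(fill_value=False).cumsum()
# Keep only segments that actually end in a shot.
has_shot = segment.isin(segment[is_shot])
tagged = df.assign(segment=segment)[has_shot]

# One dataframe per shot: the shot row plus up to 4 rows before it.
sequences = [g.tail(5).drop(columns="segment") for _, g in tagged.groupby("segment")]
for seq in sequences:
    print(seq, end="\n\n")
```

Unlike fixed-size windows, these sequences never reach back past the previous shot, so they do not overlap.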