How to compare all rows pairwise in pandas and keep a win/loss/draw count?
My df has df.shape of (15, 4), and df.head() looks like this:
C1 C2 C3 C4
A1 82.0 78.00 1100 3.0
A2 19.0 99.00 9520 3.0
A3 25.0 42.00 1700 7.0
A4 93.0 37.00 1700 7.0
A5 9.2 0.44 510 7.0
I want to compare every row of df with every other row, pairwise: (A1, A2); (A1, A3); (A1, A4); (A1, A5); (A2, A3); (A2, A4); (A2, A5); (A3, A4); and so on.
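itertools.combinations yields each unordered pair exactly once, so n rows produce n(n-1)/2 comparisons: 10 for the five rows in df.head(), 105 for all 15. A minimal sketch using the index labels shown above:

```python
from itertools import combinations

# Index labels from df.head() above
labels = ["A1", "A2", "A3", "A4", "A5"]

# Each unordered pair appears exactly once; (A1, A1) and (A2, A1) are never produced
pairs = list(combinations(labels, 2))
print(len(pairs))  # 10 == 5 * 4 // 2
print(pairs[:4])   # [('A1', 'A2'), ('A1', 'A3'), ('A1', 'A4'), ('A1', 'A5')]
```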
Comparison criteria:
if A1's C1 > A2's C1: count a win for A1 and a loss for A2
elif A1's C1 < A2's C1: the reverse, A2 gets a win and A1 gets a loss
elif A1's C1 == A2's C1: it is a draw, so add 0.5 to the count of both A1 and A2 (or keep the count in a separate Draws column)
All of these pairwise counts should be stored in a new df.
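Before optimizing, it helps to have a direct, literal translation of the criteria to check other results against. A minimal sketch, assuming only C1 decides each comparison and using the five rows from df.head(); the names `counts` and `Score` are illustrative, not from the question. It keeps draws in their own column and also derives the 0.5-per-draw score:

```python
from itertools import combinations

import pandas as pd

# The five rows from df.head() above (only C1 is compared)
df = pd.DataFrame(
    {"C1": [82.0, 19.0, 25.0, 93.0, 9.2]},
    index=["A1", "A2", "A3", "A4", "A5"],
)

# One counter row per label, all starting at zero
counts = pd.DataFrame(0, index=df.index, columns=["Wins", "Losses", "Draws"])

for a, b in combinations(df.index, 2):
    x, y = df.at[a, "C1"], df.at[b, "C1"]
    if x > y:    # a wins, b loses
        counts.at[a, "Wins"] += 1
        counts.at[b, "Losses"] += 1
    elif x < y:  # b wins, a loses
        counts.at[b, "Wins"] += 1
        counts.at[a, "Losses"] += 1
    else:        # equal values: a draw for both
        counts.at[a, "Draws"] += 1
        counts.at[b, "Draws"] += 1

# Alternative draw convention from the question: a draw is worth half a win
counts["Score"] = counts["Wins"] + 0.5 * counts["Draws"]
print(counts)
```

The Python loop is O(n²) in interpreted code, which is fine for 15 rows but slow for large frames.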
For the given df.head(), the output should look like this:
df_new.head()
Wins Losses Draws
A1 7 8 1
A2 6 9 1
A3 8 5 3
A4 9 4 3
A5 2 12 2
So far I have been able to build a df containing all possible pairs:
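import pandas as pd
from itertools import combinations
# one row per unordered pair of index labels, e.g. ('A1', 'A2')
fd = pd.DataFrame(index=combinations(df.index, 2))
# the matching pair of values from each column of df (C1..C4 in df.head() above)
fd['Ccom'] = list(combinations(df['C1'], 2))
fd['Wcom'] = list(combinations(df['C2'], 2))
fd['Lcom'] = list(combinations(df['C3'], 2))
fd['Dcom'] = list(combinations(df['C4'], 2))
fd.head()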
Ccom Wcom Lcom Dcom
('A1', 'A2') (82.0, 19.0) (78.0, 99.0) (1100.0, 9520.0) (3.0, 3.0)
('A1', 'A3') (82.0, 25.0) (78.0, 42.0) (1100.0, 1700.0) (3.0, 7.0)
('A1', 'A4') (82.0, 93.0) (78.0, 37.0) (1100.0, 1700.0) (3.0, 7.0)
('A1', 'A5') (82.0, 9.2) (78.0, 0.44) (1100.0, 510.0) (3.0, 7.0)
('A2', 'A3') (82.0, 52.0) (78.0, 0.042) (1100.0, 1100.0) (3.0, 17.2)
(Some of the values here differ from df.head() above.)
How can I extract the win counts from all of these combinations?
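Once the pairs are in a table like fd, the comparison itself can be vectorized instead of looped. A sketch under the assumption that only C1 decides each comparison; it splits each pair into plain `left`/`right` label columns and `x`/`y` value columns (hypothetical names), because tuple columns are awkward to compare directly:

```python
from itertools import combinations

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"C1": [82.0, 19.0, 25.0, 93.0, 9.2]},
    index=["A1", "A2", "A3", "A4", "A5"],
)

# One row per unordered pair, with labels and C1 values in plain columns
pairs = pd.DataFrame(list(combinations(df.index, 2)), columns=["left", "right"])
pairs["x"] = df.loc[pairs["left"], "C1"].to_numpy()
pairs["y"] = df.loc[pairs["right"], "C1"].to_numpy()

# Outcome of every pair at once; drawn pairs have no winner or loser
draw = (pairs["x"] == pairs["y"]).to_numpy()
left_wins = (pairs["x"] > pairs["y"]).to_numpy()
winner = np.where(left_wins, pairs["left"], pairs["right"])
loser = np.where(left_wins, pairs["right"], pairs["left"])

wins = pd.Series(winner[~draw]).value_counts()
losses = pd.Series(loser[~draw]).value_counts()
# A drawn pair counts once for each side
draws = pd.concat([pairs.loc[draw, "left"], pairs.loc[draw, "right"]]).value_counts()

# Labels that never won/lost/drew are missing from the counts, so fill them with 0
df_new = (
    pd.DataFrame({"Wins": wins, "Losses": losses, "Draws": draws})
    .reindex(df.index)
    .fillna(0)
    .astype(int)
)
print(df_new)
```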
Answer: this is not an elegant solution, but it works. I am sure there is a more efficient way to handle this.
import pandas as pd

# x is the other row's value, a is the current row's value, so the tuple is
# (win, draw, loss) from the current row's point of view.
# note that the loop below also compares each row with itself,
# thus producing one additional draw per row (corrected further down)
def outcome(x, a):
    return (x < a, x == a, x > a)

# placeholder columns, C2 is unused
d = {'C1': [2.0, 1.0, 1.0, 4.0, 9.0, 5.0, 5.0, 3.0, 2.0], 'C2': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
# named index
ind = ['A' + str(x + 1) for x in range(len(d['C1']))]
# construct dataframe
df = pd.DataFrame(data=d, index=ind)

res = []
# iterate over every row
for index, row in df.iterrows():
    # count wins, draws, defeats
    res.append((index, df['C1'].apply(lambda x: outcome(x, row['C1']))))

# parse results to obtain clean sums of outcomes
ind = [str(res[j][0]) for j in range(len(res))]
res = [[sum([y[i] for y in res[j][1]]) for i in range(3)] for j in range(len(res))]
# create temporary dataframe
tempDf = pd.DataFrame(res, columns=['W', 'D', 'L'], index=ind)
# correct draw counts (remove the self-comparison)
tempDf['D'] -= 1
# merge dataframes
df = df.merge(tempDf, left_index=True, right_index=True)
The output is:
C1 C2 W D L
A1 2.0 1 2 1 5
A2 1.0 2 0 1 7
A3 1.0 3 0 1 7
A4 4.0 4 5 0 3
A5 9.0 5 8 0 0
A6 5.0 6 6 1 1
A7 5.0 7 6 1 1
A8 3.0 8 4 0 4
A9 2.0 9 2 1 5
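The row-by-row iterrows/apply loop can be replaced by NumPy broadcasting, which builds the full n-by-n comparison in one step. A sketch that reproduces the W/D/L table above on the same sample data; it uses join instead of merge, which is equivalent here because both frames share the same index. Memory grows with n², which is negligible for 15 rows:

```python
import numpy as np
import pandas as pd

# Same sample data as the answer above
d = {"C1": [2.0, 1.0, 1.0, 4.0, 9.0, 5.0, 5.0, 3.0, 2.0], "C2": [1, 2, 3, 4, 5, 6, 7, 8, 9]}
df = pd.DataFrame(data=d, index=["A" + str(x + 1) for x in range(len(d["C1"]))])

v = df["C1"].to_numpy()
# v[:, None] against v[None, :] compares every row i with every row j at once;
# summing along axis 1 counts the outcomes for row i
W = (v[:, None] > v[None, :]).sum(axis=1)
L = (v[:, None] < v[None, :]).sum(axis=1)
# The diagonal (each row against itself) is always a draw, so subtract 1
D = (v[:, None] == v[None, :]).sum(axis=1) - 1

df = df.join(pd.DataFrame({"W": W, "D": D, "L": L}, index=df.index))
print(df)
```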