How to match DataFrame index and column against dictionary key and multiple values?
How can I modify the dictionary comprehension below so that the s column is also used as a matching condition?
import pandas as pd

dct = {'NNI': pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                            'count': [13, 11, 10, 12, 16]},
                           index=['2007-07-13', '2019-09-18', '2016-08-01', '2021-04-05', '2017-01-04']),
       'NVEC': pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                             'count': [12, 10, 9, 14, 5]},
                            index=['2012-10-09', '2018-10-01', '2022-02-01', '2020-03-20', '2016-04-06'])
       }

df = pd.DataFrame({'Date': ['2022-02-14', '2022-02-14', '2022-02-14', '2022-02-14', '2022-02-14'],
                   's': [-1, -1, -1, 1, 1],
                   'count': [10, 10, 10, 9, 9]},
                  index=['NNI', 'NVEC', 'IPA', 'LYTS', 'MYN'])
df:
            Date  s  count
NNI   2022-02-14 -1     10
NVEC  2022-02-14 -1     10
IPA   2022-02-14 -1     10
LYTS  2022-02-14  1      9
MYN   2022-02-14  1      9

dct:
{'NNI':             s  count
 2007-07-13 -1     13
 2019-09-18 -1     11
 2016-08-01 -1     10
 2021-04-05  1     12
 2017-01-04  1     16,
 'NVEC':             s  count
 2012-10-09 -1     12
 2018-10-01 -1     10
 2022-02-01 -1      9
 2020-03-20  1     14
 2016-04-06  1      5}
This is what I have so far:
df = df.assign(ratio=pd.Series({k: v['count'].gt(df.loc[k, 'count']).sum() /
                                   v['count'].ge(df.loc[k, 'count']).sum()
                                for k, v in dct.items()})).fillna(0)
df
            Date  s  count     ratio
NNI   2022-02-14 -1     10  0.800000
NVEC  2022-02-14 -1     10  0.666667
IPA   2022-02-14 -1     10  0.000000
LYTS  2022-02-14  1      9  0.000000
MYN   2022-02-14  1      9  0.000000
The desired result is:
df
            Date  s  count     ratio
NNI   2022-02-14 -1     10  0.666667
NVEC  2022-02-14 -1     10  0.500000
IPA   2022-02-14 -1     10  0.000000
LYTS  2022-02-14  1      9  0.000000
MYN   2022-02-14  1      9  0.000000
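You can add the condition as a boolean mask, so that only the rows of v whose s value equals that of the matching df row are counted. For example: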
v.loc[v['s'] == df.loc[k, 's'], 'count']
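To see what the mask selects, here is a minimal sketch that works through the 'NNI' entry on its own, using the data from the question. The names row_s, row_count, matched, n_gt and n_ge are introduced here purely for illustration and do not appear in the original code.

```python
import pandas as pd

# The 'NNI' frame from the question's dct.
v = pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                  'count': [13, 11, 10, 12, 16]},
                 index=['2007-07-13', '2019-09-18', '2016-08-01',
                        '2021-04-05', '2017-01-04'])

# Stand-ins for df.loc['NNI', 's'] and df.loc['NNI', 'count'].
row_s, row_count = -1, 10

# Keep only the counts whose s matches the df row.
matched = v.loc[v['s'] == row_s, 'count']

n_gt = int(matched.gt(row_count).sum())  # counts strictly greater than row_count
n_ge = int(matched.ge(row_count).sum())  # counts greater than or equal to row_count
ratio = n_gt / n_ge

print(matched.tolist(), n_gt, n_ge, ratio)
```

With the mask applied, only the three s == -1 rows (13, 11, 10) take part, which is why the ratio for 'NNI' drops from 4/5 to 2/3.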
So the code becomes:
df = df.assign(ratio=pd.Series({k: v.loc[v['s'] == df.loc[k, 's'], 'count'].gt(df.loc[k, 'count']).sum() /
v.loc[v['s'] == df.loc[k, 's'], 'count'].ge(df.loc[k, 'count']).sum()
for k,v in dct.items()})).fillna(0)
Output:
            Date  s  count     ratio
NNI   2022-02-14 -1     10  0.666667
NVEC  2022-02-14 -1     10  0.500000
IPA   2022-02-14 -1     10  0.000000
LYTS  2022-02-14  1      9  0.000000
MYN   2022-02-14  1      9  0.000000
Just a suggestion, but a helper function may help here, since the division is hard to read, especially once the indexing is added. You could use:
def get_ratio(df_row, v):
    # Only compare against rows of v whose s matches the df row.
    msk = v['s'] == df_row['s']
    numerator = v.loc[msk, 'count'].gt(df_row['count']).sum()
    denominator = v.loc[msk, 'count'].ge(df_row['count']).sum()
    return numerator / denominator

df = df.assign(ratio=pd.Series({k: get_ratio(df.loc[k], v) for k, v in dct.items()})).fillna(0)
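One possible variation, which is not part of the original answer: if no row of v satisfies the "greater than or equal" test, the denominator is zero and the division yields NaN (later hidden by fillna(0)). The sketch below makes that case explicit and also skips dictionary keys that are missing from df. The name get_ratio_safe and the choice to return 0.0 for an empty match are assumptions made for illustration.

```python
import pandas as pd

def get_ratio_safe(df_row, v):
    """Hypothetical variant of get_ratio that returns 0.0 on a zero denominator."""
    counts = v.loc[v['s'] == df_row['s'], 'count']
    numerator = int(counts.gt(df_row['count']).sum())
    denominator = int(counts.ge(df_row['count']).sum())
    return numerator / denominator if denominator else 0.0

dct = {'NNI': pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                            'count': [13, 11, 10, 12, 16]},
                           index=['2007-07-13', '2019-09-18', '2016-08-01',
                                  '2021-04-05', '2017-01-04']),
       'NVEC': pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                             'count': [12, 10, 9, 14, 5]},
                            index=['2012-10-09', '2018-10-01', '2022-02-01',
                                   '2020-03-20', '2016-04-06'])}

df = pd.DataFrame({'Date': ['2022-02-14'] * 5,
                   's': [-1, -1, -1, 1, 1],
                   'count': [10, 10, 10, 9, 9]},
                  index=['NNI', 'NVEC', 'IPA', 'LYTS', 'MYN'])

# Compute ratios only for keys that exist in df, then fill the remaining rows with 0.
ratios = pd.Series({k: get_ratio_safe(df.loc[k], v)
                    for k, v in dct.items() if k in df.index})
df = df.assign(ratio=ratios).fillna({'ratio': 0})
print(df)
```

Filling only the ratio column, rather than calling fillna(0) on the whole frame, avoids overwriting unrelated missing values elsewhere in df.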