如何在条件后打印应用函数内的行号
How to print row number inside apply function after a criteria
我有一个如下所示的数据框
Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
T345 Ltd,1990
T789 Pvt.LTd,2001
ABC Limited,1992
ABCDE Ltd,1994
ABC Ltd,1997
ABFE,1987
Tesla ltd,1995
AMAZON Inc,2001
Apple ltd,2003
compare = pd.MultiIndex.from_product([tf['Company'].astype(str),tf['Company'].astype(str)]).to_series()
compare = compare[compare.index.get_level_values(0) != compare.index.get_level_values(1)]
我的数据如下所示
我想执行以下操作
a) 在每个输入键比较结束时打印索引号(i)。for ex: T123 Inc Ltd is compared with ten other strings. So, once it is done and moves to a new/next string/input_key which is T124 PVT ltd, I want the value of i to be incremented by 1 and shown using print function
.
我尝试了以下
def metrics(tup):
print(compare.loc[tup]) # doesn't work. I want the index number
return list(tup)
我不知道如何在 apply
函数中 increment/iterate
我希望打印语句的输出如下所示。打印语句应该在每个输入键的比较结束后才执行。
comparison of 1st input key with 10 strings is done
comparison of 2nd input key with 10 strings is done
更新-代码
ls = []
ks=[]
for i, k in enumerate(compare, 1):
ls.append(metrics(k))
ks.append(k)
print(f'comparison of {i} input key: {k} with {len(ls)} strings is done')
pd.concat(pd.DataFrame(ks),pd.DataFrame(ls)) # error
pd.concat(ks,ls) # error
指标
def metrics(tup):
return pd.Series([fuzz.ratio(*tup),
fuzz.token_sort_ratio(*tup),
fuzz.token_set_ratio(*tup),
fuzz.QRatio(*tup),
fuzz.UQRatio(*tup),
fuzz.UWRatio(*tup)],
['ratio', 'token','set','qr','uqr','uwr'])
让我们尝试枚举 level 0
索引中的唯一值:
groups = []
grouper = compare.groupby(level=0, sort=False)
for i, (k, g) in enumerate(grouper, 1):
# Execute statements here
groups.append(g.apply(metrics))
print(f'comparison of {i} input key: {k} with {len(g)} strings is done')
df_out = pd.concat(groups)
comparison of 1 input key: T123 Inc Ltd with 10 strings is done
comparison of 2 input key: T124 PVT ltd with 10 strings is done
...
comparison of 11 input key: Apple ltd with 10 strings is done
我有一个如下所示的数据框
Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
T345 Ltd,1990
T789 Pvt.LTd,2001
ABC Limited,1992
ABCDE Ltd,1994
ABC Ltd,1997
ABFE,1987
Tesla ltd,1995
AMAZON Inc,2001
Apple ltd,2003
compare = pd.MultiIndex.from_product([tf['Company'].astype(str),tf['Company'].astype(str)]).to_series()
compare = compare[compare.index.get_level_values(0) != compare.index.get_level_values(1)]
我的数据如下所示
我想执行以下操作
a) 在每个输入键比较结束时打印索引号(i)。for ex: T123 Inc Ltd is compared with ten other strings. So, once it is done and moves to a new/next string/input_key which is T124 PVT ltd, I want the value of i to be incremented by 1 and shown using print function
.
我尝试了以下
def metrics(tup):
print(compare.loc[tup]) # doesn't work. I want the index number
return list(tup)
我不知道如何在 apply
函数中 increment/iterate
我希望打印语句的输出如下所示。打印语句应该在每个输入键的比较结束后才执行。
comparison of 1st input key with 10 strings is done
comparison of 2nd input key with 10 strings is done
更新-代码
ls = []
ks=[]
for i, k in enumerate(compare, 1):
ls.append(metrics(k))
ks.append(k)
print(f'comparison of {i} input key: {k} with {len(ls)} strings is done')
pd.concat(pd.DataFrame(ks),pd.DataFrame(ls)) # error
pd.concat(ks,ls) # error
指标
def metrics(tup):
return pd.Series([fuzz.ratio(*tup),
fuzz.token_sort_ratio(*tup),
fuzz.token_set_ratio(*tup),
fuzz.QRatio(*tup),
fuzz.UQRatio(*tup),
fuzz.UWRatio(*tup)],
['ratio', 'token','set','qr','uqr','uwr'])
让我们尝试枚举 level 0
索引中的唯一值:
groups = []
grouper = compare.groupby(level=0, sort=False)
for i, (k, g) in enumerate(grouper, 1):
# Execute statements here
groups.append(g.apply(metrics))
print(f'comparison of {i} input key: {k} with {len(g)} strings is done')
df_out = pd.concat(groups)
comparison of 1 input key: T123 Inc Ltd with 10 strings is done
comparison of 2 input key: T124 PVT ltd with 10 strings is done
...
comparison of 11 input key: Apple ltd with 10 strings is done