Python Pandas: How to groupby and compare columns

Question

Here is my DataFrame 'df':

match           name                   group  
adamant         Adamant Home Network   86   
adamant         ADAMANT, Ltd.          86   
adamant bild    TOV Adamant-Bild       86   
360works        360WORKS               94   
360works        360works.com           94

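For each group number, I want to compare the names one by one and check whether each one matches the corresponding word in the 'match' column.

So the desired output would be counts:

If they match, we count it as 'TP'; if not, we count it as 'FN'.

I thought of counting the number of matched words for each group number, but this does not get me what I want: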

df.groupby('group').count()

Does anyone have an idea how to do this?

Answer 1

If I understand your question correctly, this should do the trick:

import re
import pandas


df = pandas.DataFrame([['adamant', 'Adamant Home Network', 86], ['adamant', 'ADAMANT, Ltd.', 86],
                       ['adamant bild', "TOV Adamant-Bild", 86], ['360works', '360WORKS', 94],
                       ['360works ', "360works.com ", 94]], columns=['match', 'name', 'group'])


def my_function(group):
    for i, row in group.iterrows():
        if ''.join(re.findall("[a-zA-Z]+", row['match'])).lower() not in ''.join(
                re.findall("[a-zA-Z]+", row['name'])).lower():
            # parsing the names in each columns and looking for an inclusion
            # if one of the inclusion fails, we return 'FN'
            return 'FN'
    # if all inclusions succeed, we return 'TP'
    return 'TP'


res_series = df.groupby('group').apply(my_function)
res_series.name = 'count'
res_df = res_series.reset_index()
print(res_df)

This gives you the following DataFrame:

   group count
0     86    TP
1     94    TP
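
Since the question asks for counts, here is a minimal sketch of a per-row variant that labels every row and then tallies the labels per group. It is not part of the code above: the helper names `normalize` and `label_row` and the `label` column are made up for illustration, and it assumes that "matching" means the normalized 'match' string occurs inside the normalized 'name' string. Unlike the regex above, it keeps digits as well as letters.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [
        ["adamant", "Adamant Home Network", 86],
        ["adamant", "ADAMANT, Ltd.", 86],
        ["adamant bild", "TOV Adamant-Bild", 86],
        ["360works", "360WORKS", 94],
        ["360works", "360works.com", 94],
    ],
    columns=["match", "name", "group"],
)


def normalize(text):
    # Assumption: lowercase the text and keep only ASCII letters and digits.
    return "".join(re.findall(r"[a-z0-9]+", text.lower()))


def label_row(row):
    # 'TP' when the normalized match occurs inside the normalized name.
    return "TP" if normalize(row["match"]) in normalize(row["name"]) else "FN"


# Hypothetical helper column holding one label per row.
df["label"] = df.apply(label_row, axis=1)

# One row per group, one column per label; reindex so that a label
# with no rows still shows up as a column of zeros.
counts = pd.crosstab(df["group"], df["label"]).reindex(
    columns=["TP", "FN"], fill_value=0
)
print(counts)
```

With the sample data every row is labelled 'TP', so the 'FN' column is all zeros.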

Answer 2

This function compares the 'name' and 'match' columns row by row for each group it is given:

def apply_func(df):
    x = df['name'] == df['match']
    return x.map({False:'FN', True:'TP'})

In [683]: temp.join(temp.groupby('group').apply(apply_func).reset_index(), rsuffix='_1', how='left')
Out[683]: 
           match                  name  group  group_1  level_1   0
0        adamant  Adamant Home Network     86       86        0  FN
1        adamant         ADAMANT, Ltd.     86       86        1  FN
2  adamant bild       TOV Adamant-Bild     86       86        2  FN
3       360works              360WORKS     94       94        3  FN
4       360works          360works.com     94       94        4  FN
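
Note that exact equality is strict: a difference in case, or any extra word in the name, makes the comparison fail, which is why every row above is a non-match. Below is a small sketch contrasting exact equality with a case-insensitive comparison. It assumes `temp` holds the question's data, and the variable names `exact` and `lowered` are only illustrative.

```python
import pandas as pd

# Assumption: `temp` is the DataFrame from the question.
temp = pd.DataFrame(
    [
        ["adamant", "Adamant Home Network", 86],
        ["adamant", "ADAMANT, Ltd.", 86],
        ["adamant bild", "TOV Adamant-Bild", 86],
        ["360works", "360WORKS", 94],
        ["360works", "360works.com", 94],
    ],
    columns=["match", "name", "group"],
)

labels = {False: "FN", True: "TP"}

# Exact, case-sensitive equality.
exact = (temp["name"] == temp["match"]).map(labels)

# Equality after lowercasing both columns.
lowered = (temp["name"].str.lower() == temp["match"].str.lower()).map(labels)

print(exact.tolist())
print(lowered.tolist())
```

Lowercasing alone only rescues the '360WORKS' row; the other names contain extra words or punctuation, so they would need a containment test or some normalization instead of equality.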

