From indices to field name in pandas dataframe

Question

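I need to get the field values back from the indices. My dataset is as follows:

import pandas as pd
from fuzzywuzzy import fuzz

try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
                         'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})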

    word    name
0   apple   dog
1   orange  cat
2   diet    mad cat
3   energy  good dog
4   fire    bad dog
5   cake    chicken

Using this function:

def func(name):
    matches = try_test.apply(lambda row: (fuzz.partial_ratio(row['name'], name) >= 85), axis=1)
    return [i for i, x in enumerate(matches) if x]

try_test.apply(lambda row: func(row['name']), axis=1)

I get the following values:

0    [0, 3, 4]
1       [1, 2]
2       [1, 2]
3       [0, 3]
4       [0, 4]
5          [5]

I want the values of the word field instead of the indices.

Expected output:

0    [apple, energy, fire]
1       [orange, diet]
2       [orange, diet]
3       [apple, energy]
4       [apple, fire]
5          [cake]

Any suggestions would be appreciated.

Answer 1

Once you have the indices, you can solve this by simply indexing the df again with them. You can do this either outside or inside your func, IMO:

In [2]: import pandas as pd

In [3]: try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
   ...:                          'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

In [4]: try_test
Out[4]: 
     word      name
0   apple       dog
1  orange       cat
2    diet   mad cat
3  energy  good dog
4    fire   bad dog
5    cake   chicken

In [5]: rows = [0, 3, 4]

In [6]: try_test.loc[rows, 'word']
Out[6]: 
0     apple
3    energy
4      fire
Name: word, dtype: object

In [7]: try_test.loc[rows, 'word'].values.tolist()
Out[7]: ['apple', 'energy', 'fire']
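
To apply the same lookup to every row at once (the "outside your func" option), something along these lines may work. This is only a sketch: idx_lists is a hypothetical name, and its contents are copied by hand from the output shown in the question rather than recomputed with fuzzywuzzy.

```python
import pandas as pd

try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
                         'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

# Assumption: these index lists are copied from the question's output;
# in practice they would come from try_test.apply(lambda row: func(row['name']), axis=1).
idx_lists = pd.Series([[0, 3, 4], [1, 2], [1, 2], [0, 3], [0, 4], [5]])

# For each list of row labels, look up the matching 'word' values.
words = idx_lists.apply(lambda rows: try_test.loc[rows, 'word'].tolist())
print(words)
```

Since .loc selects by label, this assumes the stored indices are index labels of try_test, which holds for the default 0..n-1 index used here.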

Answer 2

Change what the function returns from i to try_test.word[i]:

def func(name):
    matches = try_test.apply(lambda row: (fuzz.partial_ratio(row['name'], name) >= 85), axis=1)
    return [try_test.word[i] for i, x in enumerate(matches) if x]
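
Note that try_test.word[i] combined with enumerate relies on the index labels coinciding with row positions, as they do with the default index. A possible alternative is to use the boolean matches directly as a mask, which avoids the positions altogether. The sketch below is not the original code: is_match is a hypothetical stand-in (simple substring containment) so the example runs without fuzzywuzzy; with the library installed, its body could be replaced by fuzz.partial_ratio(a, b) >= 85.

```python
import pandas as pd

try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
                         'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

def is_match(a, b):
    # Hypothetical stand-in for `fuzz.partial_ratio(a, b) >= 85`:
    # treats two names as matching when one contains the other.
    return a in b or b in a

def func(name):
    # Boolean Series aligned with try_test's index: True where the row's name matches.
    mask = try_test['name'].apply(lambda other: is_match(other, name))
    # Select the 'word' values of the matching rows directly, no positions needed.
    return try_test.loc[mask, 'word'].tolist()

result = try_test['name'].apply(func)
print(result)
```

On this small dataset the stand-in happens to select the same rows as the output shown in the question; a real fuzzy scorer may behave differently on other data.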

python

pandas

fuzzywuzzy