Translate a pandas DataFrame using a dictionary sorted by word length

Question

I have imported an Excel file into a pandas DataFrame and am trying to translate it, then export it back to Excel.

As an example, here is my dataset:

import pandas as pd

d = {"cool": "chill", "guy": "dude", "cool guy": "bro"}
data = [['cool guy'], ['cool'], ['guy']]
df = pd.DataFrame(data, columns=['WORDS'])


print(df)
#    WORDS   
# 0  cool guy   
# 1  cool  
# 2  guy

The simplest solution would be the built-in pandas replace function. However, if you use:

df['WORDS'] = df['WORDS'].replace(d, regex=True)

the result is:

print(df)
#    WORDS   
# 0  chill dude   
# 1  chill  
# 2  dude

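('cool guy' is not translated correctly: it becomes 'chill dude' instead of 'bro')

This can be solved by sorting the dictionary keys longest first, so that longer phrases are replaced before the words they contain. I tried using this function: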

import re
def replace_words(col, dictionary):
    # sort keys by length, in reverse order
    for item in sorted(dictionary.keys(), key = len, reverse = True):
        col = re.sub(item, dictionary[item], col)
    return col

But...

df['WORDS'] = replace_words(df['WORDS'], d)

results in a type error: TypeError: expected string or bytes-like object

Trying to convert the rows to strings did not help either:

...*
col = re.sub(item, dictionary[item], [str(row) for row in col])

Does anyone have a solution or a different approach I could try?

Answer 1

df['WORDS'] = df['WORDS'].apply(lambda x: d[x])

This will do it, as long as every cell exactly matches a key in d; any other value raises a KeyError.
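
If a cell can contain a key inside longer text, the plain lookup is not enough. One possible route is to keep the question's replace_words but call it once per cell with Series.apply, since re.sub expects a single string rather than a whole Series. A minimal sketch under those assumptions (the re.escape call is an addition for illustration and is not part of the original posts):

```python
import re

import pandas as pd

d = {"cool": "chill", "guy": "dude", "cool guy": "bro"}
df = pd.DataFrame({"WORDS": ["cool guy", "cool", "guy"]})


def replace_words(text, dictionary):
    """Translate one string, trying the longest keys first."""
    for key in sorted(dictionary, key=len, reverse=True):
        # re.escape treats the key as literal text (an added safeguard)
        text = re.sub(re.escape(key), dictionary[key], text)
    return text


# apply calls replace_words once per cell, so re.sub always receives a str
df["WORDS"] = df["WORDS"].apply(replace_words, dictionary=d)
print(df["WORDS"].tolist())  # ['bro', 'chill', 'dude']
```

Because the substitutions run one after another, a translation that itself contains a later key would be translated again; that does not happen with this dictionary.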

Answer 2

Let's try replace without regex=True, so that each cell is matched as a whole value:

df.WORDS.replace(d)
Out[307]: 
0      bro
1    chill
2     dude
Name: WORDS, dtype: object
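
Whole-value matching like this leaves a cell untouched when a key only appears inside longer text. For that case, one option (an alternative technique, not something either answer proposes) is to build a single alternation pattern with the longest keys first and let Series.str.replace look up each match in the dictionary. A sketch, where the sample Series s and the names keys_longest_first, pattern, and translated are hypothetical:

```python
import re

import pandas as pd

d = {"cool": "chill", "guy": "dude", "cool guy": "bro"}
# Hypothetical sample: the first cell holds keys inside a longer sentence
s = pd.Series(["a cool guy and a guy", "cool", "guy"], name="WORDS")

# Longest keys first, so "cool guy" is tried before "cool" and "guy"
keys_longest_first = sorted(d, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys_longest_first))

# One pass per cell: every match is swapped for its dictionary value
translated = s.str.replace(pattern, lambda m: d[m.group(0)], regex=True)
print(translated.tolist())  # ['a bro and a dude', 'chill', 'dude']
```

Since the text is scanned only once, replaced output is never matched again. Keys are matched as plain substrings, so "cool" inside "cooler" would also be replaced; wrapping the alternation in word boundaries would be a possible refinement.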
