Applying abbreviation to the column of a dataframe based on another column of the same dataframe

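I have two columns in a dataframe: one is class and the other is description. The descriptions contain abbreviations, and I want to expand them based on the class value. I have a dictionary keyed by class, where each value is another dictionary mapping abbreviations to their full forms, because the meaning of an abbreviation differs by class. For example, IT may stand for either Information Transfer or Information Technology depending on the class label.

I tried groupby, but could not get the result back into the original dataframe. Any help is much appreciated. Thanks.

This is what I tried: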

grouped = df.groupby('class')
for n,j in grouped:
    j['description'].str.split().apply(lambda x: ' '.join([abb[n].get(e, e) for e in x]))

Input data:

import pandas as pd

abb = {'IT':{'SQL':'Structured Query Language', 'BLAH': 'blah blah'}, 'Sales':{'SQL':'Sales Qualified Lead'}}

data = [{'class':'IT', 'description':'SQL developer'},
        {'class':'IT', 'description':'SQL developer BLAH'},
        {'class':'Sales', 'description':'senior SQL'}]
df = pd.DataFrame(data)

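Code:

# Select 'description' and use group_keys=False so the result keeps the original
# row index and aligns with df on assignment, whatever the row order is.
df['description'] = (df.groupby('class', group_keys=False)['description']
                     .apply(lambda s: s.str.replace('|'.join(abb[s.name].keys()),  # s.name is the class label
                                                    lambda m: abb[s.name][m.group(0)],
                                                    regex=True  # a callable replacement requires regex=True
                                                   )
                           )
                    )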

Output:

   class                                    description
0     IT            Structured Query Language developer
1     IT  Structured Query Language developer blah blah
2  Sales                    senior Sales Qualified Lead
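One caveat with joining the keys into a plain alternation is that an abbreviation can also match inside a longer word (for example SQL inside MySQL), and keys containing regex metacharacters would be misread. A minimal sketch of a stricter variant follows, assuming abbreviations are whole words; build_pattern and expand are hypothetical helper names introduced here, and the sample rows are made up for illustration.

```python
import re
import pandas as pd

abb = {'IT': {'SQL': 'Structured Query Language', 'BLAH': 'blah blah'},
       'Sales': {'SQL': 'Sales Qualified Lead'}}

# Illustrative rows, deliberately not sorted by class.
df = pd.DataFrame([
    {'class': 'Sales', 'description': 'senior SQL'},
    {'class': 'IT', 'description': 'MySQL and SQL developer'},
    {'class': 'IT', 'description': 'SQL developer BLAH'},
])

def build_pattern(mapping):
    # Hypothetical helper: escape each key and require word boundaries,
    # trying longer keys first so they win over shorter overlapping ones.
    keys = sorted(mapping, key=len, reverse=True)
    return re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')

# One compiled pattern per class.
patterns = {cls: build_pattern(mapping) for cls, mapping in abb.items()}

def expand(s):
    # Hypothetical helper: s is the description Series of one group,
    # and s.name is that group's class label.
    mapping = abb[s.name]
    return s.str.replace(patterns[s.name], lambda m: mapping[m.group(0)], regex=True)

# group_keys=False keeps the original index, so assignment aligns row by row.
df['description'] = df.groupby('class', group_keys=False)['description'].apply(expand)
print(df)
```

In this sketch MySQL is left untouched while the standalone SQL is expanded. A class that is missing from abb would raise a KeyError, so such rows would need to be filtered out or given an empty mapping first.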

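Here is a working example that takes each row as input, looks up its class value in the dictionary, and replaces the abbreviations in the description string with the corresponding values from that class's dictionary: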

import pandas as pd

abb = {'IT':{'SQL':'Structured Query Language'},'Sales':{'SQL':'Sales Qualified Lead'}}

data = [{'class':'IT', 'description':'SQL developer'},{'class':'Sales', 'description':'senior SQL'}]
df = pd.DataFrame(data)

def replace_strings(row):
    text = row['description']
    for key, value in abb[row['class']].items():
        text = text.replace(key, value)
    return text

df['description'] = df.apply(replace_strings, axis=1)

Output:

   class                          description
0     IT  Structured Query Language developer
1  Sales          senior Sales Qualified Lead
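Note that text.replace works on substrings, so a key such as SQL would also be replaced inside a longer word. A token-based variant, closer to the split-and-lookup idea in the question, might look like the sketch below. It assumes abbreviations are separated by whitespace, and the sample rows (including the HR class with no dictionary entry) are made up for illustration.

```python
import pandas as pd

abb = {'IT': {'SQL': 'Structured Query Language'},
       'Sales': {'SQL': 'Sales Qualified Lead'}}

# Illustrative rows; 'HR' has no entry in abb on purpose.
df = pd.DataFrame([
    {'class': 'IT', 'description': 'MySQL and SQL developer'},
    {'class': 'Sales', 'description': 'senior SQL'},
    {'class': 'HR', 'description': 'SQL trainer'},
])

# Walk over class and description in parallel. Each whitespace-separated
# token is looked up in the dictionary for that row's class; tokens (and
# classes) without an entry are left unchanged.
df['description'] = [
    ' '.join(abb.get(cls, {}).get(token, token) for token in text.split())
    for cls, text in zip(df['class'], df['description'])
]
print(df)
```

Because the list is built in row order, it can be assigned straight back to the column without any groupby. The trade-offs are that repeated whitespace is collapsed to single spaces and that a token with attached punctuation, such as "SQL,", would not be matched.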