Applying abbreviation to the column of a dataframe based on another column of the same dataframe
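I have two columns in a dataframe: one is class and the other is description. The descriptions contain abbreviations that I want to expand based on the class value. I have a dictionary keyed by class, where each value is another dictionary mapping abbreviations to their full forms, because the meaning of an abbreviation differs by class.
For example, IT could mean either information transfer or information technology, depending on the class label.
I tried groupby, but could not get the result back into the original dataframe.
Any help is much appreciated.
Thanks
This is how I tried it: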
grouped = df.groupby('class')
for n, j in grouped:
    j['description'].str.split().apply(lambda x: ' '.join([abb[n].get(e, e) for e in x]))
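The loop above computes the expanded strings for each group but never stores them. A minimal sketch of one way to write them back, assuming the same `abb` and `df` structure as in the question (the sample rows and the names `mapping` and `expanded` are illustrative): each group keeps the original row labels in its index, so `df.loc[group.index, ...]` can assign the results to the right rows.

```python
import pandas as pd

abb = {'IT': {'SQL': 'Structured Query Language', 'BLAH': 'blah blah'},
       'Sales': {'SQL': 'Sales Qualified Lead'}}
# Rows are deliberately not sorted by class.
df = pd.DataFrame([{'class': 'IT', 'description': 'SQL developer'},
                   {'class': 'Sales', 'description': 'senior SQL'},
                   {'class': 'IT', 'description': 'SQL developer BLAH'}])

for name, group in df.groupby('class'):
    # Assumption: a class with no entry in abb is left unchanged.
    mapping = abb.get(name, {})
    expanded = group['description'].str.split().apply(
        lambda words: ' '.join(mapping.get(w, w) for w in words))
    # group.index holds the original row labels, so .loc writes in place.
    df.loc[group.index, 'description'] = expanded

print(df)
```

Because the assignment goes through the original index, the row order of `df` does not matter.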
Input data:
import pandas as pd

abb = {'IT': {'SQL': 'Structured Query Language', 'BLAH': 'blah blah'},
       'Sales': {'SQL': 'Sales Qualified Lead'}}
data = [{'class': 'IT', 'description': 'SQL developer'},
        {'class': 'IT', 'description': 'SQL developer BLAH'},
        {'class': 'Sales', 'description': 'senior SQL'}]
df = pd.DataFrame(data)
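Code:
# Expand abbreviations per class; group_keys=False keeps the original row index,
# so the result aligns with df even when rows are not sorted by class.
df['description'] = (
    df.groupby('class', group_keys=False)['description']
      .apply(lambda s: s.str.replace('|'.join(abb[s.name]),  # s.name is the class
                                     lambda m: abb[s.name][m.group(0)],
                                     regex=True))  # a callable replacement needs regex=True
)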
Output:
   class                                    description
0     IT            Structured Query Language developer
1     IT  Structured Query Language developer blah blah
2  Sales                    senior Sales Qualified Lead
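Here is a working example that takes each row as input, looks up its class value in the dictionary, and replaces the abbreviations in description with the corresponding values from that dictionary: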
import pandas as pd

abb = {'IT': {'SQL': 'Structured Query Language'}, 'Sales': {'SQL': 'Sales Qualified Lead'}}
data = [{'class': 'IT', 'description': 'SQL developer'}, {'class': 'Sales', 'description': 'senior SQL'}]
df = pd.DataFrame(data)

def replace_strings(row):
    text = row['description']
    for key, value in abb[row['class']].items():
        text = text.replace(key, value)
    return text

df['description'] = df.apply(replace_strings, axis=1)
   class                          description
0     IT  Structured Query Language developer
1  Sales          senior Sales Qualified Lead
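One caveat with plain `str.replace` is that it matches substrings, so `replace_strings` would also expand the 'SQL' inside a word such as 'MySQL'. A minimal sketch of one way around this, assuming whole-word matching is what is wanted: compile one regular expression per class, with `\b` word boundaries and `re.escape` applied to the keys. The `patterns` dictionary, the `expand` helper, and the 'MySQL' row are illustrative and not part of the original answer.

```python
import re
import pandas as pd

abb = {'IT': {'SQL': 'Structured Query Language', 'BLAH': 'blah blah'},
       'Sales': {'SQL': 'Sales Qualified Lead'}}
df = pd.DataFrame([
    {'class': 'IT', 'description': 'SQL developer BLAH'},
    {'class': 'Sales', 'description': 'senior SQL'},
    {'class': 'IT', 'description': 'MySQL developer'},  # illustrative row
])

# One compiled pattern per class; \b keeps 'SQL' from matching inside 'MySQL'.
patterns = {
    cls: re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in mapping) + r')\b')
    for cls, mapping in abb.items()
}

def expand(cls, text):
    """Expand whole-word abbreviations in text using the mapping for cls."""
    if cls not in patterns:  # assumption: unknown classes are left unchanged
        return text
    return patterns[cls].sub(lambda m: abb[cls][m.group(0)], text)

# Pair each description with its class and expand it row by row.
df['description'] = [expand(c, t) for c, t in zip(df['class'], df['description'])]
print(df)
```

Iterating over the two columns with `zip` avoids `df.apply(..., axis=1)`, and the patterns are compiled once instead of once per row.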