How to replace multiple comma separated textual words in columns with categorical code
Hello, I have a dataset like this:
user read
Den Insurance
Den Utility
Mark Power;Bonds;Corporates
Mark Government
Celia Retail
Celia Technology;Paper
Celia Food
I have another dataset like this:
Name Code
Insurance 1
Utility 2
Power 3
Bonds 4
Corporates 5
Government 6
Retail 7
Technology 8
Paper 9
Food 10
I want to use these as data frames and convert the first one into:
user read Code
Den Insurance 1
Den Utility 2
Mark Power;Bonds;Corporates 3,4,5
Mark Government 6
Celia Retail 7
Celia Technology;Paper 8,9
Celia Food 10
How can I do this with a Python DataFrame?
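I am using unnesting here on your first dataframe; after that we only need to create the Code column accordingly and groupby + agg back to one row per original entry.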
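# Split each cell into a list of names, then put one name per row
df.read = df.read.str.split(';')
df = unnesting(df, ['read'])
# Look up the code for every name
df['Code'] = df.read.map(df1.set_index('Name').Code)
# Collapse back to one row per original index label
yourdf = df.astype(str).groupby(level=0).agg({'user': 'first', 'read': ';'.join, 'Code': ','.join})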
yourdf
Out[255]:
user read Code
0 Den Insurance 1
1 Den Utility 2
2 Mark Power;Bonds;Corporates 3,4,5
3 Mark Government 6
4 Celia Retail 7
5 Celia Technology;Paper 8,9
6 Celia Food 10
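The unnesting helper (define it before running the code above):
import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each index label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every list column into one long column
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # columns= replaces the positional axis argument, which recent pandas rejects
    return df1.join(df.drop(columns=explode), how='left')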
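On pandas 0.25 or later, the same split / unnest / map / groupby steps can probably be written with the built-in `Series.explode` instead of a hand-written helper. The following is only a sketch under that assumption. The sample frames are retyped here, with the lookup key spelled `Bonds` so that it matches the `read` column, and the names `lookup`, `exploded` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data retyped from the question (lookup key spelled "Bonds" to match `read`)
df = pd.DataFrame({
    "user": ["Den", "Den", "Mark", "Mark", "Celia", "Celia", "Celia"],
    "read": ["Insurance", "Utility", "Power;Bonds;Corporates", "Government",
             "Retail", "Technology;Paper", "Food"],
})
df1 = pd.DataFrame({
    "Name": ["Insurance", "Utility", "Power", "Bonds", "Corporates",
             "Government", "Retail", "Technology", "Paper", "Food"],
    "Code": list(range(1, 11)),
})

# Name -> code lookup, stored as strings so the codes can be joined later
lookup = df1.set_index("Name")["Code"].astype(str)

# One name per row; the original index label is repeated for each piece
exploded = df["read"].str.split(";").explode()

# Map every name to its code, then join the codes back per original row
codes = exploded.map(lookup).groupby(level=0).agg(",".join)

# assign aligns on the index, so each row receives its own joined codes
result = df.assign(Code=codes)
print(result)
```

Note that a name missing from the lookup becomes NaN after `map`, and `",".join` would then fail, so this sketch assumes every name has a code.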
IIUC
d = df1.set_index('Name').Code.astype(str)
df0.assign(Code=[', '.join(map(d.get, s.split(';'))) for s in df0.read])
user read Code
0 Den Insurance 1
1 Den Utility 2
2 Mark Power;Bonds;Corporates 3, 4, 5
3 Mark Government 6
4 Celia Retail 7
5 Celia Technology;Paper 8, 9
6 Celia Food 10
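One thing to watch with the list comprehension: `d.get` returns `None` for a name that is not in the lookup, and `', '.join` then raises a `TypeError`. A possible variation, sketched below, uses a plain dict with a placeholder for unknown names. The function `encode`, its `sep` and `missing` parameters, the `'?'` placeholder and the test name `Crypto` are all hypothetical, and the lookup key is spelled `Bonds` to match the `read` column.

```python
import pandas as pd

# Sample data retyped from the question (lookup key spelled "Bonds" to match `read`)
df0 = pd.DataFrame({
    "user": ["Den", "Den", "Mark", "Mark", "Celia", "Celia", "Celia"],
    "read": ["Insurance", "Utility", "Power;Bonds;Corporates", "Government",
             "Retail", "Technology;Paper", "Food"],
})
df1 = pd.DataFrame({
    "Name": ["Insurance", "Utility", "Power", "Bonds", "Corporates",
             "Government", "Retail", "Technology", "Paper", "Food"],
    "Code": list(range(1, 11)),
})

# Plain dict lookup: name -> code as a string
codes = df1.set_index("Name")["Code"].astype(str).to_dict()

def encode(cell, sep=";", missing="?"):
    """Translate one `read` cell into joined codes (hypothetical helper)."""
    # Unknown names fall back to `missing` instead of breaking the join
    return ", ".join(codes.get(name.strip(), missing) for name in cell.split(sep))

out = df0.assign(Code=df0["read"].map(encode))
print(out)

# A name that is absent from the lookup yields the placeholder
print(encode("Power;Crypto"))
```

Whether a placeholder or a hard failure is preferable depends on the data; failing loudly may be the safer choice when every name is expected to have a code.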