How to replace multiple comma-separated words in a column with categorical codes

Hi, I have a dataset like this:

user     read
Den    Insurance
Den    Utility
Mark   Power;Bonds;Corporates
Mark   Government
Celia  Retail
Celia  Technology;Paper
Celia  Food

I have another dataset like this:

Name            Code
Insurance        1
Utility          2
Power            3
Bonds            4
Corporates       5
Government       6
Retail           7
Technology       8
Paper            9
Food             10

I want to load both as DataFrames and convert the first one into:

user     read                  Code
Den    Insurance                1
Den    Utility                  2
Mark   Power;Bonds;Corporates  3,4,5
Mark   Government               6
Celia  Retail                   7
Celia  Technology;Paper        8,9
Celia  Food                     10

How can I do this with a pandas DataFrame in Python?

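I am using an unnesting helper (defined below) on your first DataFrame here. After that we only need to create the Code column by mapping against the second DataFrame (df1), then groupby the original index and agg to join the values back together.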

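# Split each cell into a list of topics, then expand to one row per topic
df['read'] = df['read'].str.split(';')
df = unnesting(df, ['read'])
# Look up each topic's code in the second DataFrame (df1)
df['Code'] = df['read'].map(df1.set_index('Name')['Code'])
# Regroup on the original row index and join the pieces back together
yourdf = df.astype(str).groupby(level=0).agg({'user': 'first', 'read': ';'.join, 'Code': ','.join})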
yourdf
Out[255]: 
    user                    read   Code
0    Den               Insurance      1
1    Den                 Utility      2
2   Mark  Power;Bonds;Corporates  3,4,5
3   Mark              Government      6
4  Celia                  Retail      7
5  Celia        Technology;Paper    8,9
6  Celia                    Food     10

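import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each index label once per element in its list
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the remaining columns; columns= replaces the old positional axis argument
    return df1.join(df.drop(columns=explode), how='left')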
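
On pandas 0.25 or later, the built-in DataFrame.explode can probably stand in for a hand-written unnesting helper. The following is a minimal sketch of the same split, map, and regroup steps, not the answerer's exact code. The sample frames are rebuilt from the question, with "Bonds" spelled the same way in both, and the names long and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question ("Bonds" spelled consistently).
df = pd.DataFrame({
    "user": ["Den", "Den", "Mark", "Mark", "Celia", "Celia", "Celia"],
    "read": ["Insurance", "Utility", "Power;Bonds;Corporates", "Government",
             "Retail", "Technology;Paper", "Food"],
})
df1 = pd.DataFrame({
    "Name": ["Insurance", "Utility", "Power", "Bonds", "Corporates",
             "Government", "Retail", "Technology", "Paper", "Food"],
    "Code": list(range(1, 11)),
})

# One row per topic; explode keeps the original row label in the index.
long = df.assign(read=df["read"].str.split(";")).explode("read")
long["Code"] = long["read"].map(df1.set_index("Name")["Code"])

# Collapse back to one row per original index label.
result = long.astype(str).groupby(level=0).agg(
    {"user": "first", "read": ";".join, "Code": ",".join}
)
print(result)
```

Because explode repeats the original index labels, groupby(level=0) is enough to put each row back together without an extra key column.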

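If I understand correctly (IIUC), you can build a lookup Series from df1 and join the codes row by row (df0 is your first DataFrame here):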

d = df1.set_index('Name').Code.astype(str)
df0.assign(Code=[', '.join(map(d.get, s.split(';'))) for s in df0.read])

    user                    read     Code
0    Den               Insurance        1
1    Den                 Utility        2
2   Mark  Power;Bonds;Corporates  3, 4, 5
3   Mark              Government        6
4  Celia                  Retail        7
5  Celia        Technology;Paper     8, 9
6  Celia                    Food       10
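
One caveat with the list-comprehension approach: if a name in read has no match in the lookup table, d.get returns None and the join raises a TypeError. Below is a hedged sketch of one way to tolerate that, using a plain dict and a placeholder. The helper encode, the "?" placeholder, and the unmapped name "Gadgets" are all made up for illustration and are not part of the original data.

```python
import pandas as pd

# Small illustrative frames; "Gadgets" is a hypothetical name with no code.
df0 = pd.DataFrame({
    "user": ["Mark", "Celia"],
    "read": ["Power;Bonds;Corporates", "Food;Gadgets"],
})
df1 = pd.DataFrame({
    "Name": ["Power", "Bonds", "Corporates", "Food"],
    "Code": [3, 4, 5, 10],
})

# Plain dict of name -> code string; dict.get accepts a default value.
codes = dict(zip(df1["Name"], df1["Code"].astype(str)))

def encode(cell, missing="?"):
    """Map each ';'-separated name to its code, using `missing` for unknown names."""
    return ",".join(codes.get(name.strip(), missing) for name in cell.split(";"))

out = df0.assign(Code=df0["read"].map(encode))
print(out)
```

Whether a placeholder, an empty string, or an explicit error is the right response to an unknown name depends on how the codes are used downstream.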