Pandas: split into multiple columns and unpivot

I am trying to split several columns on a delimiter and then unpivot the resulting columns.

df:

                         Technique_Name Technique_ID                                       Threat_Actor                                       Threat_Tools
0     Abuse Elevation Control Mechanism        T1548                                                NaN                                                NaN
1                     Setuid and Setgid    T1548.001                                                NaN                                                NaN
2           Bypass User Account Control    T1548.002  Honeybee,BRONZEBUTLER,APT29,CobaltGroup,MuddyW...                                            PowerSploit,pwdump,Mimikatz,netstat,Ping,FTP
3                 Sudo and Sudo Caching    T1548.003                                                NaN                                                NaN
4        Elevated Execution with Prompt    T1548.004                                                NaN                                                NaN

I am using the following code to split the columns:

# NaN cells are floats and have no .split(), so replace them with "" first
tools = df.Threat_Tools.fillna("")
actors = df.Threat_Actor.fillna("")
# Largest number of comma-separated items found in each column
Threat_Tools = max(len(x.split(",")) for x in tools)
Threat_Actor = max(len(x.split(",")) for x in actors)
cols = ["Tools" + str(x) for x in range(Threat_Tools)]
cols1 = ["Actor" + str(x) for x in range(Threat_Actor)]
datalist = [x.split(",") for x in tools]
datalist2 = [x.split(",") for x in actors]
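
The lists built above still have to be assembled into numbered columns. A minimal sketch of that step, assuming a small stand-in frame (the two rows below are shortened for illustration and are not the real data) and using the hypothetical names `width` and `tools_wide`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame (shortened values).
df = pd.DataFrame({
    "Technique_ID": ["T1548", "T1548.002"],
    "Threat_Tools": [np.nan, "PowerSploit,pwdump,Mimikatz"],
})

# NaN is a float and has no .split(), so substitute "" before splitting.
datalist = [x.split(",") for x in df["Threat_Tools"].fillna("")]
width = max(len(items) for items in datalist)
cols = ["Tools" + str(n) for n in range(width)]

# The DataFrame constructor pads the shorter rows with missing values.
tools_wide = pd.DataFrame(datalist, columns=cols, index=df.index)
# "".split(",") yields [""], so turn that lone empty string back into NaN.
tools_wide = tools_wide.mask(tools_wide == "")

print(df.join(tools_wide))
```

The same pattern would apply to Threat_Actor with `cols1` and `datalist2`.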

I then want to unpivot Actor and Tools by Technique_ID/Technique_Name.

Expected output:

                     Technique_Name Technique_ID                                       Threat_Actor                                       Threat_Tools
 Abuse Elevation Control Mechanism        T1548                                                NaN                                                NaN
                 Setuid and Setgid    T1548.001                                                NaN                                                NaN
       Bypass User Account Control    T1548.002                                           Honeybee                                          PowerSploit
       Bypass User Account Control    T1548.002                                       BRONZEBUTLER                                          pwdump
       Bypass User Account Control    T1548.002                                              APT29                                            Mimikatz
       Bypass User Account Control    T1548.002                                        CobaltGroup                                            netstat
       Bypass User Account Control    T1548.002                                               MuddyW                                            Ping
             Sudo and Sudo Caching    T1548.003                                                NaN                                                NaN
    Elevated Execution with Prompt    T1548.004                                                NaN                                                NaN
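
Solution, using str.split(expand=True) followed by pd.wide_to_long:

import pandas as pd

# Split each comma-separated column into numbered columns (Actor0, Actor1, ... and Tools0, Tools1, ...)
df = df.join(df['Threat_Actor'].str.split(',', expand=True).add_prefix('Actor'))
df = df.join(df['Threat_Tools'].str.split(',', expand=True).add_prefix('Tools'))
# Stack the numbered columns into long form; 'i' holds the numeric suffix
df = pd.wide_to_long(df, ['Actor', 'Tools'], i=['Technique_ID'], j='i')
# Forward-fill Tools within each technique only, so values never leak into another Technique_ID
df['Tools'] = df.groupby(level='Technique_ID')['Tools'].ffill()
# Drop the suffix level and restore Technique_ID as a column
df = df.reset_index(level=1, drop=True).reset_index()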

This lets me split the columns and then spread their values across multiple rows, in line with the desired output.
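
For anyone who wants to try this end to end, here is a self-contained sketch on a cut-down version of the sample data. The frame contents are shortened for illustration, and the names `wide`, `long_df`, `has_value` and `result` are hypothetical. It also drops the all-NaN padding rows that the reshape creates, so that a technique with no actors or tools keeps a single row as in the expected output. It assumes Technique_ID is unique per row, which `pd.wide_to_long` requires.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame; actor/tool lists shortened for illustration.
df = pd.DataFrame({
    "Technique_Name": ["Abuse Elevation Control Mechanism",
                       "Bypass User Account Control",
                       "Sudo and Sudo Caching"],
    "Technique_ID": ["T1548", "T1548.002", "T1548.003"],
    "Threat_Actor": [np.nan, "Honeybee,APT29", np.nan],
    "Threat_Tools": [np.nan, "PowerSploit,pwdump,Mimikatz", np.nan],
})

# Replace the two comma-separated columns with numbered ones.
wide = df.drop(columns=["Threat_Actor", "Threat_Tools"])
wide = wide.join(df["Threat_Actor"].str.split(",", expand=True).add_prefix("Actor"))
wide = wide.join(df["Threat_Tools"].str.split(",", expand=True).add_prefix("Tools"))

# Long form: one row per (Technique_ID, suffix) pair.
long_df = pd.wide_to_long(wide, ["Actor", "Tools"], i="Technique_ID", j="i")
# Sorting groups the rows of each technique together (this orders by ID string).
long_df = long_df.reset_index().sort_values(["Technique_ID", "i"])

# Keep the first row of every technique, plus any later row that carries a value.
has_value = long_df[["Actor", "Tools"]].notna().any(axis=1)
result = long_df[(long_df["i"] == 0) | has_value].drop(columns="i").reset_index(drop=True)

print(result)
```

In this sketch the shorter list is left as NaN instead of being forward-filled; a per-technique `ffill` could be added before the filter if repeated values are preferred.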