Pandas: split into multiple columns and unpivot
I am trying to split multiple columns on a delimiter and then unpivot the resulting columns.
df:
Technique_Name Technique_ID Threat_Actor Threat_Tools
0 Abuse Elevation Control Mechanism T1548 NaN NaN
1 Setuid and Setgid T1548.001 NaN NaN
2 Bypass User Account Control T1548.002 Honeybee,BRONZEBUTLER,APT29,CobaltGroup,MuddyW... PowerSploit,pwdump,Mimikatz,netstat,Ping,FTP
3 Sudo and Sudo Caching T1548.003 NaN NaN
4 Elevated Execution with Prompt T1548.004 NaN NaN
I am using the following code to split the columns:
Threat_Tools = max(list(map(lambda x: len(x.split(",")),df.Threat_Tools)))
Threat_Actor = max(list(map(lambda x: len(x.split(",")),df.Threat_Actor)))
cols = ["Tools"+str(x) for x in range(Threat_Tools)]
cols1 = ["Actor"+str(x) for x in range(Threat_Actor)]
datalist = list(map(lambda x: x.split(","), df.Threat_Tools))
datalist2 = list(map(lambda x: x.split(","), df.Threat_Actor))
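One thing to watch in the snippet above: `x.split(",")` is called on every cell, and the NaN cells are floats, so those calls raise an AttributeError. A minimal NaN-safe sketch using the `.str` accessor is shown here for Threat_Tools only. The two-row Series is made-up sample data, and the names `tools`, `split_tools` and `max_tools` are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for df.Threat_Tools (one NaN cell, one populated cell)
tools = pd.Series([np.nan, "PowerSploit,pwdump,Mimikatz"], name="Threat_Tools")

# .str.split skips NaN cells instead of raising; expand=True spreads the
# pieces over columns 0, 1, 2, ... which add_prefix renames to Tools0, Tools1, ...
split_tools = tools.str.split(",", expand=True).add_prefix("Tools")

# The number of generated columns equals the largest item count in any cell
max_tools = split_tools.shape[1]
```

The same pattern would apply to Threat_Actor with an `Actor` prefix.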
I then want to unpivot Actor and Tools based on TechID/Name.
Expected output:
Technique_Name Technique_ID Threat_Actor Threat_Tools
Abuse Elevation Control Mechanism T1548 NaN NaN
Setuid and Setgid T1548.001 NaN NaN
Bypass User Account Control T1548.002 Honeybee PowerSploit
Bypass User Account Control T1548.002 BRONZEBUTLER pwdump
Bypass User Account Control T1548.002 APT29 Mimikatz
Bypass User Account Control T1548.002 CobaltGroup netstat
Bypass User Account Control T1548.002 MuddyW Ping
Sudo and Sudo Caching T1548.003 NaN NaN
Elevated Execution with Prompt T1548.004 NaN NaN
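import pandas as pd
# Split Threat_Actor on commas into Actor0, Actor1, ... columns
df = df.join(df['Threat_Actor'].str.split(',', expand=True).add_prefix('Actor'))
# Repeat the last character of each new column name (Actor0 -> Actor00); wide_to_long still reads the trailing digits as the suffix
df = df.rename(columns=lambda x: x + x[-1] if x.startswith('Actor') else x)
# Same two steps for Threat_Tools
df = df.join(df['Threat_Tools'].str.split(',', expand=True).add_prefix('Tools'))
df = df.rename(columns=lambda x: x + x[-1] if x.startswith('Tools') else x)
# Unpivot the Actor*/Tools* columns into rows, one row per numeric suffix for each Technique_ID
df = pd.wide_to_long(df, ['Actor', 'Tools'], i=['Technique_ID'], j='i')
# Forward-fill missing values in Tools
df['Tools'] = df['Tools'].ffill()
# Drop the suffix index level and turn Technique_ID back into a column
df = df.reset_index(level=1, drop=True).reset_index()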
This lets me split the columns and then unpivot them all into multiple rows, as in the desired output.
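For comparison, here is a sketch of a different route to a similar result, built on `Series.str.split` and `DataFrame.explode` rather than `wide_to_long`. It is an alternative under stated assumptions, not the approach above: the helper name `split_and_unpivot`, the temporary `pos` column and the two-row sample frame are all made up for illustration. Unlike the `ffill` step, it leaves a cell as NaN when one list is shorter than the other, and the outer merge may reorder rows.

```python
import numpy as np
import pandas as pd

def split_and_unpivot(frame, id_cols, list_cols, sep=","):
    """Split each column in list_cols on sep and align the pieces row by row."""
    pieces = []
    for col in list_cols:
        # One row per list item; NaN cells stay as a single NaN row
        exploded = (
            frame[id_cols + [col]]
            .assign(**{col: frame[col].str.split(sep)})
            .explode(col, ignore_index=True)
        )
        # Position of each item within its original cell (0, 1, 2, ...)
        exploded["pos"] = exploded.groupby(id_cols).cumcount()
        pieces.append(exploded)
    # Line the columns up on the id columns plus item position
    out = pieces[0]
    for piece in pieces[1:]:
        out = out.merge(piece, on=id_cols + ["pos"], how="outer")
    return out.drop(columns="pos")

# Made-up sample with the same column names as the question
sample = pd.DataFrame({
    "Technique_Name": ["Abuse Elevation Control Mechanism", "Bypass User Account Control"],
    "Technique_ID": ["T1548", "T1548.002"],
    "Threat_Actor": [np.nan, "Honeybee,APT29"],
    "Threat_Tools": [np.nan, "PowerSploit,pwdump,Mimikatz"],
})

result = split_and_unpivot(
    sample,
    id_cols=["Technique_Name", "Technique_ID"],
    list_cols=["Threat_Actor", "Threat_Tools"],
)
```

The sketch assumes the id columns together identify each original row uniquely, as Technique_ID does in the question's data.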