Split data into columns in pandas
I have a DataFrame:
name category dummy
USA fx,ft,fe 1
INDIA fx 13
I need to convert it to:
name category_fx category_ft category_fe dummy
USA True True True 1
INDIA True False False 13
I tried Series.explode() but could not get this output.
You can use str.get_dummies and astype(bool) to convert your strings to new columns of booleans, then add_prefix to change the column names, and finally join:
df2 = (df.drop(columns='category')
         .join(df['category']
                 .str.get_dummies(sep=',')
                 .astype(bool)
                 .add_prefix('category_')
              )
      )
Or, to modify the original DataFrame in place:
df = df.join(df.pop('category')
               .str.get_dummies(sep=',')
               .astype(bool)
               .add_prefix('category_'))
Output:
name category_fe category_ft category_fx
0 USA True True True
1 INDIA False False True
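Since the question mentions Series.explode, here is a minimal sketch of how that route could be made to work as well. It is an alternative to str.get_dummies, not part of the answer above: it splits the strings, explodes them to one label per row, one-hot encodes with pd.get_dummies, and collapses back to one row per original index with groupby(level=0).any(). The sample frame is rebuilt from the question, and the variable names exploded, flags, and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["USA", "INDIA"],
    "category": ["fx,ft,fe", "fx"],
    "dummy": [1, 13],
})

# One row per (original row, label); the original index value is repeated.
exploded = df["category"].str.split(",").explode()

# One-hot encode the labels, then collapse back to one row per original index.
# any() yields booleans whether get_dummies returned 0/1 integers or booleans.
flags = (pd.get_dummies(exploded, prefix="category")
           .groupby(level=0).any())

# Attach the boolean columns to the remaining columns.
result = df.drop(columns="category").join(flags)
print(result)
```

With this approach the new columns are appended after dummy, as with join in the answer above.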
Generalizing to more columns
Assuming this input:
name category1 category2 dummy
0 USA fx,ft,fe a,b,c 1
1 INDIA fx d 13
cats = df.filter(like='category').columns
cols = list(df.columns.difference(cats))
(df
.set_index(cols)
.stack()
.str.get_dummies(sep=',')
.groupby(level=cols).max().astype(bool)
.reset_index()
)
Output:
dummy name a b c d fe ft fx
0 1 USA True True True False True True True
1 13 INDIA False False False True False False True
Use Series.str.get_dummies on the category column, convert the 0/1 values to booleans with DataFrame.astype, and add a prefix with DataFrame.add_prefix:
c = df.columns.difference(['category'], sort=False).tolist()
df = (df.set_index(c)['category']
        .str.get_dummies(',')
        .astype(bool)
        .add_prefix('category_')
        .reset_index())
print(df)
name category_fe category_ft category_fx
0 USA True True True
1 INDIA False False True
EDIT: If you need to replace one column with multiple columns at the same position, you can use:
df1 = (df['category']
.str.get_dummies(',')
.astype(bool)
.add_prefix('category_'))
pos = df.columns.get_loc('category')
df = pd.concat([df.iloc[:, :pos], df1, df.iloc[:, pos+1:]], axis=1)
print (df)
name category_fe category_ft category_fx dummy
0 USA True True True 1
1 INDIA False False True 13
The same solution, adapted for multiple columns:
print (df)
name category dummy category1
0 USA fx,ft,fe 1 a,f
1 INDIA fx 13 s,a
cols = ['category','category1']
dfs = [(df[c].str.get_dummies(',').astype(bool).add_prefix(f'{c}_')) for c in cols]
df = pd.concat([df, *dfs], axis=1).drop(cols, axis=1)
print (df)
name dummy category_fe category_ft category_fx category1_a \
0 USA 1 True True True True
1 INDIA 13 False False True True
category1_f category1_s
0 True False
1 False True
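The two variants above (replacing a column at its original position, and handling several columns) can be combined. The sketch below is one possible way to do that, not the answerer's code: expand_flag_columns is a hypothetical helper name, and it assumes every listed column holds separator-delimited strings.

```python
import pandas as pd

def expand_flag_columns(df, cols, sep=","):
    """Hypothetical helper: replace each column in `cols` with boolean
    indicator columns, keeping the position of the original column."""
    pieces = []
    for col in df.columns:
        if col in cols:
            # Expand "fx,ft,fe" style strings into prefixed boolean columns.
            flags = (df[col].str.get_dummies(sep=sep)
                            .astype(bool)
                            .add_prefix(f"{col}_"))
            pieces.append(flags)
        else:
            # Keep every other column unchanged, as a one-column frame.
            pieces.append(df[[col]])
    return pd.concat(pieces, axis=1)

# Sample data copied from the multi-column example above.
df = pd.DataFrame({
    "name": ["USA", "INDIA"],
    "category": ["fx,ft,fe", "fx"],
    "dummy": [1, 13],
    "category1": ["a,f", "s,a"],
})
out = expand_flag_columns(df, ["category", "category1"])
print(out)
```

Unlike the concat-then-drop version, this keeps dummy between the two groups of indicator columns, which matches the column order asked for in the question.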