pandas DataFrame: how to unmelt a column
I have a dataframe:
df = key1  key2  ..  keyn  type  val1  val2  ..  valn
     k1    k2        kn    p1    1     2         7
     k1    k2        kn    p2    6     1         5
     k1    k2        kn    p3    8     4         1
     k3    k2        kn    p1    4     6         9
     k3    k2        kn    p2    6     1         0
     k3    k2        kn    p3    1     2         8
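So for each set of keys key1..keyn I have 3 values in the column type.
I want to unmelt it into columns so that each <type, val> pair gets its own column, ending up with:
df = key1  key2  ..  keyn  val1_typep1  val1_typep2  val1_typep3  ..  valn_typep1  valn_typep2  valn_typep3
     k1    k2        kn    1            6            8                7            5            1
     k3    k2        kn    4            6            1                9            0            8
What is the best way to do this?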
You can filter the key-like columns, then pivot to reshape the dataframe with index set to the keys and columns set to type, and finally use map + .join to flatten the MultiIndex columns:
# filter the key like columns
keys = df.filter(like='key').columns
# pivot the dataframe on keys and type
pvt = df.pivot(index=list(keys), columns='type')
# Flatten the MultiIndex columns by joining each
# (value column, type) tuple with the separator '_type'
pvt.columns = pvt.columns.map('_type'.join)
pvt = pvt.reset_index()
>>> pvt
  key1 key2 keyn val1_typep1 val1_typep2 val1_typep3 val2_typep1 val2_typep2 val2_typep3 valn_typep1 valn_typep2 valn_typep3
0   k1   k2   kn           1           6           8           2           1           4           7           5           1
1   k3   k2   kn           4           6           1           6           1           2           9           0           8
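For anyone who wants to run this answer end to end, here is a self-contained sketch of the same pivot approach. It rebuilds the sample frame from the question and flattens the labels with a list comprehension instead of map('_type'.join); for two-level string labels both give the same names. It also swaps filter(like='key') for filter(regex='^key'), on the assumption that only columns whose names start with "key" are keys: like='key' matches the substring anywhere in a label.

```python
import pandas as pd

# Sample frame from the question: key columns, a type column, value columns
df = pd.DataFrame({
    'key1': ['k1', 'k1', 'k1', 'k3', 'k3', 'k3'],
    'key2': ['k2'] * 6,
    'keyn': ['kn'] * 6,
    'type': ['p1', 'p2', 'p3'] * 2,
    'val1': [1, 6, 8, 4, 6, 1],
    'val2': [2, 1, 4, 6, 1, 2],
    'valn': [7, 5, 1, 9, 0, 8],
})

# Columns whose name starts with "key" (assumption: that is what marks a key)
keys = list(df.filter(regex='^key').columns)

# One row per key combination; columns become a MultiIndex of (value column, type)
pvt = df.pivot(index=keys, columns='type')

# Flatten ('val1', 'p1') -> 'val1_typep1'
pvt.columns = [f'{val}_type{typ}' for val, typ in pvt.columns]
pvt = pvt.reset_index()
print(pvt)
```

Note that pivot raises a ValueError if some (keys, type) combination appears more than once; the question's data has exactly one row per combination.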
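Just use stack and unstack in pandas.
As I understand it, unstack moves the innermost level of the row index into the columns,
and stack does the reverse: it moves the innermost level of the columns back into the row index.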
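A minimal illustration of that description, using a small two-level frame built from a subset of the question's data: unstack() moves the inner row level, here type, into the columns, and stack() moves it back.

```python
import pandas as pd

# Row index with two levels: key (outer) and type (inner)
idx = pd.MultiIndex.from_tuples(
    [('k1', 'p1'), ('k1', 'p2'), ('k3', 'p1'), ('k3', 'p2')],
    names=['key', 'type'],
)
df = pd.DataFrame({'val1': [1, 6, 4, 6], 'val2': [2, 1, 6, 1]}, index=idx)

# unstack(): the inner row level 'type' becomes the inner column level
wide = df.unstack()
print(wide)
print(wide.columns.tolist())  # list of (value column, type) tuples

# stack(): the inner column level 'type' goes back into the row index
long = wide.stack()
print(long)
```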
import pandas as pd

# sample data
data = [{'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 1, 'val2': 2, 'valn': 7}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 5}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 8, 'val2': 4, 'valn': 1}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 4, 'val2': 6, 'valn': 9}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 0}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 1, 'val2': 2, 'valn': 8}]
df = pd.DataFrame(data)
# stack the df, with index `['key1', 'key2', 'keyn', 'type']`
cols = df.columns[df.columns.str.startswith('key')].tolist()
cols.append('type')
# print(cols) # ['key1', 'key2', 'keyn', 'type']
dfn = df.set_index(cols).loc[:, 'val1':'valn'].stack().reset_index()
# print(dfn)
# create col_name as `valn_typep1`
dfn['col_name'] = dfn.iloc[:,-2] + '_type' + dfn.iloc[:,-3]
# index on `['key1', 'key2', 'keyn', 'col_name']`, take the value column (named 0), then unstack to move col_name into the columns
cols = df.columns[df.columns.str.startswith('key')].tolist()
cols.append('col_name')
# print(cols) # ['key1', 'key2', 'keyn', 'col_name']
df_result = dfn.set_index(cols)[0].unstack().reset_index()
# print(df_result)
Result:
print(dfn)
   key1 key2 keyn type level_4  0
0    k1   k2   kn   p1    val1  1
1    k1   k2   kn   p1    val2  2
2    k1   k2   kn   p1    valn  7
3    k1   k2   kn   p2    val1  6
4    k1   k2   kn   p2    val2  1
5    k1   k2   kn   p2    valn  5
6    k1   k2   kn   p3    val1  8
7    k1   k2   kn   p3    val2  4
8    k1   k2   kn   p3    valn  1
9    k3   k2   kn   p1    val1  4
10   k3   k2   kn   p1    val2  6
11   k3   k2   kn   p1    valn  9
12   k3   k2   kn   p2    val1  6
13   k3   k2   kn   p2    val2  1
14   k3   k2   kn   p2    valn  0
15   k3   k2   kn   p3    val1  1
16   k3   k2   kn   p3    val2  2
17   k3   k2   kn   p3    valn  8
print(df_result)
col_name key1 key2 keyn val1_typep1 val1_typep2 val1_typep3 val2_typep1  \
0          k1   k2   kn           1           6           8           2
1          k3   k2   kn           4           6           1           6

col_name val2_typep2 val2_typep3 valn_typep1 valn_typep2 valn_typep3
0                  1           4           7           5           1
1                  1           2           9           0           8
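Since unstack already moves a row level into the columns, the stack-then-unstack round trip can probably be collapsed into a single step: put the keys and type into the index and unstack only the type level. This is a sketch, not the answer's own code, and it assumes every (keys, type) combination occurs at most once; otherwise unstack raises a ValueError about duplicate entries.

```python
import pandas as pd

df = pd.DataFrame({
    'key1': ['k1', 'k1', 'k1', 'k3', 'k3', 'k3'],
    'key2': ['k2'] * 6,
    'keyn': ['kn'] * 6,
    'type': ['p1', 'p2', 'p3'] * 2,
    'val1': [1, 6, 8, 4, 6, 1],
    'val2': [2, 1, 4, 6, 1, 2],
    'valn': [7, 5, 1, 9, 0, 8],
})

keys = df.columns[df.columns.str.startswith('key')].tolist()  # ['key1', 'key2', 'keyn']

# Keys + type form the row index; unstack only the 'type' level into the columns
wide = df.set_index(keys + ['type']).unstack('type')

# Columns are (value column, type) tuples; flatten them to 'val1_typep1', ...
wide.columns = [f'{val}_type{typ}' for val, typ in wide.columns]
df_result = wide.reset_index()
print(df_result)
```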
Use melt with id_vars for the index fields and value_vars for the fields to melt, then sort the result by the index fields.
import pandas as pd

data = [{'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 1, 'val2': 2, 'valn': 7}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 5}, {'key1': 'k1', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 8, 'val2': 4, 'valn': 1}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p1', 'val1': 4, 'val2': 6, 'valn': 9}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p2', 'val1': 6, 'val2': 1, 'valn': 0}, {'key1': 'k3', 'key2': 'k2', 'keyn': 'kn', 'type': 'p3', 'val1': 1, 'val2': 2, 'valn': 8}]
df = pd.DataFrame(data)
print(df)
results=pd.melt(df,value_vars=['val1','val2','valn'],id_vars=['key1','key2','keyn','type'])
print(results.sort_values(by=['key1','key2','keyn','type']))
output:
   key1 key2 keyn type variable value
0    k1   k2   kn   p1     val1     1
6    k1   k2   kn   p1     val2     2
12   k1   k2   kn   p1     valn     7
1    k1   k2   kn   p2     val1     6
7    k1   k2   kn   p2     val2     1
13   k1   k2   kn   p2     valn     5
2    k1   k2   kn   p3     val1     8
8    k1   k2   kn   p3     val2     4
14   k1   k2   kn   p3     valn     1
3    k3   k2   kn   p1     val1     4
9    k3   k2   kn   p1     val2     6
15   k3   k2   kn   p1     valn     9
4    k3   k2   kn   p2     val1     6
10   k3   k2   kn   p2     val2     1
16   k3   k2   kn   p2     valn     0
5    k3   k2   kn   p3     val1     1
11   k3   k2   kn   p3     val2     2
17   k3   k2   kn   p3     valn     8
# reverse the melt
fp = results.pivot(index=['key1', 'key2', 'keyn', 'type'],
                   columns=['variable'], values=['value'])
#fp = fp[(fp.T != 0).any()]
print(fp)
                    value
variable            val1 val2 valn
key1 key2 keyn type
k1   k2   kn   p1      1    2    7
               p2      6    1    5
               p3      8    4    1
k3   k2   kn   p1      4    6    9
               p2      6    1    0
               p3      1    2    8
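The melt/pivot round trip above returns the data to its original shape rather than the layout the question asks for. One way to get there from the melted frame (a sketch, not part of the original answer) is to keep only the keys in the row index and send both variable and type to the columns, then flatten the labels:

```python
import pandas as pd

df = pd.DataFrame({
    'key1': ['k1', 'k1', 'k1', 'k3', 'k3', 'k3'],
    'key2': ['k2'] * 6,
    'keyn': ['kn'] * 6,
    'type': ['p1', 'p2', 'p3'] * 2,
    'val1': [1, 6, 8, 4, 6, 1],
    'val2': [2, 1, 4, 6, 1, 2],
    'valn': [7, 5, 1, 9, 0, 8],
})
keys = ['key1', 'key2', 'keyn']

# Long format: one row per (keys, type, variable), the number sits in 'value'
results = pd.melt(df, id_vars=keys + ['type'], value_vars=['val1', 'val2', 'valn'])

# Only the keys stay in the row index; 'variable' and 'type' both become column levels
wide = results.pivot(index=keys, columns=['variable', 'type'], values='value')

# Flatten ('val1', 'p1') -> 'val1_typep1'
wide.columns = [f'{var}_type{typ}' for var, typ in wide.columns]
wide = wide.reset_index()
print(wide)
```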