Flattening a multidimensional table with Python pandas
I'm a beginner with Python and pandas, and I want to flatten a multidimensional table. It currently looks like this:
Day | Lukas | | Steve | |
---|---|---|---|---|
 | BBnr | Comments | BBnr | Comments |
1 | XXXX1 | 2PM | XXXX3 | 9PM |
2 | XXXX2 | 5:30PM | XXXX4 | 7PM |
I'd like it to look like this:
Day | Seller | BBnr | Comments |
---|---|---|---|
1 | Lukas | XXXX1 | 2PM |
1 | Steve | XXXX3 | 9PM |
2 | Lukas | XXXX2 | 5:30PM |
2 | Steve | XXXX4 | 7PM |
Any ideas? So far I have tried pandas melt and unstack, without success.
Here is my current code:
import pandas as pd
df = pd.read_excel('Book1.xlsx', sheet_name="Sheet1", header=[0,1], index_col=[0])
melt = df.melt()
print(melt)
Current output:
Dag NaN value
0 LUCAS BBnr XXXX1
1 LUCAS BBnr XXXX2
2 LUCAS Comments 2PM
3 LUCAS Comments 5:30PM
4 STEVE BBnr XXXX3
5 STEVE BBnr XXXX4
6 STEVE Comments 9Pm
7 STEVE Comments 7PM
df.head() before melting:
Dag LUCAS STEVE
BBnr Comments BBnr Comments
1 XXXX1 2PM XXXX3 9Pm
2 XXXX2 5:30PM XXXX4 7PM
One trick is to hide in the index the columns that you do not want stack to process.
Assuming your dataframe is:
import pandas as pd
# Tuple keys become a two-level column MultiIndex
df = pd.DataFrame.from_dict({('Day', ''): {0: 1, 1: 2},
                             ('Lukas', 'BBnr'): {0: 'XXXX1', 1: 'XXXX2'},
                             ('Lukas', 'Comments'): {0: '2PM', 1: '5:30PM'},
                             ('Steve', 'BBnr'): {0: 'XXXX3', 1: 'XXXX4'},
                             ('Steve', 'Comments'): {0: '9PM', 1: '7PM'}})
It displays as:
Day Lukas Steve
BBnr Comments BBnr Comments
0 1 XXXX1 2PM XXXX3 9PM
1 2 XXXX2 5:30PM XXXX4 7PM
It can then be processed with:
result = df.set_index('Day').stack(level=0).reset_index()
which directly gives:
Day level_1 BBnr Comments
0 1 Lukas XXXX1 2PM
1 1 Steve XXXX3 9PM
2 2 Lukas XXXX2 5:30PM
3 2 Steve XXXX4 7PM
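The result above labels the seller column level_1, whereas the question asks for Seller. Here is a minimal sketch of one way to get that name. It assumes the frame already has Day as its index, as it should when read with index_col=[0] like in the question. The hand-built frame and the level name "Seller" are illustrative assumptions, not part of the original data file.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: Day is already the index,
# and the top column level (the seller names) is given the name "Seller".
columns = pd.MultiIndex.from_product(
    [["Lukas", "Steve"], ["BBnr", "Comments"]], names=["Seller", None]
)
df = pd.DataFrame(
    [["XXXX1", "2PM", "XXXX3", "9PM"],
     ["XXXX2", "5:30PM", "XXXX4", "7PM"]],
    index=pd.Index([1, 2], name="Day"),
    columns=columns,
)

# Move the named top column level into the row index, then turn both
# index levels (Day and Seller) back into ordinary columns.
result = df.stack(level="Seller").reset_index()
result.columns.name = None  # drop any leftover name on the columns axis

print(result)
```

If the column level has no name, renaming afterwards with result.rename(columns={'level_1': 'Seller'}) should give the same header.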