How to pivot a pandas DataFrame (with two header rows)
I have a pandas DataFrame that looks like this:
Age | USA | USA | USA | UK | UK | UK |
---|---|---|---|---|---|---|
Age | male | female | total | male | female | total |
2-year-old | 2 | 3 | 5 | 1 | 1 | 2 |
3-year-old | 8 | 8 | 16 | 7 | 9 | 16 |
In fact, I have two header rows (USA + male; USA + female; ...).
CSV-File (test.csv):
;USA;USA;USA;UK;UK;UK
Age;male;female;total;male;female;total
2-year-old;2;3;5;1;1;2
3-year-old;8;8;16;7;9;16
My Python code:
import pandas as pd

df = pd.read_csv('test.csv',
                 delimiter=";",
                 header=[0, 1])
# The empty top-left header cell is read as 'Unnamed: 0_level_0'
df = df.rename(columns={'Unnamed: 0_level_0': 'Age'})
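For reference, a minimal sketch of what this read step produces, with the CSV inlined through `io.StringIO` so it runs without a file (the names `csv_text` and `column_labels` are only illustrative). It shows that the two header rows become a two-level column index whose labels are (country, gender) tuples:

```python
import io
import pandas as pd

# Same content as test.csv from the question, inlined so the sketch is self-contained
csv_text = """;USA;USA;USA;UK;UK;UK
Age;male;female;total;male;female;total
2-year-old;2;3;5;1;1;2
3-year-old;8;8;16;7;9;16
"""

df = pd.read_csv(io.StringIO(csv_text), delimiter=";", header=[0, 1])
df = df.rename(columns={'Unnamed: 0_level_0': 'Age'})

# Each column label is a tuple, one entry per header row
column_labels = df.columns.tolist()
print(df.columns.nlevels)
print(column_labels)
```

Because `rename` is applied to every level here, the first column ends up labelled `('Age', 'Age')`, which is why "Age" appears in both header rows of the table above.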
How can I pivot the pandas DataFrame to get the following result?
Age | Country | Gender | frequency |
---|---|---|---|
2-year-old | USA | male | 2 |
2-year-old | USA | female | 3 |
2-year-old | UK | male | 1 |
2-year-old | UK | female | 1 |
3-year-old | USA | male | 8 |
3-year-old | USA | female | 8 |
3-year-old | UK | male | 7 |
3-year-old | UK | female | 9 |
Edit:
Starting table:
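Code | Country | Procedure | male | male | female | female |
---|---|---|---|---|---|---|
Code | Country | Procedure | two-year-old | three-year-old | two-year-old | three-year-old |
1a | US | proc_1 | 4 | 6 | 3 | 6 |
1a | UK | proc_1 | 2 | 3 | 5 | 1 |
1b | US | proc_2 | 15 | 3 | 5 | 2 |
1b | UK | proc_2 | 8 | 4 | 7 | 3 |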
CSV:
Code;Country;Procedure;male;male;female;female
Code;Country;Procedure;two-year-old;three-year-old;two-year-old;three-year-old
1a;US;proc_1;4;6;3;6
1a;UK;proc_1;2;3;5;1
1b;US;proc_2;15;3;5;2
1b;UK;proc_2;8;4;7;3
Result table:
Code | Country | Procedure | Gender | Age | Frequency |
---|---|---|---|---|---|
1a | US | proc_1 | male | two-year-old | 4 |
1a | US | proc_1 | male | three-year-old | 6 |
1a | US | proc_1 | female | two-year-old | 3 |
1a | US | proc_1 | female | three-year-old | 6 |
1a | UK | proc_1 | male | two-year-old | 2 |
1a | UK | proc_1 | male | three-year-old | 3 |
1a | UK | proc_1 | female | two-year-old | 5 |
1a | UK | proc_1 | female | three-year-old | 1 |
1b | ... |
Use DataFrame.set_index with DataFrame.stack; if you need to remove the totals, add drop:
df = (df.drop('total', axis=1, level=1)
        .set_index(df.columns[0])
        .stack([0, 1])
        .rename_axis(['Age', 'Country', 'Gender'])
        .reset_index(name='frequency'))
print(df)
          Age Country  Gender  frequency
0  2-year-old      UK  female          1
1  2-year-old      UK    male          1
2  2-year-old     USA  female          3
3  2-year-old     USA    male          2
4  3-year-old      UK  female          9
5  3-year-old      UK    male          7
6  3-year-old     USA  female          8
7  3-year-old     USA    male          8
Or, if the totals should be kept:
df = (df.set_index(df.columns[0])
        .stack([0, 1])
        .rename_axis(['Age', 'Country', 'Gender'])
        .reset_index(name='frequency'))
print(df)
           Age Country  Gender  frequency
0   2-year-old      UK  female          1
1   2-year-old      UK    male          1
2   2-year-old      UK   total          2
3   2-year-old     USA  female          3
4   2-year-old     USA    male          2
5   2-year-old     USA   total          5
6   3-year-old      UK  female          9
7   3-year-old      UK    male          7
8   3-year-old      UK   total         16
9   3-year-old     USA  female          8
10  3-year-old     USA    male          8
11  3-year-old     USA   total         16
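The table from the edit has three identifier columns instead of one. A minimal sketch of how the same set_index plus stack idea might carry over, assuming the CSV from the edit is read with both header rows. The names `csv_text`, `id_cols` and `long_df` are only illustrative, and the row order of the result may differ between pandas versions:

```python
import io
import pandas as pd

# CSV from the edit, inlined so the sketch is self-contained
csv_text = """Code;Country;Procedure;male;male;female;female
Code;Country;Procedure;two-year-old;three-year-old;two-year-old;three-year-old
1a;US;proc_1;4;6;3;6
1a;UK;proc_1;2;3;5;1
1b;US;proc_2;15;3;5;2
1b;UK;proc_2;8;4;7;3
"""

df = pd.read_csv(io.StringIO(csv_text), delimiter=";", header=[0, 1])

# The first three columns identify a row; their labels are tuples such as ('Code', 'Code')
id_cols = list(df.columns[:3])

long_df = (df.set_index(id_cols)
             .stack([0, 1])  # move both header levels (gender, age) into the index
             .rename_axis(['Code', 'Country', 'Procedure', 'Gender', 'Age'])
             .reset_index(name='Frequency'))
print(long_df)
```

Moving all three identifier columns into the index first means that only the gender and age levels are left in the columns, so stacking both levels yields one row per Code, Country, Procedure, Gender and Age combination.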