Reshaping pandas dataframe (transpose rows by columns)
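I'm just starting to learn Python and would like to do the kind of reshaping I'm used to doing in R.
I currently have a dataframe shaped like d1 and want to convert it to d2:
d1: columns are Country, Movie_type (values Action, Romance or Drama), Jan-20, Feb-20 and Mar-20.
d2: columns are Country, Month_year (values Jan-20, Feb-20 or Mar-20), Action, Drama, Romance.
(Note that the Drama and Romance columns don't show the actual counts from d1; only Action does.)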
import pandas as pd

# Wide input: one row per (Country, Movie_type), one column per month
d1 = pd.DataFrame({
    "Country": {0: "Australia", 1: "Australia", 2: "Australia", 3: "Canada", 4: "Canada", 5: "Israel", 6: "India", 7: "India", 8: "Poland", 9: "Zambia"},
    "Movie_type": {0: "Action", 1: "Drama", 2: "Romance", 3: "Romance", 4: "Action", 5: "Action", 6: "Action", 7: "Drama", 8: "Drama", 9: "Action"},
    "Jan-20": {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 3, 6: 2, 7: 0, 8: 2, 9: 5},
    "Feb-20": {0: 2, 1: 3, 2: 2, 3: 1, 4: 0, 5: 0, 6: 2, 7: 4, 8: 2, 9: 7},
    "Mar-20": {0: 1, 1: 5, 2: 2, 3: 7, 4: 4, 5: 8, 6: 2, 7: 4, 8: 5, 9: 9},
})
d2 = pd.DataFrame({
    "Country": {0: "Australia", 1: "Australia", 2: "Australia", 3: "Canada", 4: "Canada", 5: "Canada", 6: "Israel", 7: "Israel", 8: "Israel", 9: "India", 10: "India", 11: "India", 12: "Poland", 13: "Poland", 14: "Poland", 15: "Zambia", 16: "Zambia", 17: "Zambia"},
    "Month_year": {0: "Jan-20", 1: "Feb-20", 2: "Mar-20", 3: "Jan-20", 4: "Feb-20", 5: "Mar-20", 6: "Jan-20", 7: "Feb-20", 8: "Mar-20", 9: "Jan-20", 10: "Feb-20", 11: "Mar-20", 12: "Jan-20", 13: "Feb-20", 14: "Mar-20", 15: "Jan-20", 16: "Feb-20", 17: "Mar-20"},
    "Action": {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 3, 7: 0, 8: 0, 9: 2, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 5, 16: 0, 17: 0},
    "Drama": {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0},
    "Romance": {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0},
})
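How can I turn the Movie_type values into column names, and the MMM-YY labels (currently the column names in d1) into distinct row values? If this is a stack/unstack/pivot or reshape problem, I'd be glad to be educated.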
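For reference, the first half of this reshape (month headers becoming row values) is a wide-to-long operation, which pandas does with DataFrame.melt; it is roughly the counterpart of tidyr's pivot_longer in R. The minimal sketch below rebuilds d1 in list form so it runs on its own and performs only that step. It also prints the month labels sorted as plain strings, which shows why they need converting to dates before pivoting if calendar order matters. The name long_df is just an illustrative variable.

```python
import pandas as pd

# Same data as d1 above, written as lists instead of {index: value} dicts
d1 = pd.DataFrame({
    "Country": ["Australia", "Australia", "Australia", "Canada", "Canada",
                "Israel", "India", "India", "Poland", "Zambia"],
    "Movie_type": ["Action", "Drama", "Romance", "Romance", "Action",
                   "Action", "Action", "Drama", "Drama", "Action"],
    "Jan-20": [0, 1, 2, 3, 0, 3, 2, 0, 2, 5],
    "Feb-20": [2, 3, 2, 1, 0, 0, 2, 4, 2, 7],
    "Mar-20": [1, 5, 2, 7, 4, 8, 2, 4, 5, 9],
})

# Wide -> long: each month column becomes (Month_year, value) pairs,
# so 10 input rows x 3 month columns give 30 rows
long_df = d1.melt(id_vars=["Country", "Movie_type"], var_name="Month_year")

print(long_df.head(3))
print(len(long_df))  # 30

# Sorted as strings, the labels come out alphabetical, not in calendar order
print(sorted(long_df["Month_year"].unique()))  # ['Feb-20', 'Jan-20', 'Mar-20']
```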
Use DataFrame.melt with DataFrame.pivot; to get the months in the correct (calendar) order, convert the values to datetimes first:
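df = (d1.melt(id_vars=['Country', 'Movie_type'], var_name='Month_year')
        # parse 'Jan-20' etc. so pivot sorts months chronologically, not alphabetically
        .assign(Month_year=lambda x: pd.to_datetime(x['Month_year'], format='%b-%y'))
        # pandas >= 2.0 requires keyword arguments for pivot
        .pivot(index=['Country', 'Month_year'], columns='Movie_type', values='value')
        # country/month/type combinations missing from d1 become NaN, so fill with 0
        .fillna(0)
        .astype(int)
        # turn the datetimes in index level 1 back into 'Jan-20' style labels
        .rename(index=lambda x: x.strftime('%b-%y'), level=1)
        .reset_index()
      )
print(df)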
Movie_type    Country Month_year  Action  Drama  Romance
0           Australia     Jan-20       0      1        2
1           Australia     Feb-20       2      3        2
2           Australia     Mar-20       1      5        2
3              Canada     Jan-20       0      0        3
4              Canada     Feb-20       0      0        1
5              Canada     Mar-20       4      0        7
6               India     Jan-20       2      0        0
7               India     Feb-20       2      4        0
8               India     Mar-20       2      4        0
9              Israel     Jan-20       3      0        0
10             Israel     Feb-20       0      0        0
11             Israel     Mar-20       8      0        0
12             Poland     Jan-20       0      2        0
13             Poland     Feb-20       0      2        0
14             Poland     Mar-20       0      5        0
15             Zambia     Jan-20       5      0        0
16             Zambia     Feb-20       7      0        0
17             Zambia     Mar-20       9      0        0
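Since the question also asks whether this is a stack/unstack problem: the same layout can be sketched with set_index, stack and unstack instead of melt and pivot. This is an alternative, not the method used in the answer above. It assumes each (Country, Movie_type) pair appears only once in d1 (unstack raises on duplicate entries, as pivot does), and instead of the datetime round-trip it orders the final rows with a sort_values key that parses the month labels; calendar_key is a hypothetical helper defined here for that purpose. On pandas 2.1/2.2 the bare .stack() call may emit a FutureWarning about the new stack implementation; since d1 has no missing values, the result should be unaffected.

```python
import pandas as pd

d1 = pd.DataFrame({
    "Country": ["Australia", "Australia", "Australia", "Canada", "Canada",
                "Israel", "India", "India", "Poland", "Zambia"],
    "Movie_type": ["Action", "Drama", "Romance", "Romance", "Action",
                   "Action", "Action", "Drama", "Drama", "Action"],
    "Jan-20": [0, 1, 2, 3, 0, 3, 2, 0, 2, 5],
    "Feb-20": [2, 3, 2, 1, 0, 0, 2, 4, 2, 7],
    "Mar-20": [1, 5, 2, 7, 4, 8, 2, 4, 5, 9],
})


def calendar_key(col: pd.Series) -> pd.Series:
    """Hypothetical sort key: parse 'Jan-20' labels as dates, leave other columns as-is."""
    if col.name == "Month_year":
        return pd.to_datetime(col, format="%b-%y")
    return col


out = (
    d1.set_index(["Country", "Movie_type"])
      .rename_axis(columns="Month_year")    # name the month axis so the stacked level is named
      .stack()                              # month columns -> innermost row level (a Series)
      .unstack("Movie_type", fill_value=0)  # movie types -> columns, missing combos -> 0
      .reset_index()
      .sort_values(["Country", "Month_year"], key=calendar_key)
      .reset_index(drop=True)
)
print(out)
```

Roughly speaking, stack/unstack move labels between the column axis and the row index, while melt/pivot work on ordinary columns; for a one-off reshape like this either route gives the same table.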