Pivoting dataframes with pd.melt() on time series data
I have some data here:
Country/Region 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20
0 Afghanistan 0 0 0 0 0
1 Albania 0 0 0 0 0
2 Algeria 0 0 0 0 0
3 Andorra 0 0 0 0 0
4 Angola 0 0 0 0 0
5 Antigua and Barbuda 0 0 0 0 0
6 Argentina 0 0 0 0 0
7 Armenia 0 0 0 0 0
8 Australia 0 0 0 0 0
9 Australia 0 0 0 0 3
10 Australia 0 0 0 0 0
11 Australia 0 0 0 0 0
12 Australia 0 0 0 0 0
13 Australia 0 0 0 0 0
14 Australia 0 0 0 0 1
15 Australia 0 0 0 0 0
16 Austria 0 0 0 0 0
17 Azerbaijan 0 0 0 0 0
18 Bahamas 0 0 0 0 0
19 Bahrain 0 0 0 0 0
20 Bangladesh 0 0 0 0 0
I would like to rearrange it so that the dates become the rows and the countries become the columns. Like this:
Country/Region Afghanistan Albania
1/22/20 0 0
1/23/20 0 0
1/24/20 0 0
And so on. I tried using pd.melt, but I can't quite work out how to get the desired output. Here is my attempt:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
data = pd.read_csv("covid.csv", sep=",")
data = data.drop(["Province/State","Lat","Long"], axis=1)
data_melted = data.melt(value_vars=data.columns[1:], var_name="Date",value_name="Cases")
Date Cases
0 1/22/20 0
1 1/22/20 0
2 1/22/20 0
3 1/22/20 0
4 1/22/20 0
5 1/22/20 0
6 1/22/20 0
7 1/22/20 0
8 1/22/20 0
9 1/22/20 0
10 1/22/20 0
11 1/22/20 0
12 1/22/20 0
13 1/22/20 0
14 1/22/20 0
I also tried:
data_melted = data.melt(value_vars=[data.columns[1:], "Country/Region"])
But this raises TypeError: unhashable type: 'Index', even though "Country/Region" is not the index.
Any help with this would be greatly appreciated.
You want to transpose the table:
df.set_index('Country/Region').T
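As a minimal sketch of what the transpose does, here it is on a tiny made-up frame (the values below are invented for illustration and are not taken from covid.csv):

```python
import pandas as pd

# Toy stand-in for the question's frame; the numbers are made up.
df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "1/22/20": [0, 0, 0],
    "1/23/20": [0, 1, 0],
    "1/24/20": [2, 1, 0],
})

# Country names become the column labels, dates become the row index.
wide = df.set_index("Country/Region").T
print(wide)
```

Note that the dates in the resulting index are still plain strings; convert them with pd.to_datetime if you need a real time axis.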
I notice that Australia is repeated many times. If you want to consolidate those rows by summing them:
# Sum the rows that share a country first, then transpose.
# (groupby(..., axis=1) is deprecated in recent pandas releases.)
df.groupby('Country/Region').sum().T
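To illustrate the consolidation step, here is a small sketch on made-up data with a repeated country. It groups on the rows before transposing, which gives the same table while avoiding axis=1 in groupby, an option that newer pandas versions deprecate:

```python
import pandas as pd

# Made-up numbers; Australia appears twice, as it does per province in the question.
df = pd.DataFrame({
    "Country/Region": ["Australia", "Australia", "Austria"],
    "1/26/20": [3, 1, 0],
    "1/27/20": [4, 1, 0],
})

# One row per country after summing, then dates down the side.
combined = df.groupby("Country/Region").sum().T
print(combined)
```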
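Your original attempt would work if you specify the id_vars argument of pandas.melt. Then, to get the countries as columns, run a pivot_table aggregation. This also gives you a time series data frame (i.e. date/time as the index) that is ready for direct plotting.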
data_melted = (data.melt(id_vars = ['Country/Region'],
                         var_name = 'Date', value_name='Cases')
                   .assign(Date = lambda x: pd.to_datetime(x['Date']))
              )
data_pivoted = data_melted.pivot_table(index='Date', columns='Country/Region',
                                       values='Cases', aggfunc='sum')
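For a self-contained run of the melt and pivot_table approach, the sketch below uses a small made-up frame in place of covid.csv. The explicit format="%m/%d/%y" is an assumption about how the date headers are written; drop it if your column names differ.

```python
import pandas as pd

# Made-up stand-in for the cleaned frame (Province/State, Lat, Long already dropped).
data = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Australia", "Australia"],
    "1/22/20": [0, 0, 0],
    "1/23/20": [0, 1, 2],
})

# Long format: one row per (country, date) pair, with real datetimes.
data_melted = (
    data.melt(id_vars=["Country/Region"], var_name="Date", value_name="Cases")
        .assign(Date=lambda x: pd.to_datetime(x["Date"], format="%m/%d/%y"))
)

# Wide format again: dates as the index, one column per country,
# with duplicate country rows summed.
data_pivoted = data_melted.pivot_table(
    index="Date", columns="Country/Region", values="Cases", aggfunc="sum"
)
print(data_pivoted)
```

Because the index is a DatetimeIndex, calling data_pivoted.plot() should draw one line per country against a proper time axis.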