Python stacked area chart
I am trying to create a stacked area chart that shows how the courses and their counts evolve over time. My DataFrame (index=Year) looks like this:
Area Courses
Year
1900 Agriculture 0.0
1900 Architecture 32.0
1900 Astronomy 10.0
1900 Biology 20.0
1900 Chemistry 25.0
1900 Civil Engineering 21.0
1900 Education 14.0
1900 Engineering Design 10.0
1900 English 30.0
1900 Geography 1.0
The last year in the data is 2011.
I tried a few approaches, such as df.plot.area() and df.plot.area(x='Years').
Then I thought that having the areas as columns would help, so I tried:
df.pivot_table(index = 'Year', columns = 'Area', values = 'Courses', aggfunc = 'sum')
But instead of the sum of courses per year, I got:
Area Aeronautical Engineering ... Visual Design
Year ...
1900 NaN ... NaN
1901 NaN ... NaN
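A note on the output above: a NaN cell in a pivot table usually means that no row exists for that (Year, Area) combination, not that the aggregation failed. The sketch below illustrates this on a small made-up frame (the sample values and the names long_df, wide_nan and wide are assumptions for illustration, not the real dataset). Passing fill_value=0 turns the missing combinations into zeros, which is what an area plot needs.

```python
import pandas as pd

# Hypothetical long-format sample; 'Astronomy' has no row for 1901.
long_df = pd.DataFrame({
    "Year": [1900, 1900, 1901],
    "Area": ["Architecture", "Astronomy", "Architecture"],
    "Courses": [32.0, 10.0, 35.0],
})

# Without fill_value, the missing (1901, Astronomy) cell comes out as NaN.
wide_nan = long_df.pivot_table(index="Year", columns="Area",
                               values="Courses", aggfunc="sum")

# With fill_value=0, missing combinations become 0 and can be stacked.
wide = long_df.pivot_table(index="Year", columns="Area",
                           values="Courses", aggfunc="sum", fill_value=0)
print(wide)
```

If the real table still shows only NaN after this, the cause lies elsewhere (for example in the column names), which this sketch cannot confirm.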
Thanks for your help.
This is my first post, so apologies if I missed anything.
Update: here is my code:
df = pd.read_csv(filepath, encoding= 'unicode_escape')
df = df.groupby(['Year','GenArea'])['Taught'].sum().to_frame(name = 'Courses').reset_index()
plt.stackplot(df['Year'], df['Courses'], labels = df['GenArea'])
plt.legend(loc='upper left')
plt.show()
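A likely reason the call above does not produce a stacked chart: plt.stackplot expects one y-series per layer, all sharing the same x values, whereas the long-format frame passes a single column in which every year repeats once per area. Below is a minimal sketch of the wide layout stackplot can consume, using made-up numbers (long_df, wide and the sample values are assumptions for illustration). The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format sample with the same column names as the post.
long_df = pd.DataFrame({
    "Year": [1900, 1900, 1901, 1901],
    "GenArea": ["Biology", "Chemistry", "Biology", "Chemistry"],
    "Courses": [20.0, 25.0, 22.0, 24.0],
})

# One row per year, one column per area.
wide = long_df.pivot(index="Year", columns="GenArea", values="Courses").fillna(0)

fig, ax = plt.subplots()
# stackplot takes x plus a 2D array with one row per layer, hence the transpose.
polys = ax.stackplot(wide.index, wide.to_numpy().T, labels=list(wide.columns))
ax.legend(loc="upper left")
```

In an interactive session, calling plt.show() afterwards displays the figure.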
Here is the link to the dataset: https://data.world/makeovermonday/2020w12
Based on the additional information given, I put this together. Hope you like it!
import pandas as pd
import matplotlib.pyplot as plt
plt.close('all')

# Load the data and sum the courses taught per (Year, GenArea) pair.
df=pd.read_csv('https://query.data.world/s/djx5mi7dociacx7smdk45pfmwp3vjo',
               encoding='unicode_escape')
df=df.groupby(['Year','GenArea'])['Taught'].sum().to_frame(name=
              'Courses').reset_index()

# Collect the unique areas and years in order of first appearance.
aux1=df.duplicated(subset='GenArea', keep='first').values
aux2=df.duplicated(subset='Year', keep='first').values
n=len(aux1);year=[];courses=[]
for i in range(n):
    if not aux1[i]:
        courses.append(df.iloc[i]['GenArea'])
    if not aux2[i]:
        year.append(df.iloc[i]['Year'])
    else:
        continue
del aux1,aux2

# Build a wide table: one row per year, one column per area, initialised to 0.
df1=pd.DataFrame(index=year)
s=0
for i in range(len(courses)):
    df1[courses[i]]=0

# Fill the table row by row; move on to the next year once the current row has no zeros left.
for i in range(n):
    string=df.iloc[i]['GenArea']
    if any(df1.iloc[s].values==0):
        df1.at[year[s],string]=df.iloc[i]['Courses']
    else:
        s+=1
        df1.at[year[s],string]=df.iloc[i]['Courses']
del year,courses,df

# Reverse the column order so the legend matches the stacking order, then plot.
df1=df1[df1.columns[::-1]]
df1.plot.area(legend='reverse')
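For comparison, the wide table that the loops above build can probably be obtained more compactly with groupby followed by unstack. This is not the answerer's method, just a sketch under the assumption that the dataset has the columns Year, GenArea and Taught as used in the post. The rows in raw are invented for illustration.

```python
import pandas as pd

# Hypothetical rows mimicking the dataset's columns (Year, GenArea, Taught).
raw = pd.DataFrame({
    "Year": [1900, 1900, 1900, 1901],
    "GenArea": ["Biology", "Biology", "Chemistry", "Biology"],
    "Taught": [12, 8, 25, 22],
})

# Sum per (Year, GenArea), then move GenArea into the columns;
# combinations with no rows are filled with 0 instead of NaN.
df1 = (raw.groupby(["Year", "GenArea"])["Taught"]
          .sum()
          .unstack(fill_value=0))
print(df1)

# df1.plot.area() can then be called on this table, as in the answer above.
```

Unlike the loop version, this does not rely on zero values to detect where a new year starts, so a year in which some area genuinely has 0 courses is handled the same way as any other.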