Matplotlib: how to create a stacked bar plot from a pandas data frame?
Starting with the following:
import pandas as pd

df = pd.DataFrame({'Item': ['A','A','A','B','B','C','C','C','C'],
                   'Name': ['Tom','John','Paul','Tom','Frank','Tom','John','Richard','James'],
                   'Total': [3,3,3,2,2,4,4,4,4]})
print(df[['Item', 'Name']])
  Item     Name
0    A      Tom
1    A     John
2    A     Paul
3    B      Tom
4    B    Frank
5    C      Tom
6    C     John
7    C  Richard
8    C    James
# many-to-many self-merge on column Item
df1 = pd.merge(df, df, on=['Item'])
# drop self-pairs, i.e. rows where Name_x == Name_y
df1 = df1[~(df1['Name_x'] == df1['Name_y'])]
# collect the co-occurring names of each person into a list
df1 = df1.groupby('Name_x')['Name_y'].apply(lambda x: x.tolist()).reset_index()
print(df1)
    Name_x  Name_y
0    Frank  [Tom]
1    James  [Tom, John, Richard]
2     John  [Tom, Paul, Tom, Richard, James]
3     Paul  [Tom, John]
4  Richard  [Tom, John, James]
5      Tom  [John, Paul, Frank, John, Richard, James]
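As a rough illustration of how one of these lists relates to the People and times columns in the question, the sketch below condenses John's list into sorted unique names and their counts. Using `collections.Counter` is an assumption made here for illustration; the original post does not say how those columns were built.

```python
from collections import Counter

# John's row from the frame above
names = ['Tom', 'Paul', 'Tom', 'Richard', 'James']

# Count each distinct name, then list names alphabetically with their counts
counts = Counter(names)
people = sorted(counts)
times = [counts[p] for p in people]

print(people)
print(times)
```

For John this gives the same People and times values as row 2 of the question's data frame.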
I have a data frame as follows:
print(df)
      Name  People                               times
0    Frank  [Tom]                                [1]
1    James  [John, Richard, Tom]                 [1, 1, 1]
2     John  [James, Paul, Richard, Tom]          [1, 1, 1, 2]
3     Paul  [John, Tom]                          [1, 1]
4  Richard  [James, John, Tom]                   [1, 1, 1]
5      Tom  [Frank, James, John, Paul, Richard]  [1, 1, 2, 1, 1]
I want to create a stacked bar plot for each Name, with People as the bars and times as the values.
I want to do something like this:
sub_df = df.groupby(['Name','People'])['Times'].sum().unstack()
sub_df.plot(kind='bar',stacked=True)
But it returns
TypeError: unhashable type: 'numpy.ndarray'
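The error comes from using a column of list-like values as a groupby key: group keys must be hashable, and lists (like NumPy arrays) are not. A minimal sketch in plain Python, independent of pandas, showing that a list cannot be hashed while the equivalent tuple can:

```python
people = ['John', 'Tom']

# Lists are mutable, so hash() rejects them
try:
    hash(people)
    list_hashable = True
except TypeError:
    list_hashable = False

# Tuples with hashable elements hash fine, and equal tuples hash equally
tuple_hashable = hash(tuple(people)) == hash(('John', 'Tom'))

print(list_hashable, tuple_hashable)
```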
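After groupby you have to aggregate, using agg or the more flexible apply. Lists are unhashable and cannot serve as group keys, so convert them to tuples first: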
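# store each list as a tuple so it can be used as a groupby key
df1['People'] = df1['Name_y'].apply(tuple)
# count how often each distinct name occurs in the list
df1['Times'] = df1['Name_y'].apply(lambda x: [x.count(name) for name in set(x)])
# sum the counts for each (Name_x, People) group
s = df1.groupby(['Name_x','People']).apply(lambda x: sum(x.iloc[0]['Times']))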
Then you get the following:
Name_x   People
Frank    (Tom,)                                        1
James    (Tom, John, Richard)                          3
John     (Tom, Paul, Tom, Richard, James)              5
Paul     (Tom, John)                                   2
Richard  (Tom, John, James)                            3
Tom      (John, Paul, Frank, John, Richard, James)     6
dtype: int64
which you can plot however you like:
s.plot(kind='bar', stacked=True)
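Note that `s` holds a single total per Name, so the plot above draws one undivided bar per person. If the goal is one bar per Name with a separate segment for each of the People, a different route may be simpler. The sketch below is not the method from the answer above: it swaps in `pd.crosstab` on the merged pairs to build a Name by People count table and plots that. The `Agg` backend line is only an assumption for running without a display and can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import pandas as pd

df = pd.DataFrame({'Item': ['A','A','A','B','B','C','C','C','C'],
                   'Name': ['Tom','John','Paul','Tom','Frank','Tom','John','Richard','James']})

# Pair every person with everyone sharing the same Item, dropping self-pairs
pairs = pd.merge(df, df, on='Item')
pairs = pairs[pairs['Name_x'] != pairs['Name_y']]

# Rows: Name, columns: People, values: number of shared Items (0 where none)
counts = pd.crosstab(pairs['Name_x'], pairs['Name_y'])

# One bar per Name, one stacked segment per person
ax = counts.plot(kind='bar', stacked=True)
ax.set_xlabel('Name')
ax.set_ylabel('times')
```

Each bar's total height equals the corresponding value in `s`, for example 6 for Tom, while the segments show how that total splits across the other people.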