Table bubble plot with pandas and altair
I have the following df_teams_full_stats:
Data columns (total 35 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Unnamed: 0             428 non-null    int64
 1   MatchID                428 non-null    int64
 2   For Team               428 non-null    object
 3   Against Team           428 non-null    object
 4   Date                   428 non-null    object
 5   GameWeek               428 non-null    int64
 6   Home                   428 non-null    object
 7   Possession             428 non-null    float64
 8   Touches                428 non-null    int64
 9   Passes                 428 non-null    int64
 10  Tackles                428 non-null    int64
 11  Clearances             428 non-null    int64
 12  Corners                428 non-null    int64
 13  Offsides               428 non-null    int64
 14  Fouls Committed        428 non-null    int64
 15  Yellow Cards           428 non-null    int64
 16  Goals                  428 non-null    int64
 17  XG                     428 non-null    float64
 18  Shots On Target        428 non-null    int64
 19  Total Shots            428 non-null    int64
 20  Goals Conceded         428 non-null    int64
 21  Shots Conceded         428 non-null    int64
 22  XGC                    428 non-null    float64
 23  Shots In Box           428 non-null    int64
 24  Close Shots            428 non-null    int64
 25  Headers                428 non-null    int64
 26  Shots Centre           428 non-null    int64
 27  Shots Left             428 non-null    int64
 28  Shots Right            428 non-null    int64
 29  Shots In Box Conceded  428 non-null    int64
 30  Close Shots Conceded   428 non-null    int64
 31  Headers Conceded       428 non-null    int64
 32  Shots Centre Conceded  428 non-null    int64
 33  Shots Left Conceded    428 non-null    int64
 34  Shots Right Conceded   428 non-null    int64
dtypes: float64(3), int64(28), object(4)
memory usage: 117.2+ KB
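I am trying to group and sort it to produce this chart:
Following this example: table_bubble_plot_github, the idea is to pick some target 'x', say 'Goals', which determines the bubble size for each team's match-ups against every other team.
This is what I have so far: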
def team_match_ups(df_teams_full_stats, x, y):
    target_x = x[0]
    target_y = y[0]
    df_temp = df_teams_full_stats.set_index("For Team")
    df_temp.fillna(0.0, inplace=True)
    df_temp[target_x] = df_temp.groupby(['For Team'])[target_x].mean()
    X = df_temp[target_x].to_frame()
    print("x", X)
    df_temp[target_y] = df_temp.groupby(['For Team'])[target_y].mean()
    Y = df_temp[target_y].to_frame()
    df = df_temp.reset_index()
    #sorted_teams = sorted(df_teams_full_stats['For Team'].unique())
    bubble_plot = alt.Chart(df).mark_circle().encode(
        alt.Y(f'{target_x}:O', bin=True),
        alt.X(f'{target_y}:O', bin=True),
        alt.Size(f'{target_x}:Q',
                 scale=alt.Scale(range=[0, 1500])),
        #alt.Color('Color', legend=None, scale=None),
        tooltip=[alt.Tooltip(f'{target_x}:Q'),
                 alt.Tooltip(f'{target_y}:Q')],
    )
    return bubble_plot
But this is far from ideal:
Question
How can I make the team names label the x and y grid, and have target_x set the bubble size?
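If you want the x and y axes to be team names, you should map the x and y encodings to the columns that contain the team names. It might look something like this (untested, because you didn't provide access to your data):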
alt.Chart(df_teams_full_stats).mark_circle().encode(
    x='For Team:N',
    y='Against Team:N',
    size='Goals:Q'
)