Jitter in scatterplot for non-numeric x-axis
I am looking for a way to scatter-plot, descriptively, a pandas.DataFrame similar to this one:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 type 1000 non-null object
1 value 1000 non-null int64
2 count 1000 non-null int64
dtypes: int64(2), object(1)
memory usage: 23.6+ KB
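Using pandas.DataFrame.plot or seaborn.scatterplot, the points for each type all land on a single vertical line and overlap each other. To mitigate this, I would like to introduce at least some jitter in the x-direction, but I don't know how.
My plots so far: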
import pandas as pd
import matplotlib.pyplot as plt
import random
df = pd.DataFrame({
'type': [random.choice(['t1', 't2', 't3']) for _ in range(1000)],
'value': [random.randint(0, 500) for _ in range(1000)],
'count': [random.randint(0,250) for _ in range(1000)],
})
df.plot(kind='scatter', x='type', y='value', c='count', cmap='Blues')
plt.show()
import seaborn as sns
sns.scatterplot(x='type', y='value', data=df, hue='count')
plt.show()
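I managed to jitter the types by encoding them as numbers and then jittering those numbers. However, this requires at least one more column in the DataFrame.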
import pandas as pd
import matplotlib.pyplot as plt
import random
df = pd.DataFrame({
'type': [random.choice(['t1', 't2', 't3']) for _ in range(1000)],
'value': [random.randint(0, 500) for _ in range(1000)],
'count': [random.randint(0,250) for _ in range(1000)],
})
def jitter(x):
    # shift x by a uniform random offset in [-0.25, 0.25]
    return x + random.uniform(0, .5) - .25
type_ids = {'t1': 1, 't2': 2, 't3': 3}
df['type_id'] = df['type'].apply(lambda x: type_ids[x])
df['jitter_type'] = df['type_id'].apply(lambda x: jitter(x))
df.plot(kind='scatter', x='jitter_type', y='value', c='count', cmap='Blues')
plt.xticks([1,2,3])
plt.gca().set_xticklabels(['t1', 't2', 't3'])
plt.show()
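The problem with your approach is that seaborn's scatterplot lacks features that only make sense for categorical data, such as jitter. For that reason, seaborn provides "scatterplots for categorical data": stripplot or swarmplot. However, seaborn then creates a... let's say interesting figure legend. We have to get rid of it and replace it with a colorbar: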
#fake data generation
import pandas as pd
import numpy as np
np.random.seed(123)
ndf = 1000
df = pd.DataFrame({
'Type': [np.random.choice(['t1', 't2', 't3']) for _ in range(ndf)],
'Val': [np.random.randint(0, 700) for _ in range(ndf)],
'Cou': [np.random.randint(0, 500) for _ in range(ndf)],
})
#now the actual plotting
import seaborn as sns
from matplotlib import colors, cm
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
#preparation for the colorbars
pal = "coolwarm"
normpal = colors.Normalize(df.Cou.min(), df.Cou.max())
#stripplot display
sns.stripplot(x="Type", y="Val", data=df, hue="Cou", palette=pal, ax=ax1, jitter=0.2)
ax1.get_legend().remove()
ax1.set_title("stripplot")
fig.colorbar(cm.ScalarMappable(cmap=pal, norm=normpal), ax=ax1)
#swarmplot display
sns.swarmplot(x="Type", y="Val", data=df, hue="Cou", palette=pal, ax=ax2)
ax2.get_legend().remove()
ax2.set_title("swarmplot")
fig.colorbar(cm.ScalarMappable(cmap=pal, norm=normpal), ax=ax2)
plt.tight_layout()
plt.show()
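Sample output: [figure: two side-by-side panels titled "stripplot" and "swarmplot", each with a colorbar]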
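If the extra columns in the question's own workaround are the main objection, the jittered positions can also be computed on the fly and passed straight to matplotlib. The following is only a minimal sketch of that idea, not part of the original answer: the helper names `jittered_codes` and `jitter_scatter`, the `width` and `seed` parameters, and the use of `pd.factorize` to number the categories are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def jittered_codes(categories, width=0.25, seed=None):
    """Hypothetical helper: integer category codes plus uniform noise in [-width, width]."""
    # sort=True numbers the categories in sorted order: 0, 1, 2, ...
    codes, labels = pd.factorize(pd.Series(categories), sort=True)
    rng = np.random.default_rng(seed)
    x = codes + rng.uniform(-width, width, size=len(codes))
    return x, list(labels)


def jitter_scatter(df, x, y, c, width=0.25, seed=None, cmap="Blues"):
    """Hypothetical helper: scatter df[y] over jittered df[x], colored by df[c].

    The DataFrame itself is left unmodified (no helper columns are added).
    """
    xs, labels = jittered_codes(df[x], width=width, seed=seed)
    fig, ax = plt.subplots()
    points = ax.scatter(xs, df[y], c=df[c], cmap=cmap)
    # Put the category names back on the axis at the un-jittered positions.
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    fig.colorbar(points, ax=ax, label=c)
    return ax


# Example data shaped like the DataFrame in the question.
rng = np.random.default_rng(123)
df = pd.DataFrame({
    "type": rng.choice(["t1", "t2", "t3"], size=1000),
    "value": rng.integers(0, 501, size=1000),
    "count": rng.integers(0, 251, size=1000),
})
ax = jitter_scatter(df, x="type", y="value", c="count", seed=0)
# plt.show()  # uncomment when running interactively
```

Because the scatter artist is passed directly to `fig.colorbar`, there is no legend to remove afterwards. The trade-off compared with stripplot or swarmplot is that the tick labels and the jitter width have to be managed by hand.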