How can I add jitter to my seaborn and matplotlib plots?

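I am trying to add jitter to my plots made with seaborn and matplotlib. I'm getting mixed information from what I read online. Some sources say it needs to be coded by hand, while others show it as being as simple as jitter = True. Is there another library, or something else I should be importing, that I'm not aware of? Below is the code I am running and trying to add jitter to: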

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
headcount_df.describe()

%matplotlib inline
ax = plt.figure(figsize=(12, 6)).gca() # define axis
headcount_df.plot.scatter(x = 'Hour', y = 'TablesOpen', ax = ax, alpha = 0.2)
# auto_price.plot(kind = 'scatter', x = 'city-mpg', y = 'price', ax = ax)
ax.set_title('Hour vs TablesOpen') # Give the plot a main title
ax.set_ylabel('TablesOpen')# Set text for y axis
ax.set_xlabel('Hour')

ax = sns.kdeplot(headcount_df.loc[:, ['TablesOpen', 'Hour']], shade = True, cmap = 'PuBu')
headcount_df.plot.scatter(x = 'Hour', y = 'TablesOpen', ax = ax, jitter = True)
ax.set_title('Hour vs TablesOpen') # Give the plot a main title
ax.set_ylabel('TablesOpen')# Set text for y axis
ax.set_xlabel('Hour')

When I try to add jitter I get this error message: AttributeError: 'PathCollection' object has no property 'jitter'. Any help or further information would be much appreciated.

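To add jitter to a scatter plot, first get a handle to the collection that contains the scatter dots. When a scatter plot has just been created on an ax, ax.collections[-1] will be the desired collection.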

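Calling get_offsets() on the collection returns the xy coordinates of all the dots. Add a small random number to each of them. Since all coordinates are integers in this case, adding a random number between 0 and 1 spreads the dots out evenly.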

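In this case, the number of dots is very large. To better see where the dots are concentrated, they can be made very small (marker=',', linewidth=0, s=1) and very transparent (e.g. alpha=0.1).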

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)

fig, ax = plt.subplots(figsize=(12, 6))

headcount_df.plot.scatter(x='Hour', y='TablesOpen', marker=',', linewidth=0, s=1, alpha=.1, color='crimson', ax=ax)
dots = ax.collections[-1]
offsets = dots.get_offsets()
jittered_offsets = offsets + np.random.uniform(0, 1, offsets.shape)
dots.set_offsets(jittered_offsets)

ax.set_title('Hour vs TablesOpen')  # Give the plot a main title
ax.set_ylabel('TablesOpen')  # Set text for y axis
ax.set_xlabel('Hour')
ax.set_xticks(range(25))
ax.autoscale(enable=True, tight=True)

plt.tight_layout()
plt.show()
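
The same steps can be wrapped in a small helper so the jitter range is adjustable. This is only a sketch: jitter_last_collection, its low/high/seed parameters, and the synthetic hours/tables arrays are names made up for illustration (they are not part of matplotlib or of the original dataset), and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def jitter_last_collection(ax, low=0.0, high=1.0, seed=None):
    """Shift every dot of the most recent scatter on `ax` by uniform noise in [low, high)."""
    rng = np.random.default_rng(seed)
    dots = ax.collections[-1]  # the collection created by the last scatter call
    offsets = np.asarray(dots.get_offsets(), dtype=float)
    dots.set_offsets(offsets + rng.uniform(low, high, offsets.shape))
    return dots


# Synthetic stand-in for the CSV (assumption): integer hours and integer table counts.
data_rng = np.random.default_rng(0)
hours = data_rng.integers(0, 24, size=2000)
tables = data_rng.integers(0, 20, size=2000)

fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(hours, tables, marker=',', linewidth=0, s=1, alpha=0.1, color='crimson')
dots = jitter_last_collection(ax, low=0.0, high=1.0, seed=1)
jittered = np.asarray(dots.get_offsets())
```

Passing low=-0.5, high=0.5 instead would keep each cloud of dots centered on its integer tick rather than shifted to the right and upward; which one reads better depends on the plot.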

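Because of the large number of points, drawing the 2D kde takes a long time. The time can be reduced by randomly sampling rows. Note that to draw a 2D kde, the latest versions of Seaborn need each column to be passed as a separate argument.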

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)

fig, ax = plt.subplots(figsize=(12, 6))

N = 5000
rand_sel_df = headcount_df.iloc[np.random.choice(range(len(headcount_df)), N)]
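# Recent seaborn versions expect the columns as x= and y= keywords; fill= replaces the deprecated shade=
ax = sns.kdeplot(x=rand_sel_df['Hour'], y=rand_sel_df['TablesOpen'], fill=True, cmap='PuBu', ax=ax)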

ax.set_title('Hour vs TablesOpen')
ax.set_xticks(range(25))

plt.tight_layout()
plt.show()
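
One detail about the row selection: np.random.choice samples with replacement by default, so some rows can be picked more than once. If distinct rows are preferred, pandas' DataFrame.sample is one alternative. A minimal sketch on synthetic data (demo_df is a made-up stand-in with only the two columns used here, not the real CSV):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for headcount_df (assumption: integer Hour and TablesOpen columns).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    'Hour': rng.integers(0, 24, size=20000),
    'TablesOpen': rng.integers(0, 20, size=20000),
})

N = 5000
# sample() draws N distinct rows by default (replace=False);
# random_state makes the selection reproducible.
rand_sel_df = demo_df.sample(n=N, random_state=42)
```

The resulting rand_sel_df can be passed to the kde call in the same way as the frame selected with iloc.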