Python: how to create scatterplots of fixed ranges, across several intervals?
I have the following pandas DataFrame:
import pandas as pd
df = pd.read_table(...)
>>> df
interval location type y_axis
0 01 1230 X 50
1 01 1609 X 55
2 01 1903 Y 54
3 01 2574 A 58
4 01 3151 A 57
5 01 3198 B 46
6 01 3312 X 50
... .....
02 42 X 31
02 214 A 23
02 598 X 28
....
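There are several intervals, e.g. 01, 02, and so on. Within each interval, the data points fall in the range 1 to 15,000. In df, the first data point of interval 01 is located at 1230, the next at 1609, and so on. Interval 02 likewise ranges from 1 to 15,000.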
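I would like to create a scatter plot in which the range 1 to 15,000 is drawn to scale for each interval. The first point would then be plotted at 1230, the next at 1609, and so on. I would also like a vertical line showing where each interval sits. Each interval is a "region" with its own stretch of x-axis running from 1 to 15,000, so the coordinates on the x-axis are interval 1: 1 to 15000, interval 2: 1 to 15000, interval 3: 1 to 15000, etc. (It is almost like several individual scatter plots concatenated together.)
How can this be done? Without the complication of intervals, a scatter plot of this DataFrame could be created with: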
df.plot(kind='scatter', x = "location", y = "y_axis")
Here are the first 50 rows:
d = {"interval" : ["01",
"01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",
"01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",
"01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",
"01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",
"01", "01", "01", "01", "01"], "location" : [1230, 1609,
1903, 2574, 3151, 3198, 3312, 3659, 3709,
3725, 4172, 4542, 4860, 4900, 5068, 5220,
5260, 5339, 5442, 5529, 5773, 6128, 6165,
6177, 6269, 6275, 6460, 7167, 7361, 7361,
8051, 8222, 8305, 8992, 9104, 9439, 9844,
10045, 10764, 10787, 11104, 11478, 11508,
11684, 12490, 12590, 12794, 12803, 13823,
13982], "type" : ["X", "X", "Y", "A", "A",
"B", "X", "X", "X", "B", "B", "A", "A", "A", "B", "B", "X",
"B", "Y", "X", "X", "Y", "Y", "C", "A", "X", "X", "Z", "Z",
"B", "X", "X", "A", "A", "Y", "X", "A", "X", "X", "Z", "Z",
"C", "X", "Y", "Y", "Z", "Z", "Z", "Z", "Z"], "y_axis" : [50, 55,
54, 58, 57, 46, 50, 55, 46, 42, 56, 55, 55, 45, 52, 51, 45, 48, 50,
49, 53, 55, 45, 40, 49, 37, 52, 58, 52, 4, 58, 52, 49, 58, 50, 55,
56, 53, 58, 43, 55, 55, 44, 52, 59, 49, 53, 39, 60, 52]}
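It seems you want to draw a separate scatter plot for each category of "interval".
This can be done by grouping the DataFrame by the respective column.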
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
cat = ["01"] *5 + ["02"]*4
x = np.append(np.arange(1,6), np.arange(2.5,4.1,0.5))
y = np.random.randint(12,24, size=len(cat))
df = pd.DataFrame({"cat":cat, "x":x, "y":y})
fig, ax = plt.subplots()
colors={"01":"crimson", "02":"darkblue"}
for cat, grouped in df.groupby("cat"):
    grouped.plot(kind="scatter", x="x", y="y", ax=ax, label=cat, color=colors[cat])
plt.show()
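If a single continuous x-axis is preferred over overlaying the groups, one possible sketch is to shift each interval's locations by a fixed width so that the intervals sit side by side, and to mark the boundaries with vertical lines. The width of 15000 is taken from the question. The helper name `add_offset_x`, the `plot_x` column, and the six-row sample frame are assumptions made for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

INTERVAL_WIDTH = 15000  # assumed fixed range of every interval (from the question)

def add_offset_x(df, width=INTERVAL_WIDTH):
    """Hypothetical helper: add 'plot_x' = location shifted by the interval's rank."""
    out = df.copy()
    order = sorted(out["interval"].unique())
    rank = out["interval"].map({name: i for i, name in enumerate(order)})
    out["plot_x"] = out["location"] + rank * width
    return out, order

# small made-up sample using rows shown in the question
sample = pd.DataFrame({
    "interval": ["01", "01", "01", "02", "02", "02"],
    "location": [1230, 1609, 1903, 42, 214, 598],
    "y_axis": [50, 55, 54, 31, 23, 28],
})
shifted, order = add_offset_x(sample)

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(shifted["plot_x"], shifted["y_axis"])

# one vertical line at every boundary between two neighbouring intervals
for i in range(1, len(order)):
    ax.axvline(i * INTERVAL_WIDTH, color="grey", linestyle="--")

# keep every region the same width and label each one at its centre
ax.set_xlim(0, len(order) * INTERVAL_WIDTH)
ax.set_xticks([(i + 0.5) * INTERVAL_WIDTH for i in range(len(order))])
ax.set_xticklabels(["interval {}".format(name) for name in order])
ax.set_ylabel("y_axis")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. The trade-off of this approach is that the tick labels no longer show the within-interval location, only which region a point belongs to.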
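The main challenge here seems to be that you want the x-axis to be both a categorical axis (intervals 01, 02, etc.) and a metric axis (values 1-15000). As you even point out in your post, you are effectively talking about drawing several scatter plots with a shared y-axis. I would suggest doing this with subplots and groupby. You can adjust the spacing between the plots with subplots_adjust(), as I did in another answer.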
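First, generate some example data using d from the OP. We will also randomly select half of the observations and change them to interval=02, to demonstrate the desired paneling: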
import pandas as pd
import numpy as np
df = pd.DataFrame(d)
# shuffle rows
df = df.reindex(np.random.permutation(df.index))
# randomly select half of the rows for changing to interval 02
interval02 = df.sample(df.shape[0] // 2).index
df.loc[interval02, 'interval'] = "02"
Now use pyplot to set up side-by-side subplots, and remove any padding between the plots.
from matplotlib import pyplot as plt
# n_plots = number of different interval values
n_plots = len(df.interval.unique())
fig, axes = plt.subplots(1, n_plots, figsize=(10,5), sharey=True)
# remove space between plots
fig.subplots_adjust(hspace=0, wspace=0)
Finally, group by interval and plot:
for i, (name, group) in enumerate(df.groupby('interval')):
    group.plot(kind="scatter", x='location', y='y_axis',
               ax=axes[i], title="Interval {}".format(name))
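Each panel autoscales its own x-axis by default, so the panels above will not necessarily cover the same range. Below is a minimal sketch that pins every panel to the same fixed range. The limits 1 to 15000 come from the question, while the small sample frame and the use of `np.atleast_1d` (so the loop also works when there is only one interval) are assumptions added here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

X_MIN, X_MAX = 1, 15000  # fixed range per interval, as stated in the question

# small made-up sample using rows shown in the question
df = pd.DataFrame({
    "interval": ["01", "01", "01", "02", "02", "02"],
    "location": [1230, 1609, 1903, 42, 214, 598],
    "y_axis": [50, 55, 54, 31, 23, 28],
})

groups = list(df.groupby("interval"))
fig, axes = plt.subplots(1, len(groups), figsize=(10, 5), sharey=True)
axes = np.atleast_1d(axes)      # plt.subplots returns a bare Axes when there is one panel
fig.subplots_adjust(wspace=0)   # panels touch, so their borders separate the intervals

for ax, (name, group) in zip(axes, groups):
    ax.scatter(group["location"], group["y_axis"])
    ax.set_xlim(X_MIN, X_MAX)   # same fixed range on every panel
    ax.set_title("Interval {}".format(name))
    ax.set_xlabel("location")
axes[0].set_ylabel("y_axis")
```

With `wspace=0` the shared panel borders serve as the vertical lines between intervals. The tick labels at the touching edges may overlap, so thinning the ticks is a likely follow-up.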
With Altair, you can easily separate the two intervals by column or by color.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
cat = ["01"] *5 + ["02"]*4
x = np.append(np.arange(1,6), np.arange(2.5,4.1,0.5))
y = np.random.randint(12,24, size=len(cat))
df = pd.DataFrame({"cat":cat, "x":x, "y":y})
By column
import altair as alt
# properties() sets the size of each facet; Altair 1.x used configure_cell() for this
alt.Chart(df).mark_point().encode(x='x', y='y', column='cat').properties(width=200, height=150)
By color
import altair as alt
# properties() sets the chart size; Altair 1.x used configure_cell() for this
alt.Chart(df).mark_point().encode(x='x', y='y', color='cat').properties(width=200, height=150)