Altair: Creating a layered violin + stripplot
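I'm trying to create a plot that combines a violin plot with a jittered strip plot. How can I do this? My attempt is below. The problem I keep running into is that the violin plot seems to be invisible in the combined chart.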
# 1. Create violin plot
violin = alt.Chart(df).transform_density(
"n_genes_by_counts",
as_=["n_genes_by_counts", "density"],
).mark_area(orient="horizontal").encode(
y="n_genes_by_counts:Q",
x=alt.X("Density:Q", stack="center", title=None),
)
# 2. Create stripplot
stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
y="n_gene_by_counts",
x=alt.X("jitter:Q", title=None),
).transform_calculate(
jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
)
# 3. Combine both
combined = stripplot + violin
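I suspect this is an x-axis scaling issue, namely that density is much smaller than jitter. If so, how can I bring jitter to the same order of magnitude as density? Could someone show me how to create a violin + strip plot given a column named n_gene_by_counts in a pandas DataFrame df? Here is an example image of the kind of plot I'm looking for: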
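As you suspected, the different scales make the violin very small next to the strip plot unless you adjust for it. In your case, you also accidentally capitalized Density:Q in the channel encoding, which means your violin plot is actually empty because no field with that name exists. This example works: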
import altair as alt
from vega_datasets import data
df = data.cars()
# 1. Create violin plot
violin = alt.Chart(df).transform_density(
"Horsepower",
as_=["Horsepower", "density"],
).mark_area().encode(
x="Horsepower:Q",
y=alt.Y("density:Q", stack="center", title=None),
)
# 2. Create stripplot
stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
x="Horsepower",
y=alt.Y("jitter:Q", title=None),
).transform_calculate(
jitter="(random() / 400) + 0.0052" # Narrowing and centering the points
)
# 3. Combine both
violin + stripplot
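The constants in the jitter expression above are tuned by hand for the Horsepower column. As a rough sketch of how they could be derived for other data, the helper below estimates the peak of the density and builds a jitter expression centered on the violin. It assumes that the centered stack puts the violin's midline at half the peak density, and it uses a plain NumPy Gaussian KDE with a Scott-style bandwidth, which is only an approximation of what transform_density computes. The names kde_peak and jitter_expression are hypothetical, not part of Altair.

```python
import numpy as np


def kde_peak(values, grid_size=200):
    """Approximate the peak height of a Gaussian KDE (Scott-style bandwidth)."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values before estimating
    n = x.size
    bandwidth = x.std(ddof=1) * n ** (-1 / 5)
    grid = np.linspace(x.min(), x.max(), grid_size)
    # Evaluate the sum of Gaussian kernels on the grid
    z = (grid[:, None] - x[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    return density.max()


def jitter_expression(peak, fraction=0.2):
    """Build a Vega expression string for a jitter band centered at peak / 2.

    `fraction` is the band width relative to the peak density (an assumption
    chosen for illustration; adjust to taste).
    """
    width = peak * fraction
    start = peak / 2 - width / 2
    return f"random() * {width:.6g} + {start:.6g}"
```

The resulting string could then be passed as transform_calculate(jitter=jitter_expression(kde_peak(df["Horsepower"]))) in place of the hard-coded expression. Because the bandwidth here may differ from the one Vega uses, the band may need a small manual adjustment.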
Using SciPy, you can also lay out the points themselves in the shape of a violin, which I personally like a lot (discussion in this issue):
import altair as alt
import numpy as np
import pandas as pd
from scipy import stats
from vega_datasets import data
# NAs are not supported in SciPy's density calculation
df = data.cars().dropna()
y = 'Horsepower'
# Compute the density function of the data
dens = stats.gaussian_kde(df[y])
# Compute the density value for each data point
pdf = dens(df[y].sort_values())
# Randomly jitter points between 0 and the upper bound of the probability density function
density_cloud = np.empty(pdf.shape[0])
for i in range(pdf.shape[0]):
density_cloud[i] = np.random.uniform(0, pdf[i])
# To create a symmetric density/violin, we make every second point negative
# Distributing every other point like this is also more likely to preserve the shape of the violin
violin_cloud = density_cloud.copy()
violin_cloud[::2] = violin_cloud[::2] * -1
# Append the density cloud to the original data in the correctly sorted order
df_with_density = pd.concat([
df,
pd.DataFrame({
'density_cloud': density_cloud,
'violin_cloud': violin_cloud
},
index=df['Horsepower'].sort_values().index)],
axis=1
)
# Visualize the point cloud (violin_cloud is plotted on the regular y channel here)
alt.Chart(df_with_density).mark_circle().encode(
x='Horsepower',
y='violin_cloud'
)
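The per-point loop above can also be written without an explicit loop. The following is a minimal sketch, assuming the density values have already been computed in sorted order as in the snippet above. The helper name make_violin_cloud and its seed parameter are hypothetical additions for illustration, with the seed included only to make the jitter reproducible.

```python
import numpy as np


def make_violin_cloud(pdf, seed=None):
    """Return (density_cloud, violin_cloud) for an array of density values.

    density_cloud: uniform draws between 0 and each point's density.
    violin_cloud: same values, with every second point mirrored below zero.
    """
    rng = np.random.default_rng(seed)
    pdf = np.asarray(pdf, dtype=float)
    # One uniform draw per point, bounded above by that point's density
    density_cloud = rng.uniform(0.0, pdf)
    violin_cloud = density_cloud.copy()
    # Flip every second point to get a symmetric shape
    violin_cloud[::2] *= -1
    return density_cloud, violin_cloud
```

The two returned arrays can be attached to the data frame exactly as density_cloud and violin_cloud are in the snippet above.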
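Once support for x/y offset channels is added in the next version of Altair, both the layered violin and the point-cloud approach will work with multiple categories without faceting.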