Altair: Creating a layered violin + stripplot

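I'm trying to create a chart that contains both a violin plot and a jittered stripplot. How can I do this? My attempt is below. The problem I keep running into is that the violin plot seems to be invisible in the chart.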

# 1. Create violin plot
violin = alt.Chart(df).transform_density(
    "n_genes_by_counts",
    as_=["n_genes_by_counts", "density"],
).mark_area(orient="horizontal").encode(
    y="n_genes_by_counts:Q",
    x=alt.X("Density:Q", stack="center", title=None),
)

# 2. Create stripplot
stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
    y="n_gene_by_counts",
    x=alt.X("jitter:Q", title=None),
).transform_calculate(
    jitter="sqrt(-2*log(random()))*cos(2*PI*random())"
)

# 3. Combine both
combined = stripplot + violin

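I have a feeling this may be an x-axis scaling problem, i.e. density is much smaller than jitter. If so, how can I bring jitter and density to the same order of magnitude? Could someone show me how to create a violin + strip plot given the column name n_gene_by_counts in a pandas DataFrame df? Here is an example image of the kind of plot I'm looking for: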

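As you suspected, the difference in scales makes the violin very small next to the strip plot unless you adjust for it. In your case, you also accidentally capitalized Density:Q in the encoding. Field names are case-sensitive and the density transform outputs a column called density, so your violin plot is actually empty because the field Density does not exist. This example works: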

import altair as alt
from vega_datasets import data

df = data.cars()

# 1. Create violin plot
violin = alt.Chart(df).transform_density(
    "Horsepower",
    as_=["Horsepower", "density"],
).mark_area().encode(
    x="Horsepower:Q",
    y=alt.Y("density:Q", stack="center", title=None),
)

# 2. Create stripplot
stripplot = alt.Chart(df).mark_circle(size=8, color="black").encode(
    x="Horsepower",
    y=alt.Y("jitter:Q", title=None),
).transform_calculate(
    jitter="(random() / 400) + 0.0052"  # Narrowing and centering the points
)

# 3. Combine both
violin + stripplot
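The constants in the jitter expression above (random() / 400 and + 0.0052) were tuned by eye for the Horsepower column. As a rough sketch, you could derive them from the data instead: estimate the peak of the density with a Gaussian KDE, then place the jitter around half of that peak. My understanding (an assumption, not verified against every Vega version) is that with stack="center" a single violin spans roughly 0 to the peak density and is centred on peak / 2. The rule-of-thumb bandwidth below only approximates what Vega's density transform estimates, and the helper jitter_bounds, its width_fraction parameter and the synthetic values are hypothetical, so treat the numbers as a starting point.

```python
import numpy as np
import pandas as pd


def jitter_bounds(values, width_fraction=0.2, grid_size=200):
    """Hypothetical helper: (low, high) bounds for a uniform jitter that sits
    in the middle of a centre-stacked violin drawn from `values`."""
    x = pd.Series(values).dropna().to_numpy(dtype=float)
    n = x.size
    # Rule-of-thumb bandwidth (approximation, not Vega's exact estimate)
    q75, q25 = np.percentile(x, [75, 25])
    sigma = min(x.std(ddof=1), (q75 - q25) / 1.34)
    bw = 1.06 * sigma * n ** (-1 / 5)
    # Evaluate a Gaussian KDE on a grid over the data range and take its maximum
    grid = np.linspace(x.min(), x.max(), grid_size)
    z = (grid[:, None] - x[None, :]) / bw
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))
    peak = density.max()
    # Assumption: the centred violin is centred on peak / 2; keep the points
    # within width_fraction of the violin's maximum width
    center = peak / 2
    half_width = width_fraction * peak / 2
    return center - half_width, center + half_width


rng = np.random.default_rng(0)
values = rng.normal(100, 40, size=400)  # hypothetical stand-in for df["Horsepower"]
low, high = jitter_bounds(values)
# Vega expression string for transform_calculate: uniform between low and high
jitter_expr = f"{low:.6g} + random() * {high - low:.6g}"
print(jitter_expr)
```

You would then pass transform_calculate(jitter=jitter_expr) to the stripplot in place of the hand-tuned expression, and adjust width_fraction to make the strip narrower or wider.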

Using scipy, you can also lay out the points themselves in the shape of a violin, which I personally like a lot (discussion in this issue):

import altair as alt
import numpy as np
import pandas as pd
from scipy import stats
from vega_datasets import data


# NAs are not supported in SciPy's density calculation
df = data.cars().dropna()
y = 'Horsepower'

# Compute the density function of the data
dens = stats.gaussian_kde(df[y])
# Compute the density value for each data point
pdf = dens(df[y].sort_values())

# Randomly jitter points between 0 and the upper bound of the probability density function
density_cloud = np.empty(pdf.shape[0])
for i in range(pdf.shape[0]):
    density_cloud[i] = np.random.uniform(0, pdf[i])
# To create a symmetric density/violin, we make every second point negative
# Distributing every other point like this is also more likely to preserve the shape of the violin
violin_cloud = density_cloud.copy()
violin_cloud[::2] = violin_cloud[::2] * -1

# Append the density cloud to the original data in the correctly sorted order
df_with_density = pd.concat([
    df,
    pd.DataFrame({
        'density_cloud': density_cloud,
        'violin_cloud': violin_cloud
        },
        index=df[y].sort_values().index)],
    axis=1
)

# Plot the points; the symmetric violin_cloud values trace out the violin shape
alt.Chart(df_with_density).mark_circle().encode(
    x='Horsepower',
    y='violin_cloud'
)

Once support for the x/y offset channels is added, both approaches will work with multiple categories in the next version of Altair, without faceting.