Drawing median and quartile lines on an Altair violin plot

Suppose I have the following plot (taken from the tutorial in the Altair documentation):

import altair as alt
from vega_datasets import data

alt.Chart(data.cars()).transform_density(
    'Miles_per_Gallon',
    as_=['Miles_per_Gallon', 'density'],
    extent=[5, 50],
    groupby=['Origin']
).mark_area(orient='horizontal').encode(
    y='Miles_per_Gallon:Q',
    color='Origin:N',
    x=alt.X(
        'density:Q',
        stack='center',
        impute=None,
        title=None,
        axis=alt.Axis(labels=False, values=[0], grid=False, ticks=True),
    ),
    column=alt.Column(
        'Origin:N',
        header=alt.Header(
            titleOrient='bottom',
            labelOrient='bottom',
            labelPadding=0,
        ),
    )
).properties(
    width=100
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
)

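How can I draw quartile and median lines on each violin? Do I have to define another chart and layer it on top of the violin plot? It would also be nice if each line were only as wide as the violin at that position in the distribution.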

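Yes, you would layer them before faceting. Working out which settings belong on the layered chart and which belong on the faceted chart is a little tricky, but something like this works: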

import altair as alt
from vega_datasets import data

violins = alt.Chart().transform_density(
    'Miles_per_Gallon',
    as_=['Miles_per_Gallon', 'density'],
    extent=[5, 50],
    groupby=['Origin']
).mark_area(orient='horizontal').encode(
    y='Miles_per_Gallon:Q',
    color='Origin:N',
    x=alt.X(
        'density:Q',
        stack='center',
        impute=None,
        title=None,
        axis=alt.Axis(labels=False, values=[0], grid=False, ticks=True),
    ),
)

alt.layer(
    violins,
    alt.Chart().mark_rule().encode(
        y='median(Miles_per_Gallon)',
        x=alt.X(),
        color=alt.value('black')),
).properties(
    width=100
).facet(
    data=data.cars(),
    column=alt.Column(
        'Origin:N',
        header=alt.Header(
            titleOrient='bottom',
            labelOrient='bottom',
            labelPadding=0,
        ),
    )
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
)

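You can then do the same for the quartiles. Apart from entering the values manually, I am not sure how to limit the lines to the width of the area, and I suspect that would be somewhat tricky as well. I would suggest placing a box plot inside the violin instead: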

alt.layer(
    violins,
    alt.Chart().mark_boxplot(size=5, extent=0, outliers=False).encode(
        y='Miles_per_Gallon',
        x=alt.value(46),
        color=alt.value('black')
    )
).properties(
    width=100
).facet(
    data=data.cars(),
    column=alt.Column(
        'Origin:N',
        header=alt.Header(
            titleOrient='bottom',
            labelOrient='bottom',
            labelPadding=0,
        ),
    )
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
)

This is similar to how seaborn handles violin plots by default, and it is also how they were described in the original paper by Hintze and Nelson (1998).