Why doesn't the Bokeh box plot appear?

The table I'm working with:

https://github.com/KeithGalli/pandas/blob/master/pokemon_data.csv

When I create a box plot of the 'HP' column with 'Generation' as the category, it works fine.

Code ([image of the box plot produced]):

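import pandas as pd
from bokeh.plotting import figure, show

df = pd.read_csv('pokemon_data.csv')  # assumes a local copy of the CSV linked above

def box_plot(df, vals, label, ylabel=None):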
    """
    Make a Bokeh box plot from a tidy DataFrame.
    
    Parameters
    ----------
    df : tidy Pandas DataFrame
        DataFrame to be used for plotting
    vals : hashable object
        Column of DataFrame containing data to be used.
    label : hashable object
        Column of DataFrame used to categorize.
    ylabel : str, default None
        Text for y-axis label
        
    Returns
    -------
    output : Bokeh plotting object
        Bokeh plotting object that can be rendered with
        bokeh.io.show()
        
    Notes
    -----
    .. Based largely on example code found here:
     https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/boxplot.py
    """
    # Get the categories
    cats = list(df[label].unique())
    
    # Group Data frame
    df_gb = df.groupby(label)

    # Compute quartiles for each group
    q1 = df_gb[vals].quantile(q=0.25)
    q2 = df_gb[vals].quantile(q=0.5)
    q3 = df_gb[vals].quantile(q=0.75)
                       
    # Compute interquartile region and upper and lower bounds for outliers
    iqr = q3 - q1
    upper_cutoff = q3 + 1.5*iqr
    lower_cutoff = q1 - 1.5*iqr

    # Find the outliers for each category
    def outliers(group):
        cat = group.name
        outlier_inds = (group[vals] > upper_cutoff[cat]) \
                                     | (group[vals] < lower_cutoff[cat])
        return group[vals][outlier_inds]

    # Apply outlier finder
    out = df_gb.apply(outliers).dropna()

    # Points of outliers for plotting
    outx = []
    outy = []
    for cat in cats:
        # only add outliers if they exist
        if not out[cat].empty:
            for value in out[cat]:
                outx.append(cat)
                outy.append(value) 
                
    # If outliers, shrink whiskers to smallest and largest non-outlier
    qmin = df_gb[vals].min()
    qmax = df_gb[vals].max()
    upper = [min([x,y]) for (x,y) in zip(qmax, upper_cutoff)]
    lower = [max([x,y]) for (x,y) in zip(qmin, lower_cutoff)]

    # Build figure
    p = figure(sizing_mode='stretch_width')
    p.ygrid.grid_line_color = 'white'
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_width = 2
    p.yaxis.axis_label = ylabel
    
    # stems
    p.segment(cats, upper, cats, q3, line_width=2, line_color="black")
    p.segment(cats, lower, cats, q1, line_width=2, line_color="black")

    # boxes
    p.rect(cats, (q3 + q1)/2, 0.5, q3 - q1, fill_color="red", 
           alpha=0.7, line_width=2, line_color="black")

    # median (almost-0 height rects simpler than segments)
    p.rect(cats, q2, 0.5, 0.01, line_color="black", line_width=2)

    # whiskers (almost-0 height rects simpler than segments)
    p.rect(cats, lower, 0.2, 0.01, line_color="black")
    p.rect(cats, upper, 0.2, 0.01, line_color="black")

    # outliers
    p.circle(outx, outy, size=6, color="black")

    return p

p = box_plot(df, 'HP', 'Generation', ylabel='HP')
show(p)

However, if I change the arguments in the last lines to:

p = box_plot(df, 'Attack', 'Generation', ylabel='HP')
show(p)

This results in an error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [21], in <cell line: 96>()
     92     p.circle(outx, outy, size=6, color="black")
     94     return p
---> 96 p = box_plot(df, 'Attack', 'Generation', ylabel='HP')
     97 show(p)

Input In [21], in box_plot(df, vals, label, ylabel)
     55 outy = []
     56 for cat in cats:
     57     # only add outliers if they exist
---> 58     if not out[cat].empty:
     59         for value in out[cat]:
     60             outx.append(cat)

File ~\Anaconda3\lib\site-packages\pandas\core\series.py:958, in Series.__getitem__(self, key)
    955     return self._values[key]
    957 elif key_is_scalar:
--> 958     return self._get_value(key)
    960 if is_hashable(key):
    961     # Otherwise index.get_value will raise InvalidIndexError
    962     try:
    963         # For labels that don't resolve as scalars like tuples and frozensets

File ~\Anaconda3\lib\site-packages\pandas\core\series.py:1069, in Series._get_value(self, label, takeable)
   1066     return self._values[label]
   1068 # Similar to Index.get_value, but we do not fall back to positional
-> 1069 loc = self.index.get_loc(label)
   1070 return self.index._get_values_for_loc(self, loc, label)

File ~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py:2871, in MultiIndex.get_loc(self, key, method)
   2868     return mask
   2870 if not isinstance(key, tuple):
-> 2871     loc = self._get_level_indexer(key, level=0)
   2872     return _maybe_to_slice(loc)
   2874 keylen = len(key)

File ~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py:3251, in MultiIndex._get_level_indexer(self, key, level, indexer)
   3247     end = level_codes.searchsorted(idx, side="right")
   3249 if start == end:
   3250     # The label is present in self.levels[level] but unused:
-> 3251     raise KeyError(key)
   3252 return slice(start, end)

KeyError: 5

It seems to work only for the HP column with the Generation category. As another example, if I change the category to, say, 'Type 1', it fails again.

With the last lines changed to:

p = box_plot(df, 'HP', 'Type 1', ylabel='HP')
show(p)

This results in an error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [22], in <cell line: 96>()
     92     p.circle(outx, outy, size=6, color="black")
     94     return p
---> 96 p = box_plot(df, 'HP', 'Type 1', ylabel='HP')
     97 show(p)

Input In [22], in box_plot(df, vals, label, ylabel)
     55 outy = []
     56 for cat in cats:
     57     # only add outliers if they exist
---> 58     if not out[cat].empty:
     59         for value in out[cat]:
     60             outx.append(cat)

File ~\Anaconda3\lib\site-packages\pandas\core\series.py:958, in Series.__getitem__(self, key)
    955     return self._values[key]
    957 elif key_is_scalar:
--> 958     return self._get_value(key)
    960 if is_hashable(key):
    961     # Otherwise index.get_value will raise InvalidIndexError
    962     try:
    963         # For labels that don't resolve as scalars like tuples and frozensets

File ~\Anaconda3\lib\site-packages\pandas\core\series.py:1069, in Series._get_value(self, label, takeable)
   1066     return self._values[label]
   1068 # Similar to Index.get_value, but we do not fall back to positional
-> 1069 loc = self.index.get_loc(label)
   1070 return self.index._get_values_for_loc(self, loc, label)

File ~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py:2871, in MultiIndex.get_loc(self, key, method)
   2868     return mask
   2870 if not isinstance(key, tuple):
-> 2871     loc = self._get_level_indexer(key, level=0)
   2872     return _maybe_to_slice(loc)
   2874 keylen = len(key)

File ~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py:3251, in MultiIndex._get_level_indexer(self, key, level, indexer)
   3247     end = level_codes.searchsorted(idx, side="right")
   3249 if start == end:
   3250     # The label is present in self.levels[level] but unused:
-> 3251     raise KeyError(key)
   3252 return slice(start, end)

KeyError: 'Poison'

Is there any guidance on adjusting the code so that it works for every possible combination?

To make your code run, you need to apply a few small changes:

# old
# # Get the categories
# cats = list(df[label].unique())

# # Group Data frame
# df_gb = df.groupby(label)

# Group Data frame
df_gb = df.groupby(label)
# Get the categories
cats = list(df_gb.groups.keys())
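A small sketch of the ordering mismatch, using a made-up DataFrame (assumed data, not the Pokémon table). It shows that unique() keeps first-appearance order, while a groupby aggregation (and hence q1, q3 and the cutoffs inside box_plot) and groups.keys() are sorted by group key, which is why cats should come from the groupby.

```python
import pandas as pd

# Hypothetical toy data (not the Pokémon table): categories appear unsorted
df = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Fire", "Grass", "Water"],
    "HP": [45, 39, 44, 58, 60, 59],
})

# Order of first appearance, as used by the original code
cats_unique = list(df["Type 1"].unique())

# Order of a groupby aggregation: sorted by group key (sort=True is the default)
q1 = df.groupby("Type 1")["HP"].quantile(q=0.25)
cats_grouped = list(q1.index)

# The keys of .groups come back in the same sorted order
cats_keys = list(df.groupby("Type 1").groups.keys())

print(cats_unique)   # ['Grass', 'Fire', 'Water']
print(cats_grouped)  # ['Fire', 'Grass', 'Water']
print(cats_keys)     # ['Fire', 'Grass', 'Water']
```

Taking the categories from list(q1.index) would be an equally valid way to get them in the order that matches the quartile Series.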

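In the original code the categories were picked in the wrong order: df[label].unique() returns them in order of first appearance, while q1, q2, q3 and the cutoffs come from the groupby and are sorted by group key, so a category could be drawn with another category's statistics. Also change this line: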

# old
# if not out[cat].empty:
if cat in out and not out[cat].empty:  # new

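This is needed because not every category has outliers, so those categories have to be skipped, and this is where your code threw the error. out is a Series with a MultiIndex of (category, original row); a category with no outliers is still listed in the index levels but has no rows, so out[cat] raises a KeyError, as the traceback comment "The label is present in self.levels[level] but unused" indicates.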
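A minimal reproduction of that failure, assuming a hand-built Series as a stand-in for out (the values and labels are hypothetical): after dropna(), category "B" remains in the MultiIndex levels but has no rows, so indexing it fails, while the `cat in out` guard returns False and the loop skips it.

```python
import pandas as pd

# Hypothetical stand-in for `out`: values indexed by (category, row label),
# where every value for category "B" is NaN and gets dropped.
s = pd.Series(
    [120.0, None, 130.0, None],
    index=pd.MultiIndex.from_tuples([("A", 0), ("B", 1), ("A", 2), ("B", 3)]),
)
out = s.dropna()

# "B" is still listed in the index levels but no longer has any rows
print(list(out.index.levels[0]))  # ['A', 'B']

try:
    out["B"]
except KeyError as err:
    print("KeyError:", err)  # the same failure as out[cat] in the question

# The membership test is False for the unused label, so the guarded loop skips it
outx, outy = [], []
for cat in ["A", "B"]:
    if cat in out and not out[cat].empty:
        for value in out[cat]:
            outx.append(cat)
            outy.append(value)

print(outx, outy)  # ['A', 'A'] [120.0, 130.0]
```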

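If you now run box_plot(df, 'HP', 'Type 1', ylabel='HP'), your code shows a blank figure. This is because the categories are strings rather than numbers: without an explicit x_range, figure() creates a numeric x axis, so Bokeh doesn't know where to place the boxes.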
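A minimal sketch of the difference (requires Bokeh; the category names are only illustrative): a figure created without x_range gets a numeric DataRange1d, while passing a list of strings gives a categorical FactorRange whose factors are those strings.

```python
from bokeh.models import DataRange1d, FactorRange
from bokeh.plotting import figure

# No x_range given: Bokeh uses a numeric, auto-fitting range
p_default = figure()
print(isinstance(p_default.x_range, DataRange1d))  # True

# A list of strings as x_range: Bokeh builds a categorical FactorRange
cats = ["Bug", "Fire", "Grass"]
p_cat = figure(x_range=cats)
print(isinstance(p_cat.x_range, FactorRange))  # True
print(list(p_cat.x_range.factors))             # ['Bug', 'Fire', 'Grass']
```

Glyph x coordinates drawn on such a figure then need to use exactly these strings, which is why integer labels such as the Generation numbers also have to go through str().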

To make this work with string categories, convert them to strings and pass them as the figure's x_range when you call figure():

# old
# # Build figure
# p = figure(sizing_mode='stretch_width')

cats = [str(i) for i in cats]
outx = [str(i) for i in outx]  # outlier x positions must match the string factors too
# Build figure
p = figure(sizing_mode='stretch_width', x_range=cats)

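Here all categories are converted to strings and registered with the Bokeh figure up front as the factors of its x_range, so the boxes can be placed on the categorical axis correctly.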

With all of that in place, calling p = box_plot(df, 'HP', 'Type 1', ylabel='HP') produces this plot:

[image of the resulting box plot]
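As an alternative to the guarded loop (a different technique from the fix above, not what the answer uses), the outliers can be selected with a boolean mask instead of looking up out[cat], so categories without outliers need no special handling. This is a sketch assuming the same df/vals/label arguments as box_plot; find_outliers and the toy data are hypothetical.

```python
import pandas as pd


def find_outliers(df, vals, label):
    """Hypothetical helper: return (outx, outy) lists of outlier points."""
    gb = df.groupby(label)[vals]
    q1 = gb.quantile(q=0.25)
    q3 = gb.quantile(q=0.75)
    iqr = q3 - q1
    upper_cutoff = q3 + 1.5 * iqr
    lower_cutoff = q1 - 1.5 * iqr

    # Look up each row's own cutoffs through its category label
    upper_per_row = df[label].map(upper_cutoff)
    lower_per_row = df[label].map(lower_cutoff)
    mask = (df[vals] > upper_per_row) | (df[vals] < lower_per_row)

    # Categories without outliers simply contribute no rows, so no KeyError.
    # x values are converted to strings to match the FactorRange factors.
    outx = [str(c) for c in df.loc[mask, label]]
    outy = df.loc[mask, vals].tolist()
    return outx, outy


# Hypothetical toy data: generation 2 has no outliers
df = pd.DataFrame({
    "Generation": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "Attack": [50, 52, 49, 51, 200, 60, 61, 59, 60],
})
outx, outy = find_outliers(df, "Attack", "Generation")
print(outx, outy)  # ['1'] [200]
```

Inside box_plot, the two returned lists could replace the df_gb.apply(outliers) step and the outx/outy loop.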