Scatter plot with multiple categories so the points don't overlap

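I'm trying to plot two sets of data by category, or at least use string values for the X-axis and Y-axis grid points. I've looked at some examples like here, but they use a bar chart rather than a scatter plot, and I haven't figured out how to make that work. I'd like to be able to add a positive or negative offset to the points based on the trace or the data associated with each point. For example, if the Up points were moved up above the grid line and the Down points were moved just below it, that would be ideal. Right now, as you can see, they sit on top of each other: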

    import plotly.graph_objs as go
    import pandas as pd


    data = {}

    data['Tx'] = ['A', 'B', 'C', 'D', 'D', 'D', 'E', 'C', 'A', 'E', 'B', 'C', 'A', 'B', 'E']
    data['Rx'] = ['A', 'E', 'C', 'B', 'B', 'E', 'D', 'C', 'B', 'C', 'A', 'B', 'A', 'E', 'D']
    data['Direction'] = ['Up', 'Down', 'Down', 'Down','Up', 'Up', 'Up', 'Down', 'Up', 'Down', 'Down', 'Up', 'Up', 'Down', 'Up']
    data['Metric'] = [1.2, 3.5, 4.5, 2, 8, 2, 5.6, 7, 9, 1, 5, 2.6, 13, .5, 4.8]

    # copy data to a DataFrame
    tempDF = pd.DataFrame(data)

    # marker symbol 5 = triangle-up, 6 = triangle-down
    # (.loc avoids pandas chained assignment, which may silently not write)
    isDown = tempDF['Direction'] == 'Down'
    tempDF['markers'] = 5
    tempDF.loc[isDown, 'markers'] = 6

    tempDF['colors'] = 'red'
    tempDF.loc[isDown, 'colors'] = 'blue'

    fig = go.Figure()

    for direction in ['Up', 'Down']:
        subset = tempDF[tempDF['Direction'] == direction]
        fig.add_trace(
            go.Scatter(
                mode='markers',
                x=subset['Tx'],
                y=subset['Rx'],
                marker=dict(
                    symbol=subset['markers'],  # triangle-up or triangle-down
                    color=subset['colors'],
                    size=20,
                    line=dict(color='MediumPurple', width=2),
                ),
                name=direction,
                hovertemplate="%{y} <- %{x}<br>count: 5/10<br>Pct: 10<br>Dir: " + direction + "<extra></extra>",
            )
        )

    # set axis order
    fig.update_layout(
        xaxis={'categoryorder': 'array', 'categoryarray': ['A', 'B', 'C', 'D', 'E']},
        yaxis={'categoryorder': 'array', 'categoryarray': ['A', 'B', 'C', 'D', 'E'][::-1]},
    )
    fig.show()

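Edit: As J_H suggested, I was able to map the categories to numerical values and then add an offset to shift them up or down. To label the axes with the category names, I used the tickvals and ticktext properties of the xaxis and yaxis dicts in the figure layout. However, this caused another problem when hovering over points in the plot. If a point lands exactly on an axis value (in my example, 'A', 'B', etc. on the x-axis), the hover label reads 'A' or 'B', but if it is offset from that value, it shows the number instead of the string. To correct this, I used customdata and hovertemplate in the trace properties to restore the original category names in the hover text. Here is my updated code showing these changes: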

    import plotly.graph_objs as go
    import pandas as pd
    import numpy as np


    data = {}
    possibleCategories = ['A', 'B', 'C', 'D', 'E']
    numericalValues = [1, 2, 3, 4, 5]
    offset = .1
    data['Tx'] = ['A', 'B', 'C', 'D', 'D', 'D', 'E', 'C', 'A', 'E', 'B', 'C', 'A', 'B', 'E']
    data['Rx'] = ['A', 'E', 'C', 'B', 'B', 'E', 'D', 'C', 'B', 'C', 'A', 'B', 'A', 'E', 'D']
    data['Direction'] = ['Up', 'Down', 'Down', 'Down','Up', 'Up', 'Up', 'Down', 'Up', 'Down', 'Down', 'Up', 'Up', 'Down', 'Up']
    data['Metric'] = [1.2, 3.5, 4.5, 2, 8, 2, 5.6, 7, 9, 1, 5, 2.6, 13, .5, 4.8]
    data['yValue'] = len(data['Tx'])*[-1]  # pre-allocate numerical value arrays
    data['xValue'] = len(data['Tx'])*[-1]
    data['markers'] = len(data['Tx'])*[5]  # default marker: 5 = triangle-up
    data['colors'] = len(data['Tx'])*["red"]  # default color: red

    # transform all the lists into numpy object arrays
    for tempKey in data.keys():
        data[tempKey] = np.array(data[tempKey], dtype="object")

    # map categories to numbers; the y axis gets an offset, the x axis does not.
    # The y axis is reversed below ('A' at the top), so subtracting the offset
    # moves an Up point above its grid line on screen.
    for i in range(len(data['Tx'])):
        yBase = numericalValues[possibleCategories.index(data['Rx'][i])]
        if data['Direction'][i] == 'Up':
            data['yValue'][i] = yBase - offset
        else:
            data['yValue'][i] = yBase + offset
        data['xValue'][i] = numericalValues[possibleCategories.index(data['Tx'][i])]

    # set markers and colors for Down points (6 = triangle-down)
    downIndexs = np.where(data['Direction'] == 'Down')
    data['markers'][downIndexs] = 6
    data['colors'][downIndexs] = "blue"

    # copy data to a DataFrame
    tempDF = pd.DataFrame(data)

    fig = go.Figure()

    for direction in ['Up', 'Down']:
        subset = tempDF[tempDF['Direction'] == direction]
        fig.add_trace(
            go.Scatter(
                mode='markers',
                x=subset['xValue'],
                y=subset['yValue'],
                marker=dict(
                    symbol=subset['markers'],  # triangle-up or triangle-down
                    color=subset['colors'],
                    size=20,
                    line=dict(color='MediumPurple', width=2),
                ),
                name=direction,
                # carry the original category names so the hover text shows letters, not numbers
                customdata=np.stack((subset['Rx'], subset['Tx'], subset['Metric']), axis=-1),
                hovertemplate="<br>".join([
                    '%{customdata[0]} <- %{customdata[1]}',
                    'metric: %{customdata[2]}',
                    'Dir: ' + direction,
                    '<extra></extra>'
                ])
            )
        )

    # label the numeric tick positions with the category names
    fig.update_layout(
        xaxis=dict(
            tickmode='array',
            tickvals=numericalValues,
            ticktext=possibleCategories,
            range=[min(numericalValues)-1, max(numericalValues)+1],
            side='top'
        ),
        yaxis=dict(
            tickmode='array',
            tickvals=numericalValues,
            ticktext=possibleCategories,
            range=[max(numericalValues)+1, min(numericalValues)-1]  # reversed: 'A' at the top
        ),
    )
    fig.show()

You want to avoid plotting one symbol on top of another.

if the Up points were moved up above the grid line and the Down points were moved just below the grid, that would be ideal.

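Yes, you are certainly free to do this at the application level, by modifying the (x, y) values before passing them to plotly. In your example, that amounts to mapping the letters to numeric values, adjusting them, and passing the results to the library.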
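A minimal sketch of that application-level mapping, assuming the categories A through E and an offset of 0.1 as in the question's edit; the names `POSITION` and `offset_positions` are hypothetical. The returned numbers would be passed as `y=` to `go.Scatter`, with `tickvals`/`ticktext` relabeling the axis.

```python
CATEGORIES = ['A', 'B', 'C', 'D', 'E']
# hypothetical lookup table: category letter -> grid position 0, 1, 2, ...
POSITION = {cat: i for i, cat in enumerate(CATEGORIES)}


def offset_positions(categories, directions, offset=0.1):
    """Map each category to its grid position, nudged up for 'Up' and down otherwise."""
    return [
        POSITION[cat] + (offset if direction == 'Up' else -offset)
        for cat, direction in zip(categories, directions)
    ]


ys = offset_positions(['A', 'C', 'C'], ['Up', 'Up', 'Down'])
```

If the axis is reversed (as in the question, where 'A' is at the top), flip the sign of the offset so Up points still appear above their grid line.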


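For values that have not already been discretized, the more general problem is finding collisions: locating data points p1 and p2 within some small distance d of each other, which should be perturbed so that their distance exceeds d.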

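To do this in linear rather than quadratic time, assuming a reasonable input distribution, it suffices to discretize the continuous input values to the desired grid size. That lets us get away with exact-equality tests, which are easier than worrying about a distance metric. Store the discretized values in a set, and perturb a point when you notice a collision. Use min( ... ) - d or max( ... ) + d so that it doesn't matter which point was above or below the other.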
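A minimal sketch of this idea, under stated assumptions: points are (x, y) floats, d serves as both the grid size and the nudge step, and colliding points are always pushed upward (only the max( ... ) + d half of the suggestion). A dict stands in for the set so the occupant's y value can be recalled; `spread_points` is a hypothetical name.

```python
import math


def spread_points(points, d):
    """Nudge (x, y) points so that no two share a d-sized grid cell.

    Each point is discretized to a grid cell, and a dict of occupied cells
    gives O(1) collision checks, so the pass is linear in the number of
    points for reasonably spread-out input.
    """
    occupied = {}  # (x cell, y cell) -> y value of the point placed there
    result = []
    for x, y in points:
        cell = (math.floor(x / d), math.floor(y / d))
        while cell in occupied:
            # collision: move this point one step d above both points
            y = max(y, occupied[cell]) + d
            cell = (cell[0], math.floor(y / d))
        occupied[cell] = y
        result.append((x, y))
    return result


stacked = spread_points([(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)], 0.5)
```

Because y grows by at least d on every collision, the inner loop always terminates; a symmetric variant could alternate between min( ... ) - d and max( ... ) + d instead.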


If you can use the seaborn library, swarmplot or stripplot would be the natural approach. Perhaps this is the feature you're looking for: https://plotly.com/python-api-reference/generated/plotly.express.strip.html
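For comparison, the random jitter that a strip plot applies can be sketched by hand with NumPy: a uniform random offset within plus or minus `width` around each category's numeric position. The function `jitter` and its `width` default are hypothetical; seaborn's stripplot applies its own jitter internally, while a swarm plot instead places points deterministically so they don't overlap.

```python
import numpy as np


def jitter(positions, width=0.15, seed=None):
    """Add uniform random noise in [-width, width) to numeric category positions."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducible plots
    positions = np.asarray(positions, dtype=float)
    return positions + rng.uniform(-width, width, size=positions.shape)


# x positions for categories A, B, C, C, C mapped to 0, 1, 2, 2, 2
xs = jitter([0, 1, 2, 2, 2], width=0.15, seed=42)
```

Unlike a fixed Up/Down offset, random jitter does not guarantee separation; it only makes exact overlaps unlikely.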


Edit

The ord() function will map characters to ordinal values for you:

    >>> for ch in 'ABC':
    ...     print(ch, ord(ch), ord(ch) - ord('A'))
    ...
    A 65 0
    B 66 1
    C 67 2