Multiple opacities in Mapbox - Plotly for Python

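I'm currently working on a data visualization project.

I want to plot a large number of lines (about 200k), each representing the journeys from one subway station to another. That is, every pair of subway stations should be connected by a straight line.

The color of the lines doesn't really matter (it could be red, blue, etc.); opacity is what matters most. The more journeys there are between two given stations, the higher the opacity of that particular line, and vice versa.

I feel I'm close to the desired output, but I can't figure out how to do it properly.

The DataFrame I'm using (df = pd.read_csv(...)) consists of the following columns: id_start_station, id_end_station, lat_start_station, long_start_station, lat_end_station, long_end_station, number_of_journeys.

I extracted the coordinates with this code: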

# interleave start, end and a NaN separator so each journey is its own segment
lons = np.empty(3 * len(df))
lons[::3] = df['long_start_station']
lons[1::3] = df['long_end_station']
lons[2::3] = np.nan

lats = np.empty(3 * len(df))
lats[::3] = df['lat_start_station']
lats[1::3] = df['lat_end_station']
lats[2::3] = np.nan

Then I created a figure:

fig = go.Figure()

and added a trace with:

fig.add_trace(go.Scattermapbox(
        name='Journeys',
        lat=lats,
        lon=lons,
        mode='lines',
        line=dict(color='red', width=1),
        opacity= ¿?, # PROBLEM IS HERE [1]
    ))

[1] So I tried a few different ways of passing the opacity term:

  1. I created a new array holding the opacity of each line segment:
opacity = np.empty(3 * len(df))
opacity[::3] = df['number_of_journeys'] / max(df['number_of_journeys'])
opacity[1::3] = df['number_of_journeys'] / max(df['number_of_journeys'])
opacity[2::3] = np.nan

and passed it in at [1], but got this error:

ValueError: 
    Invalid value of type 'numpy.ndarray' received for the 'opacity' property of scattermapbox

    The 'opacity' property is a number and may be specified as:
      - An int or float in the interval [0, 1]

  2. I then thought of passing the "opacity" term through the "color" term, using the alpha channel of rgba, e.g. rgba(255,0,0,0.5).

So I first created a "map" of all the alpha values:

df['alpha'] = df['number_of_journeys'] / max(df['number_of_journeys'])

Then I wrote a function that builds one color string per alpha value:

colors_with_opacity = []

def colors_with_opacity_func(df, empty_list):
    for alpha in df['alpha']:
      empty_list.extend(["rgba(255,0,0,"+str(alpha)+")"])
      empty_list.extend(["rgba(255,0,0,"+str(alpha)+")"])
      empty_list.append(None)
      

colors_with_opacity_func(df, colors_with_opacity)

and passed the result to the color attribute of the Scattermapbox line, but got the following error:

ValueError:
    Invalid value of type 'builtins.list' received for the 'color' property of scattermapbox.line

    The 'color' property is a color and may be specified as:
      - A hex string (e.g. '#ff0000')
      - An rgb/rgba string (e.g. 'rgb(255,0,0)')
      - An hsl/hsla string (e.g. 'hsl(0,100%,50%)')
      - An hsv/hsva string (e.g. 'hsv(0,100%,100%)')
      - A named CSS color:
            aliceblue, antiquewhite, aqua, [...] , whitesmoke,
            yellow, yellowgreen

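Since there is such a large number of lines, adding one trace per line in a loop would cause performance problems.

Any help would be much appreciated. I can't think of a way to do this properly.

Thank you in advance.

EDIT 1: NEW QUESTION ADDED

I'm adding this follow-up below because I believe it can help others looking into this particular topic.

Following Rob's helpful answer, I managed to add multiple opacities, as described above.

However, some of my colleagues suggested a change that would improve the map's readability.

Now, in addition to multiple opacities (one per trace, driven by the value in the dataframe), I would also like to have multiple widths, driven by the same value.

Building on Rob's answer, I'd need something like this: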

BINS_FOR_OPACITY=10
opacity_a = np.geomspace(0.001,1, BINS_FOR_OPACITY)
BINS_FOR_WIDTH=10
width_a = np.geomspace(1,3, BINS_FOR_WIDTH)

fig = go.Figure()

# Note the double "for" statement that follows

for opacity, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS_FOR_OPACITY, labels=opacity_a)):
    for width, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS_FOR_WIDTH, labels=width_a)):
        fig.add_traces(
            go.Scattermapbox(
                name=f"{d['number_of_journeys'].mean():.2E}",
                lat=np.ravel(d.loc[:,[c for c in df.columns if "lat" in c or c=="none"]].values),
                lon=np.ravel(d.loc[:,[c for c in df.columns if "long" in c or c=="none"]].values),
                line_width=width,
                line_color="blue",
                opacity=opacity,
                mode="lines+markers",
            )
        )

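However, the above obviously doesn't work, as it produces far more traces than it should (I can't really explain why, but I guess it is because of the two for statements).

It occurred to me that some kind of solution might be hidden in the pd.cut part, since I'd need something like a double cut, but I couldn't find a way to do it properly.

I also created a pandas Series with:

widths = pd.cut(df["size"], bins=BINS_FOR_WIDTH, labels=width_a)

and iterated over that Series, but ended up with the same result as before (too many traces).

To emphasize and clarify: I don't need only multiple opacities or only multiple widths. I need both at the same time, and that is what is giving me trouble.

Again, any help is much appreciated.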

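  • opacity is set per trace. For markers it can be done through the color with rgba(a,b,c,d), but not for lines. (The same is true of plain line scatter plots.)
  • For demonstration I have used London Underground stations (filtered to reduce the number of nodes), and gone to the extra effort of formatting the data as a CSV. The data comes from JSON, but the source is irrelevant to the solution.
  • The code bins number_of_journeys so that each bin becomes one trace, with a geometric progression used to compute the opacity.
  • This sample data set generates 83k sample lines.
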
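As a minimal sketch of the binning idea, assuming a made-up Series of journey counts (the values and `BINS = 5` are illustrative only, not taken from the sample data), `pd.cut` with `np.geomspace` labels maps each count to the opacity of the bin it falls into:

```python
import numpy as np
import pandas as pd

BINS = 5  # illustrative; the full example uses 10

# one opacity per bin, growing geometrically from 0.001 to 1
opacity_a = np.geomspace(0.001, 1, BINS)

# hypothetical journey counts, standing in for df["number_of_journeys"]
journeys = pd.Series([1, 10, 500, 999, 1000], name="number_of_journeys")

# equal-width bins over the observed range, labelled with the opacities
binned = pd.cut(journeys, bins=BINS, labels=opacity_a)

print(pd.DataFrame({"journeys": journeys, "opacity": binned.astype(float)}))
```

Rows that share a bin share an opacity, so grouping by the binned column yields at most `BINS` groups, and therefore at most `BINS` traces, regardless of how many lines there are.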
import requests
import geopandas as gpd
import plotly.graph_objects as go
import itertools
import numpy as np
import pandas as pd
from pathlib import Path

# get geometry of london underground stations
gdf = gpd.GeoDataFrame.from_features(
    requests.get(
        "https://raw.githubusercontent.com/oobrien/vis/master/tube/data/tfl_stations.json"
    ).json()
)

# limit to zones 1-6 and stations that have at least one line going through them
gdf = gdf.loc[gdf["zone"].isin(["1","2","3","4","5","6"]) & gdf["lines"].apply(len).gt(0)].reset_index(
    drop=True
).rename(columns={"id":"tfl_id", "name":"id"})

# build all valid pairs of stations...
combis = np.array(list(itertools.combinations(gdf.index, 2)))

# generate dataframe of all combinations of stations
gdf_c = (
    gdf.loc[combis[:, 0], ["geometry", "id"]]
    .assign(right=combis[:, 1])
    .merge(gdf.loc[:, ["geometry", "id"]], left_on="right", right_index=True, suffixes=("_start_station","_end_station"))
)


gdf_c["lat_start_station"] = gdf_c["geometry_start_station"].apply(lambda g: g.y)
gdf_c["long_start_station"] = gdf_c["geometry_start_station"].apply(lambda g: g.x)
gdf_c["lat_end_station"] = gdf_c["geometry_end_station"].apply(lambda g: g.y)
gdf_c["long_end_station"] = gdf_c["geometry_end_station"].apply(lambda g: g.x)

gdf_c = gdf_c.drop(
    columns=[
        "geometry_start_station",
        "right",
        "geometry_end_station",
    ]
).assign(number_of_journeys=np.random.randint(1,10**5,len(gdf_c)))

gdf_c
f = Path.cwd().joinpath("SO.csv")
gdf_c.to_csv(f, index=False)

# the question starts from a CSV but provides no sample data, so write one and read it back
df = pd.read_csv(f)

# makes use of ravel simpler...
df["none"] = None

# now it's simple to generate scattermapbox... a trace per required opacity
BINS=10
opacity_a = np.geomspace(0.001,1, BINS)
fig = go.Figure()
for opacity, d in df.groupby(pd.cut(df["number_of_journeys"], bins=BINS, labels=opacity_a)):
    fig.add_traces(
        go.Scattermapbox(
            name=f"{d['number_of_journeys'].mean():.2E}",
            lat=np.ravel(d.loc[:,[c for c in df.columns if "lat" in c or c=="none"]].values),
            lon=np.ravel(d.loc[:,[c for c in df.columns if "long" in c or c=="none"]].values),
            line_color="blue",
            opacity=opacity,
            mode="lines+markers",
        )
    )

fig.update_layout(
    mapbox={
        "style": "carto-positron",
        "center": {'lat': 51.520214996769255, 'lon': -0.097792388774743},
        "zoom": 9,
    },
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
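
For the follow-up in the question's edit (opacity and width at the same time), one possible approach is to bin only once and look up both properties from the same bin index, instead of nesting two groupby loops. The sketch below is an assumption-laden illustration, not part of the original answer: `build_trace_specs` is a hypothetical helper, the width range 1 to 3 is taken from the question, and the tiny `demo` frame is invented. It uses only pandas and NumPy and returns plain dicts of keyword arguments.

```python
import numpy as np
import pandas as pd


def build_trace_specs(df, bins=10, value_col="number_of_journeys"):
    """Hypothetical helper: one dict of Scattermapbox kwargs per occupied bin."""
    opacity_a = np.geomspace(0.001, 1, bins)  # same progression as the answer
    width_a = np.geomspace(1, 3, bins)        # width range used in the question

    # labels=False returns the integer bin index of every row, so a single
    # cut can drive both the opacity and the width lookup
    bin_idx = pd.cut(df[value_col], bins=bins, labels=False)

    specs = []
    for i, d in df.groupby(bin_idx):
        i = int(i)
        gap = np.full(len(d), np.nan)  # NaN separates consecutive segments
        specs.append(
            dict(
                name=f"{d[value_col].mean():.2E}",
                lat=np.column_stack([d["lat_start_station"], d["lat_end_station"], gap]).ravel(),
                lon=np.column_stack([d["long_start_station"], d["long_end_station"], gap]).ravel(),
                mode="lines",
                line_color="blue",
                line_width=float(width_a[i]),
                opacity=float(opacity_a[i]),
            )
        )
    return specs


# invented sample rows, only to show the shape of the result
demo = pd.DataFrame(
    {
        "lat_start_station": [51.50, 51.51, 51.52, 51.53],
        "long_start_station": [-0.10, -0.11, -0.12, -0.13],
        "lat_end_station": [51.60, 51.61, 51.62, 51.63],
        "long_end_station": [-0.20, -0.21, -0.22, -0.23],
        "number_of_journeys": [1, 2, 50, 100],
    }
)
specs = build_trace_specs(demo, bins=10)
for s in specs:
    print(s["name"], round(s["opacity"], 4), round(s["line_width"], 3), len(s["lat"]))
```

Each dict could then be added with `fig.add_trace(go.Scattermapbox(**spec))`. Because width and opacity come from the same bin index, the number of traces stays at most `bins`, rather than the product of the two bin counts that the nested loops produce.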