Plotly express choropleth bins
I am creating a choropleth map with plotly express. I want a discrete colour scale using these bins: [−50,0), [0,50), [50,100), [100,150), [150,200), [200,250), [250,300), [300,350), [350,400), [400,450) and [450,500). Below is a snippet of my dataframe.
Country 1990 2019 Code PGR
0 Afghanistan 12.4 38.0 AFG 206.451613
1 Albania 3.3 2.9 ALB -12.121212
2 Algeria 25.8 43.1 DZA 67.054264
I can display the plot, but I don't know how to set the colours. My code so far is:
fig = px.choropleth(popCodes,locations= popCodes["Code"],
color = popCodes["PGR"],
range_color = (-50, 500),
hover_name = popCodes["Country"],
color_discrete_sequence=px.colors.sequential.Plasma)
fig.show()
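- there appears to be a very similar dataset on kaggle. Have sourced it. It does not include PGR, so I calculated it as the sum of the year columns
- if you want discrete bins, the simplest approach is to use https://pandas.pydata.org/docs/reference/api/pandas.cut.html
- plotly does not support the pandas Interval type, so convert the result to strings with .astype(str)
- given that you are using plotly express, it is simpler to refer to columns by name than to pass a series every time
- full code below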
import kaggle.cli
import sys
import pandas as pd
from zipfile import ZipFile
import urllib.parse  # a bare "import urllib" does not guarantee urllib.parse is loaded
import plotly.express as px
# fmt: off
# download data set
url = "https://www.kaggle.com/mohaiminul101/population-growth-annual"
sys.argv = [sys.argv[0]] + f"datasets download {urllib.parse.urlparse(url).path[1:]}".split(" ")
kaggle.cli.main()
zfile = ZipFile(f'{urllib.parse.urlparse(url).path.split("/")[-1]}.zip')
dfs = {f.filename: pd.read_csv(zfile.open(f)) for f in zfile.infolist()}
# fmt: on
popCodes = dfs["world_population_growth.csv"]
popCodes["PGR"] = popCodes.select_dtypes("number").sum(axis=1)
popCodes = popCodes.sort_values("PGR")
px.choropleth(
popCodes,
locations="Country Code",
color=pd.cut(popCodes["PGR"], bins=range(-50, 501, 50), right=False).astype(str),  # right=False gives [a, b) bins, as asked
hover_name="Country Name",
hover_data={"PGR":":.1f", "Country Code":True},
color_discrete_sequence=px.colors.sequential.Plasma,
)
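The binning step can be tried on its own, without the kaggle download. The sketch below uses the three sample rows from the question. The variable name `pop_codes`, the column name `range` and the ASCII hyphen in the first label are assumptions made for illustration. Note that `pd.cut` defaults to right-closed bins, (a, b], so `right=False` is needed to get the [a, b) bins the question asks for.

```python
import pandas as pd

# The three sample rows shown in the question.
pop_codes = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria"],
        "Code": ["AFG", "ALB", "DZA"],
        "PGR": [206.451613, -12.121212, 67.054264],
    }
)

edges = list(range(-50, 501, 50))  # -50, 0, 50, ..., 500
# One label per bin, written the way the question writes them.
labels = [f"[{lo},{hi})" for lo, hi in zip(edges[:-1], edges[1:])]

# right=False makes each bin closed on the left and open on the right.
# astype(str) turns the categorical result into plain strings for plotly.
pop_codes["range"] = pd.cut(
    pop_codes["PGR"], bins=edges, right=False, labels=labels
).astype(str)

print(pop_codes[["Country", "PGR", "range"]])
```

Passing `color="range"` together with `category_orders={"range": labels}` to `px.choropleth` should then keep the legend in numeric bin order, whatever order the rows are in.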
Not as neat as Rob Raymond's answer, but this is what I did. If I were you, I would use Rob's.
import numpy as np
popCodes['PGR'] = popCodes['PGR'].astype(float)
# Use this to bin the data and apply discrete colour codes later
conditions = [
(popCodes['PGR'] >= -50) & (popCodes['PGR'] <0),
(popCodes['PGR'] >= 0) & (popCodes['PGR'] <50),
(popCodes['PGR'] >= 50) & (popCodes['PGR'] <100),
(popCodes['PGR'] >= 100) & (popCodes['PGR'] <150),
(popCodes['PGR'] >= 150) & (popCodes['PGR'] <200),
(popCodes['PGR'] >= 200) & (popCodes['PGR'] <250),
(popCodes['PGR'] >= 250) & (popCodes['PGR'] <300),
(popCodes['PGR'] >= 300) & (popCodes['PGR'] <350),
(popCodes['PGR'] >= 350) & (popCodes['PGR'] <400),
(popCodes['PGR'] >= 400) & (popCodes['PGR'] <450),
(popCodes['PGR'] >= 450) & (popCodes['PGR'] <500)
]
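values = ['[−50,0)', '[0,50)', '[50,100)', '[100,150)', '[150,200)',
'[200,250)', '[250,300)', '[300,350)', '[350,400)', '[400,450)', '[450,500)']
# a string default keeps the result dtype consistent for rows that fall outside every bin
popCodes['range'] = np.select(conditions, values, default='out of range')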
popCodes['PGR'] = popCodes['PGR'].astype(str) # converting PGR to string gets plotly express to use discrete colours apparently
popCodes = popCodes.sort_values('range') # sorts the dataframe by the range so I get a better legend later
fig = px.choropleth(popCodes, locations= popCodes["Code"],
color = popCodes["range"], # bin label derived from PGR above
hover_name = popCodes["PGR"], # column to add to hover information
color_discrete_map={"[−50,0)": "maroon", # discrete colours based on the range of PGR from above
'[0,50)':'red',
'[50,100)':'orange',
'[100,150)':'yellow',
'[150,200)':'lime',
'[200,250)':'green',
'[250,300)':'aqua',
'[300,350)':'teal',
'[350,400)':'blue',
'[400,450)':'navy',
'[450,500)':'purple'
}
)
fig.show()
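One thing to watch with this approach: sorting the dataframe by the `range` column sorts the labels as strings, which is not numeric order. The sketch below builds the same labels and colour map programmatically and shows the difference. The names `edges`, `labels`, `colours` and `color_map` are illustrative, and the labels here use an ASCII hyphen rather than the "−" character used above, so the first key differs from the one in the answer.

```python
# Bin edges and labels as in the question: [-50,0), [0,50), ..., [450,500).
edges = list(range(-50, 501, 50))
labels = [f"[{lo},{hi})" for lo, hi in zip(edges[:-1], edges[1:])]

# Colour names taken from the answer above, listed in bin order.
colours = ["maroon", "red", "orange", "yellow", "lime", "green",
           "aqua", "teal", "blue", "navy", "purple"]
color_map = dict(zip(labels, colours))

# String sorting compares character by character, so "[100,150)" sorts
# before "[50,100)" and the legend ends up out of numeric order.
print("sorted as strings:", sorted(labels))
print("numeric bin order:", labels)
```

Passing `color_discrete_map=color_map` and `category_orders={"range": labels}` to `px.choropleth` is one way to get the legend in bin order without relying on the sort.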