Overlaying two histograms in R Plotly
I am trying to plot two histograms in R. However, only one of them shows up. Here is my code, using some random data:
myDF <- cbind.data.frame(Income = sample(1:9, size = 1000, replace= TRUE),
AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE))
plot_ly(data = myDF, alpha = 0.6) %>%
add_histogram(x = ~Income, yaxis = "y1") %>%
add_histogram(x = ~AgeInTwoYearIncrements, yaxis = "y2") %>%
layout(
title = "Salary vs Age",
yaxis = list(
tickfont = list(color = "blue"),
overlaying = "y",
side = "left",
title = "Income"
),
yaxis2 = list(
tickfont = list(color = "red"),
overlaying = "y",
side = "right",
title = "Age"
),
xaxis = list(title = "count")
)
Any help would be greatly appreciated!
The main cause is setting overlaying on the first yaxis. Also, because the xaxis is count, Income and Age belong on y.
plot_ly(data = myDF, alpha = 0.6) %>%
add_histogram(y = ~Income, yaxis = "y1") %>% # not `x =`
add_histogram(y = ~AgeInTwoYearIncrements, yaxis = "y2") %>%
layout(
title = "Salary vs Age",
yaxis = list(
tickfont = list(color = "blue"),
# overlaying = "y", # the main cause is this line.
side = "left",
title = "Income"
),
yaxis2 = list(
tickfont = list(color = "red"),
overlaying = "y",
side = "right",
title = "Age"
),
xaxis = list(title = "count")
)
[Edit: just flip the axes]
plot_ly(data = myDF, alpha = 0.6) %>%
add_histogram(x = ~ Income, xaxis = "x1") %>%
add_histogram(x = ~ AgeInTwoYearIncrements, xaxis = "x2") %>%
layout(
margin = list(t = 60),
title = "Salary vs Age",
xaxis = list(
tickfont = list(color = "blue"),
side = "bottom",
title = "Income"
),
xaxis2 = list(
tickfont = list(color = "red"),
overlaying = "x",
side = "top",
position = 0.95,
title = "<br>Age"
),
yaxis = list(title = "count")
)
You can combine the histograms in one plot:
plot_ly(data = myDF, alpha = 0.6) %>%
add_histogram(x = ~Income) %>%
add_histogram(x = ~AgeInTwoYearIncrements) %>%
layout(
title = "Salary and Age",
yaxis = list(
tickfont = list(color = "blue"),
overlaying = "y",
side = "left",
title = "count"
),
xaxis = list(title = "Salary and Age value")
)
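Histograms usually show the frequency/count on the y axis, not on the x axis. We can produce the chart you are asking for, but I am not sure it would still be a histogram.
Also, as you can see in my picture, the frequency/count for salary (blue here) is higher and has less variability than the one for age. That makes it hard to get a good-looking chart. Maybe this is just an issue with your sample data...
So if you want to use the histogram function, you have to swap the roles of the frequency and the values on the x axis.
In any case, I think a scatter plot is a better way to show the relationship between salary and age.
Edit:
This is what I get when I run your code:
Like this, I can't see the meaning of the plot or what you want. The first orange column means that age 59 occurs 0 to 5 times in your dataset. The third column means that age 88 occurs 10 to 15 times in your dataset.
Showing this information in a bar chart does not work, because several age values can fall into the same count category... I hope this is clear.
In any case, I need more clarification to answer your question.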
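To make the scatter plot suggestion concrete, here is a minimal sketch, assuming the same kind of random data as in the question. The seed and the marker opacity are my own choices, not from the original post. The low opacity is only there because both variables are discrete, so many points land on the same spot.

```r
library(plotly)

# Assumption: a fixed seed, only so the sketch is reproducible
set.seed(42)
myDF <- data.frame(
  Income = sample(1:9, size = 1000, replace = TRUE),
  AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE)
)

# One marker per row; semi-transparent markers make coinciding points look darker
fig <- plot_ly(
  data = myDF,
  x = ~AgeInTwoYearIncrements,
  y = ~Income,
  type = "scatter",
  mode = "markers",
  marker = list(opacity = 0.3)
) %>%
  layout(
    title = "Salary vs Age",
    xaxis = list(title = "Age"),
    yaxis = list(title = "Income")
  )

fig
```

With purely random data like this there is no relationship to see, of course, but with real data the same call should show whether income changes with age.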
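Following the reply here, I would like to answer this question with an example that others can easily reuse, namely plotting two overlaid histograms.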
# Add required packages
library(plotly)
# Make some sample data
a = rnorm(1000,4)
b = rnorm(1000,6)
# Make your histogram plot with binsize set automatically
fig <- plot_ly(alpha = 0.6) # don't need "nbinsx = 30"
fig <- fig %>% add_histogram(a, name = "first")
fig <- fig %>% add_histogram(b, name = "second")
fig <- fig %>% layout(barmode = "overlay",
yaxis = list(title = "Frequency"),
xaxis = list(title = "Values"))
# Print your histogram
fig
Here is the result of the code:
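If you would rather control the bin width yourself instead of relying on the automatic choice, one option is to pass the same xbins specification to both traces. This is only a sketch: the bin size of 0.5, the seed, and the `bins` helper variable are my own assumptions for illustration.

```r
library(plotly)

# Assumption: a fixed seed, only so the sketch is reproducible
set.seed(1)
a <- rnorm(1000, 4)
b <- rnorm(1000, 6)

# Shared bin specification covering the range of both samples
# (the size of 0.5 is an arbitrary illustrative choice)
bins <- list(
  start = floor(min(a, b)),
  end = ceiling(max(a, b)),
  size = 0.5
)

fig <- plot_ly(alpha = 0.6) %>%
  add_histogram(x = a, name = "first", xbins = bins) %>%
  add_histogram(x = b, name = "second", xbins = bins) %>%
  layout(
    barmode = "overlay",
    yaxis = list(title = "Frequency"),
    xaxis = list(title = "Values")
  )

fig
```

Using identical bins for both traces keeps the bars aligned, which makes the overlapping region easier to compare.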
Easily handle any number of dimensions without repetition
TL;DR: You can rearrange the data into long form before passing it to plot_ly().
df |>
mutate(row_number = row_number()) |>
pivot_longer(!row_number) |>
plot_ly() |>
add_histogram(x = ~ value,
color = ~ name,
opacity = 0.5) |>
layout(barmode = 'overlay')
Explanation
Given a DF with several columns, like the one the OP posted:
df = cbind.data.frame(Income = sample(1:9, size = 1000, replace= TRUE),
AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE))
Then, use tidyr::pivot_longer():
df |> mutate(row_number = row_number()) |> pivot_longer(!row_number)
This gives:
# A tibble: 2,000 × 3
row_number name value
<int> <chr> <dbl>
1 1 Income 1
2 1 AgeInTwoYearIncrements 20
3 2 Income 1
4 2 AgeInTwoYearIncrements 48
5 3 Income 3
6 3 AgeInTwoYearIncrements 26
7 4 Income 4
8 4 AgeInTwoYearIncrements 30
9 5 Income 4
10 5 AgeInTwoYearIncrements 60
# … with 1,990 more rows
Finally, pipe that into plot_ly(), so the full command is:
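# Packages providing mutate()/row_number(), pivot_longer() and plot_ly()
library(dplyr)
library(tidyr)
library(plotly)
df |>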
# Add a column to keep track of the row numbers
mutate(row_number = row_number()) |>
# Squash and lengthen the df with one row per row per column (in this case, double its length)
pivot_longer(!row_number) |>
plot_ly() |>
# The magic is here. We set color to track the name variable, which will
# add a separate series per column.
# We set the opacity so we can see where our plots overlap.
add_histogram(x = ~ value,
color = ~ name,
opacity = 0.5) |>
# Without setting this, bars will be plotted side by side for the same x value
# rather than overlapping.
layout(barmode = 'overlay')
Output
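To illustrate the "any number of dimensions" point, here is a sketch with a third column added. The YearsEmployed column, the seed, and the names `df3` and `long` are hypothetical and not part of the OP's data. The plotting code is unchanged from the two-column case. It assumes R 4.1 or later for the native pipe.

```r
library(dplyr)
library(tidyr)
library(plotly)

# Assumption: a fixed seed, only so the sketch is reproducible
set.seed(123)
df3 <- data.frame(
  Income = sample(1:9, size = 1000, replace = TRUE),
  AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE),
  # Hypothetical extra column, added only to show that the code scales
  YearsEmployed = sample(0:40, size = 1000, replace = TRUE)
)

# Long form: one row per (original row, column) pair
long <- df3 |>
  mutate(row_number = row_number()) |>
  pivot_longer(!row_number)

# One histogram trace per value of `name`, drawn on top of each other
fig <- long |>
  plot_ly() |>
  add_histogram(x = ~value, color = ~name, opacity = 0.5) |>
  layout(barmode = "overlay")

fig
```

Adding a fourth or fifth column to the data frame should need no change to the pivoting or plotting steps.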