Overlaying two histograms in R Plotly

I am trying to plot two histograms in R, but only one of them shows up. Here is my code, using some random data:

    myDF <- cbind.data.frame(Income = sample(1:9, size = 1000, replace= TRUE),
                           AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE))


plot_ly(data = myDF, alpha = 0.6) %>% 
  add_histogram(x = ~Income, yaxis = "y1") %>% 
  add_histogram(x = ~AgeInTwoYearIncrements, yaxis = "y2") %>% 
  layout(
    title = "Salary vs Age",
    yaxis = list(
      tickfont = list(color = "blue"),
      overlaying = "y",
      side = "left",
      title = "Income"
    ),
    yaxis2 = list(
      tickfont = list(color = "red"),
      overlaying = "y",
      side = "right",
      title = "Age"
    ),
    xaxis = list(title = "count")
  )

Any help would be appreciated!

The main cause is that you give the first yaxis `overlaying`. Also, because the xaxis is titled `count`, `Income` and `Age` should be mapped to `y`.

plot_ly(data = myDF, alpha = 0.6) %>% 
  add_histogram(y = ~Income, yaxis = "y1") %>%    # not `x =`
  add_histogram(y = ~AgeInTwoYearIncrements, yaxis = "y2") %>% 
  layout(
    title = "Salary vs Age",
    yaxis = list(
      tickfont = list(color = "blue"),
      # overlaying = "y",     # the main cause is this line.
      side = "left",
      title = "Income"
    ),
    yaxis2 = list(
      tickfont = list(color = "red"),
      overlaying = "y",
      side = "right",
      title = "Age"
    ),
    xaxis = list(title = "count")
  )

[Edit: just flipped the axes]
plot_ly(data = myDF, alpha = 0.6) %>% 
  add_histogram(x = ~ Income, xaxis = "x1") %>% 
  add_histogram(x = ~ AgeInTwoYearIncrements, xaxis = "x2") %>% 
  layout(
    margin = list(t = 60),
    title = "Salary vs Age",
    xaxis = list(
      tickfont = list(color = "blue"),
      side = "bottom",    # an x axis takes "top" or "bottom", not "left"
      title = "Income"
    ),
    xaxis2 = list(
      tickfont = list(color = "red"),
      overlaying = "x",
      side = "top",
      position = 0.95,
      title = "<br>Age"
    ),
    yaxis = list(title = "count")
  )

You can mix the histograms:

plot_ly(data = myDF, alpha = 0.6) %>% 
  add_histogram(x = ~Income) %>%
  add_histogram(x = ~AgeInTwoYearIncrements) %>%
  layout(
    title = "Salary and Age",
    yaxis = list(
      tickfont = list(color = "blue"),
      overlaying = "y",
      side = "left",
      title = "count"
    ),
    xaxis = list(title = "Salary and Age value")
  )

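Histograms usually show the frequency/count on the y-axis, not on the x-axis. We can produce the graph you want, but I am not sure it would still be a histogram.

Also, as you can see in my picture, the frequency/count for salary (blue here) is higher and its variability is smaller than that of age. This makes it hard to get a good-looking graph. Maybe this is just a problem with your sample data...

So if you want to use the histogram function, you have to swap the roles of the frequency and the values on the x-axis.

But anyway, I think a scatter plot is a better way to show the relationship between salary and age.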
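A minimal sketch of the scatter-plot idea, assuming the same random data as in the question. The `set.seed()` call and the marker opacity are additions of mine for illustration, not part of the original code:

```r
library(plotly)

# Same random data as in the question; set.seed() is added for reproducibility
set.seed(1)
myDF <- data.frame(
  Income = sample(1:9, size = 1000, replace = TRUE),
  AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2),
                                  size = 1000, replace = TRUE)
)

# One marker per row; low opacity makes repeated (age, income) pairs look darker
fig <- plot_ly(
  data = myDF,
  x = ~AgeInTwoYearIncrements,
  y = ~Income,
  type = "scatter",
  mode = "markers",
  marker = list(opacity = 0.3)
) %>%
  layout(
    title = "Salary vs Age",
    xaxis = list(title = "Age"),
    yaxis = list(title = "Income")
  )

fig
```

Because both variables take only a few discrete values here, many points land on top of each other, so the opacity (or some jitter) matters more than it would with continuous data.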

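Edit:

This is what I get when I run your code:

Looking at this, I cannot see the meaning of the plot or what you want. The first orange column means that age 59 appears 0 to 5 times in your data set. The third column means that age 88 appears 10 to 15 times in your data set. Showing this information in a bar chart does not work, because several age values can fall into the same count category... I hope this is clear.

Anyway, to answer your question I need more clarification.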

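Building on the replies here, I would like to answer this question with an example that others can easily reuse, i.e. plotting two overlapping histograms.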

# Load the required package
library(plotly)

# Make some sample data
a <- rnorm(1000, mean = 4)
b <- rnorm(1000, mean = 6)

# Build the histogram plot; the bin size is chosen automatically
fig <- plot_ly(alpha = 0.6) # no need for "nbinsx = 30"
fig <- fig %>% add_histogram(x = a, name = "first")
fig <- fig %>% add_histogram(x = b, name = "second")
# barmode = "overlay" draws the bars on top of each other instead of side by side
fig <- fig %>% layout(barmode = "overlay",
                      yaxis = list(title = "Frequency"),
                      xaxis = list(title = "Values"))

# Print the histogram
fig

Here is the result of the code:
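If the automatically chosen bins make the two series hard to compare, one option is to give both traces the same explicit bins through the histogram trace's `xbins` attribute. This is only a sketch: the helper name `bins`, the 0 to 10 range, the 0.25 width, and the `set.seed()` call are assumptions chosen for this sample data.

```r
library(plotly)

set.seed(42)  # added for reproducibility
a <- rnorm(1000, mean = 4)
b <- rnorm(1000, mean = 6)

# Shared bin definition: both traces use the same start, end and width
# (illustrative values; pick a range that covers your own data)
bins <- list(start = 0, end = 10, size = 0.25)

fig <- plot_ly(alpha = 0.6) %>%
  add_histogram(x = a, name = "first", xbins = bins) %>%
  add_histogram(x = b, name = "second", xbins = bins) %>%
  layout(barmode = "overlay",
         yaxis = list(title = "Frequency"),
         xaxis = list(title = "Values"))

fig
```

With identical bins, bars from the two series line up exactly, which makes the overlapping region easier to read.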

Easily handle any number of dimensions without repetition

TL;DR: you can rearrange the data into long form before passing it to plot_ly()

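library(dplyr)   # mutate(), row_number()
library(tidyr)   # pivot_longer()
library(plotly)

# The native pipe |> needs R 4.1 or later
df |>
  mutate(row_number = row_number()) |>
  pivot_longer(!row_number) |>
  plot_ly() |>
  add_histogram(x = ~ value,
                color = ~ name,
                opacity = 0.5) |>
  layout(barmode = 'overlay')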

Explanation

Given a data frame with multiple columns, like the one the OP posted:

df = cbind.data.frame(Income = sample(1:9, size = 1000, replace= TRUE),
                      AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE))

Then, use tidyr::pivot_longer():

df |> mutate(row_number = row_number()) |> pivot_longer(!row_number)

This gives:

# A tibble: 2,000 × 3
   row_number name                   value
        <int> <chr>                  <dbl>
 1          1 Income                     1
 2          1 AgeInTwoYearIncrements    20
 3          2 Income                     1
 4          2 AgeInTwoYearIncrements    48
 5          3 Income                     3
 6          3 AgeInTwoYearIncrements    26
 7          4 Income                     4
 8          4 AgeInTwoYearIncrements    30
 9          5 Income                     4
10          5 AgeInTwoYearIncrements    60
# … with 1,990 more rows
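As a quick sanity check on the reshaping step, the sketch below rebuilds the long table and confirms its shape. The variable name `long` and the `set.seed()` call are my own additions; `names_to = "name"` and `values_to = "value"` simply spell out the `pivot_longer()` defaults.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # added for reproducibility
df <- data.frame(
  Income = sample(1:9, size = 1000, replace = TRUE),
  AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2),
                                  size = 1000, replace = TRUE)
)

long <- df |>
  mutate(row_number = row_number()) |>
  pivot_longer(!row_number, names_to = "name", values_to = "value")

# Every original cell becomes one row: 1000 rows x 2 columns
nrow(long)

# One group per original column; each becomes its own histogram series
count(long, name)
```

Adding a third or fourth column to `df` needs no change to this code, which is the point of going through the long form.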

Finally, pipe it into plot_ly(), so the complete command is:

df |>
  # Add a column to keep track of the row numbers
  mutate(row_number = row_number()) |>
  # Reshape to long form: one row per original row and column (here, doubling the row count)
  pivot_longer(!row_number) |>
  plot_ly() |>
  # The magic is here. We set color to track the name variable, which will
  # add a separate series per column.
  # We set the opacity so we can see where our plots overlap.
  add_histogram(x = ~ value,
                color = ~ name,
                opacity = 0.5) |>
  # Without setting this, bars will be plotted side by side for the same x value
  # rather than overlapping.
  layout(barmode = 'overlay')

Output