Is there a way to select rows and compute proportions based on their values in R?

Question

I have a data frame that looks like this:

   a          b  c   d
1  2005-01-01 0 ... ...
2  2005-02-22 1 ... ...
3  2005-04-02 0 ... ...
4  2005-12-01 3 ... ...
5  2006-03-03 0 ... ...
6  2006-06-08 1 ... ...
7  2006-10-11 0 ... ...
8  2006-12-02 4 ... ...
9  2007-03-24 0 ... ...
10 2007-04-06 2 ... ...
11 2008-01-28 0 ... ...
12 2008-08-19 0 ... ...
13 2008-09-12 0 ... ...
14 2008-12-12 2 ... ...
15 2009-05-27 0 ... ...
16    ...     . ... ...

I want to select all rows from 2005 and see how many of them are 0, 1, 2, 3, or 4 (the values in column b), ideally as proportions. For example, the result would be:

output:
2005
0    1    2    3    4
20%  20%  20%  20%  20%

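I tried table(year(DF$a), c=DF$b), but that only gives a count overview for all years, with no proportions or anything similar. I also tried piping the result into a proportion function with %>%, but that did not work.

Does anyone know how to do this?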

Answer 1

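You can use table and proportions to get the share per year. proportions takes a margin argument; here it is 1, so the percentages are computed within each row (that is, within each year).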

proportions(table(format(DF$a, "%Y"), DF$b), 1) * 100
#         0   1   2   3   4
#  2005  50  25   0  25   0
#  2006  50  25   0   0  25
#  2007  50   0  50   0   0
#  2008  75   0  25   0   0
#  2009 100   0   0   0   0
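
The question asks for a single year, so here is a minimal base R sketch of that case, assuming the same 15 rows as the answer's data (columns c and d are left out because they are not used). The names d2005, b_all and pct are made up for illustration. Converting b to a factor with levels 0:4 is an added step, not part of the answer above; it keeps values that never occur in 2005 in the result with a share of 0.

```r
# Same dates and b values as the example data (columns c and d omitted)
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02", "2005-12-01",
                "2006-03-03", "2006-06-08", "2006-10-11", "2006-12-02",
                "2007-03-24", "2007-04-06", "2008-01-28", "2008-08-19",
                "2008-09-12", "2008-12-12", "2009-05-27")),
  b = c(0, 1, 0, 3, 0, 1, 0, 4, 0, 2, 0, 0, 0, 2, 0)
)

# Keep only the rows whose date falls in 2005
d2005 <- DF[format(DF$a, "%Y") == "2005", ]

# Fix the levels so that absent values (2 and 4 here) are still counted as 0
b_all <- factor(d2005$b, levels = 0:4)

# Share of each b value within 2005, in percent
pct <- proportions(table(b_all)) * 100
pct
```

On R versions that do not have proportions, prop.table should be usable in its place with the same arguments.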

Data:

DF <- structure(list(a = structure(c(12784, 12836, 12875, 13118, 13210, 
13307, 13432, 13484, 13596, 13609, 13906, 14110, 14134, 14225, 
14391), class = "Date"), b = c(0L, 1L, 0L, 3L, 0L, 1L, 0L, 4L, 
0L, 2L, 0L, 0L, 0L, 2L, 0L), c = c("...", "...", "...", "...", 
"...", "...", "...", "...", "...", "...", "...", "...", "...", 
"...", "..."), d = c("...", "...", "...", "...", "...", "...", 
"...", "...", "...", "...", "...", "...", "...", "...", "..."
)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15"), class = "data.frame")

Answer 2

You can count the occurrences of each b value within each year, compute the percentage, and use pivot_wider to get the data in wide format (if needed).

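library(dplyr)
DF %>%
  # number of rows for each year / b combination
  count(year = lubridate::year(a), b) %>%
  group_by(year) %>%
  # turn the counts into percentages within each year
  mutate(n = n/sum(n) * 100) %>%
  # sort by b so the wide columns come out in the order 0, 1, 2, 3, 4
  arrange(b) %>%
  tidyr::pivot_wider(names_from = b, values_from = n, values_fill = 0)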

#   year   `0`   `1`   `2`   `3`   `4`
#  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1  2005    50    25     0    25     0
#2  2006    50    25     0     0    25
#3  2007    50     0    50     0     0
#4  2008    75     0    25     0     0
#5  2009   100     0     0     0     0
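
If only one year is needed and long format is acceptable, the same idea can be sketched with a filter and without pivot_wider. This is an illustration only, assuming the same example data (columns c and d omitted) and that dplyr is installed; the names res, pct and label are made up here. Unlike the wide table above, b values that do not occur in the chosen year (2 and 4 for 2005) simply do not appear.

```r
library(dplyr)

# Same dates and b values as the example data (columns c and d omitted)
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02", "2005-12-01",
                "2006-03-03", "2006-06-08", "2006-10-11", "2006-12-02",
                "2007-03-24", "2007-04-06", "2008-01-28", "2008-08-19",
                "2008-09-12", "2008-12-12", "2009-05-27")),
  b = c(0, 1, 0, 3, 0, 1, 0, 4, 0, 2, 0, 0, 0, 2, 0)
)

res <- DF %>%
  mutate(year = format(a, "%Y")) %>%   # extract the year as text
  filter(year == "2005") %>%           # keep a single year
  count(year, b) %>%                   # rows per b value in that year
  mutate(pct = n / sum(n) * 100,       # share of each b value in percent
         label = paste0(pct, "%"))     # text label such as "50%"
res
```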
