Is there a way to select rows and compute proportions based on their values in R?
I have a data frame that looks like this:
a b c d
1 2005-01-01 0 ... ...
2 2005-02-22 1 ... ...
3 2005-04-02 0 ... ...
4 2005-12-01 3 ... ...
5 2006-03-03 0 ... ...
6 2006-06-08 1 ... ...
7 2006-10-11 0 ... ...
8 2006-12-02 4 ... ...
9 2007-03-24 0 ... ...
10 2007-04-06 2 ... ...
11 2008-01-28 0 ... ...
12 2008-08-19 0 ... ...
13 2008-09-12 0 ... ...
14 2008-12-12 2 ... ...
15 2009-05-27 0 ... ...
16 ... . ... ...
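I want to select all rows from 2005 and see how many of them are 0, 1, 2, 3 or 4 (that is, based on column b). Perhaps as proportions? For example, the result would be: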
output:
2005
0 1 2 3 4
20% 20% 20% 20% 20%
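I tried table(year(DF$a), c=DF$b), but this only gives an overview across all years, with no proportions or anything similar. I tried piping it into a proportion function with %>%, but that did not work.
Does anyone know how to do this?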
You can use table and proportions to get the share per year. In proportions you can give a margin, here 1, so that the shares are computed within each row (each year).
proportions(table(format(DF$a, "%Y"), DF$b), 1) * 100
# 0 1 2 3 4
# 2005 50 25 0 25 0
# 2006 50 25 0 0 25
# 2007 50 0 50 0 0
# 2008 75 0 25 0 0
# 2009 100 0 0 0 0
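For the single-year result asked about in the question, the same table can be indexed by year. The following is only a minimal sketch: DF is rebuilt inline (columns a and b only) so the block runs on its own, the names shares, shares_2005 and labels_2005 are illustrative, and the factor levels 0:4 are an assumption about which values b can take. proportions is the newer name for prop.table, so on older R versions prop.table should work the same way.

```r
# Stand-in for DF from the question, reduced to columns a and b.
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02", "2005-12-01",
                "2006-03-03", "2006-06-08", "2006-10-11", "2006-12-02",
                "2007-03-24", "2007-04-06", "2008-01-28", "2008-08-19",
                "2008-09-12", "2008-12-12", "2009-05-27")),
  b = c(0L, 1L, 0L, 3L, 0L, 1L, 0L, 4L, 0L, 2L, 0L, 0L, 0L, 2L, 0L)
)

# Assumption: b only takes the values 0 to 4. Fixing the factor levels keeps
# a column for every value even if it never occurs in the data.
b_levels <- factor(DF$b, levels = 0:4)

# margin = 1 normalises each row (year) so that it sums to 100.
shares <- proportions(table(format(DF$a, "%Y"), b_levels), margin = 1) * 100

# Row names are the years as character strings, so one year can be picked by name.
shares_2005 <- shares["2005", ]

# Optional: turn the numbers into percent labels.
labels_2005 <- setNames(paste0(shares_2005, "%"), names(shares_2005))
print(labels_2005)
```

Indexing with a character year works because format(DF$a, "%Y") produces strings; with a numeric year the row name would have to be converted with as.character first.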
Data:
DF <- structure(list(a = structure(c(12784, 12836, 12875, 13118, 13210,
13307, 13432, 13484, 13596, 13609, 13906, 14110, 14134, 14225,
14391), class = "Date"), b = c(0L, 1L, 0L, 3L, 0L, 1L, 0L, 4L,
0L, 2L, 0L, 0L, 0L, 2L, 0L), c = c("...", "...", "...", "...",
"...", "...", "...", "...", "...", "...", "...", "...", "...",
"...", "..."), d = c("...", "...", "...", "...", "...", "...",
"...", "...", "...", "...", "...", "...", "...", "...", "..."
)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15"), class = "data.frame")
You can count the occurrences of each b value in each year, calculate the ratio, and use pivot_wider to get the data in wide format (if needed).
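library(dplyr)
DF %>%
  count(year = lubridate::year(a), b) %>%   # rows per year and value of b
  group_by(year) %>%
  mutate(n = n/sum(n) * 100) %>%            # percentage within each year
  arrange(b) %>%                            # so the wide columns come out as 0, 1, 2, ...
  tidyr::pivot_wider(names_from = b, values_from = n, values_fill = 0)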
# year `0` `1` `2` `3` `4`
# <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2005 50 25 0 25 0
#2 2006 50 25 0 0 25
#3 2007 50 0 50 0 0
#4 2008 75 0 25 0 0
#5 2009 100 0 0 0 0
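If only one year is needed, as in the question, filtering before counting may be simpler. This is a rough sketch under the same assumptions as the pipeline above: DF is rebuilt inline (columns a and b only) so the block is self-contained, and the column name pct and the object name shares_2005 are made up for illustration.

```r
library(dplyr)

# Stand-in for DF from the question, reduced to columns a and b.
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02", "2005-12-01",
                "2006-03-03", "2006-06-08", "2006-10-11", "2006-12-02",
                "2007-03-24", "2007-04-06", "2008-01-28", "2008-08-19",
                "2008-09-12", "2008-12-12", "2009-05-27")),
  b = c(0L, 1L, 0L, 3L, 0L, 1L, 0L, 4L, 0L, 2L, 0L, 0L, 0L, 2L, 0L)
)

shares_2005 <- DF %>%
  filter(format(a, "%Y") == "2005") %>%  # keep only the rows from 2005
  count(b) %>%                           # n = number of rows per value of b
  mutate(pct = n / sum(n) * 100)         # share of each value within 2005

print(shares_2005)
```

Unlike the wide table, this long result has no rows for values of b that never occur in the selected year, so they would need to be filled in separately if a 0% entry is wanted.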