How to create a frequency variable of the occurrence of a value per name per year in R (data frame)
I am struggling to create a new variable/column in my data frame that contains, for each name and each year, the frequency of wi = 1.
Here is a test data frame:
df <- tibble::as_tibble(data.frame(
  Name = c("x", "x", "x", "x", "x", "y", "y", "y", "y", "y", "y"),
  Year = c(2011, 2011, 2011, 2012, 2012, 2011, 2011, 2012, 2012, 2012, 2012),
  id   = c(8, 23, 1, 5, 7, 25, 83, 6, 2, 9, 10),
  wi   = c(1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0)
))
# A tibble: 11 × 4
Name Year id wi
<chr> <dbl> <dbl> <dbl>
1 x 2011 8 1
2 x 2011 23 0
3 x 2011 1 1
4 x 2012 5 1
5 x 2012 7 0
6 y 2011 25 1
7 y 2011 83 1
8 y 2012 6 0
9 y 2012 2 0
10 y 2012 9 1
11 y 2012 10 0
Ideally, the data frame would end up looking like this:
df
# A tibble: 11 × 5
Name Year id wi freq_wi
<chr> <dbl> <dbl> <dbl> <dbl>
1 x 2011 8 1 0.66
2 x 2011 23 0 0.66
3 x 2011 1 1 0.66
4 x 2012 5 1 0.5
5 x 2012 7 0 0.5
6 y 2011 25 1 1
7 y 2011 83 1 1
8 y 2012 6 0 0.25
9 y 2012 2 0 0.25
10 y 2012 9 1 0.25
11 y 2012 10 0 0.25
Thanks for any help!
If wi is always 0 or 1, you can do the following (because mean(wi) then equals the proportion of rows in which wi is 1):
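library(dplyr)
df %>%
  group_by(Name, Year) %>%
  # one row per Name/Year group holding the share of rows with wi == 1
  summarise(freq_wi = mean(wi), .groups = "drop") %>%
  # attach the group-level value back to every original row
  left_join(df, ., by = c("Name", "Year"))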
Name Year id wi freq_wi
<fct> <dbl> <dbl> <dbl> <dbl>
1 x 2011 8 1 0.667
2 x 2011 23 0 0.667
3 x 2011 1 1 0.667
4 x 2012 5 1 0.5
5 x 2012 7 0 0.5
6 y 2011 25 1 1
7 y 2011 83 1 1
8 y 2012 6 0 0.25
9 y 2012 2 0 0.25
10 y 2012 9 1 0.25
11 y 2012 10 0 0.25
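If wi can take values other than 0 and 1, mean(wi) is no longer the share of ones. A minimal sketch of one way around that, assuming dplyr is installed: compare wi against 1 first and take the mean of the resulting logical vector. Using a grouped mutate() also keeps all rows, so no join is needed. The names df and res are only illustrative.

```r
library(dplyr)

# Same test data as in the question
df <- data.frame(
  Name = c("x", "x", "x", "x", "x", "y", "y", "y", "y", "y", "y"),
  Year = c(2011, 2011, 2011, 2012, 2012, 2011, 2011, 2012, 2012, 2012, 2012),
  id   = c(8, 23, 1, 5, 7, 25, 83, 6, 2, 9, 10),
  wi   = c(1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Name, Year) %>%
  # wi == 1 is TRUE/FALSE; its mean is the share of rows equal to 1
  mutate(freq_wi = mean(wi == 1)) %>%
  ungroup()
```

Here res has the same 11 rows as df, in the same order, with freq_wi added as a fifth column.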
Here is another dplyr solution:
library(dplyr)
df %>%
  group_by(Name, Year) %>%
  # Count = number of rows with wi == 1 in the group; n() = group size
  mutate(Count = sum(wi == 1),
         req_wi = Count / n()) %>%
  ungroup() %>%
  select(-Count)
Name Year id wi req_wi
<chr> <dbl> <dbl> <dbl> <dbl>
1 x 2011 8 1 0.667
2 x 2011 23 0 0.667
3 x 2011 1 1 0.667
4 x 2012 5 1 0.5
5 x 2012 7 0 0.5
6 y 2011 25 1 1
7 y 2011 83 1 1
8 y 2012 6 0 0.25
9 y 2012 2 0 0.25
10 y 2012 9 1 0.25
11 y 2012 10 0 0.25
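For readers who would rather not load dplyr, the same per-group share can probably be obtained in base R with ave(), which applies a function within groups and returns a vector aligned with the original rows. This is only a sketch under the assumption that the data look like the test data frame above; the column name freq_wi follows the question.

```r
# Same test data as in the question, as a plain data frame
df <- data.frame(
  Name = c("x", "x", "x", "x", "x", "y", "y", "y", "y", "y", "y"),
  Year = c(2011, 2011, 2011, 2012, 2012, 2011, 2011, 2012, 2012, 2012, 2012),
  id   = c(8, 23, 1, 5, 7, 25, 83, 6, 2, 9, 10),
  wi   = c(1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# as.numeric(wi == 1) is 1 where wi equals 1 and 0 otherwise;
# ave() takes the mean of that indicator within each Name/Year group
df$freq_wi <- ave(as.numeric(df$wi == 1), df$Name, df$Year, FUN = mean)
```

Because ave() returns one value per input row, the row order of df is unchanged.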