Proportionally fill new variable based on values in previous column?
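I want to create a new variable using information from elsewhere in the data frame. This seems simple, but I want the levels of the new variable to be assigned proportionally.
I have a data frame: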
dd<-read.table(text="
group piece answer
group1 A noise
group1 A silence
group1 A silence
group1 B silence
group1 B loud_noise
group1 B noise
group1 B loud_noise
group1 B noise
group2 C silence
group2 C silence", header=TRUE)
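I want to create a new variable, majority_agreement, with two levels: good and bad. Good means that a majority (>55%) of the answers for a piece agree; bad means that the piece did not reach majority agreement. The desired result is: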
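To make the >55% rule concrete, here is a minimal base-R sketch (no packages assumed) that tabulates each piece's answers as row proportions and takes the largest share per piece. In the example data, piece A's most common answer (silence) covers 2 of 3 rows, piece B's top answers cover 2 of 5, and piece C's covers 2 of 2, so only B falls below the cutoff.

```r
# Example data from the question
dd <- read.table(text = "
group piece answer
group1 A noise
group1 A silence
group1 A silence
group1 B silence
group1 B loud_noise
group1 B noise
group1 B loud_noise
group1 B noise
group2 C silence
group2 C silence", header = TRUE)

# Contingency table of pieces x answers, converted to row proportions
shares <- prop.table(table(dd$piece, dd$answer), margin = 1)

# Largest share per piece: A = 2/3, B = 2/5, C = 1
max_share <- apply(shares, 1, max)

# Apply the 55% rule to each piece
labels <- ifelse(max_share > 0.55, "good", "bad")
labels
```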
group piece answer majority_agreement
group1 A noise good
group1 A silence good
group1 A silence good
group1 B silence bad
group1 B loud_noise bad
group1 B noise bad
group1 B loud_noise bad
group1 B noise bad
group2 C silence good
group2 C silence good
I can do this in a binary way (all agree or not):
library(dplyr)
newdf <- dd %>%
  group_by(group, piece) %>%
  # "good" only when every answer for the piece is the same
  mutate(majority_agreement = if_else(n_distinct(answer) == 1, "good", "bad")) %>%
  ungroup() %>%
  as.data.frame()
How can I do it proportionally?
library(dplyr)
newdf <- dd %>%
  count(group, piece, answer) %>%              # How many of each answer for each group & piece
  group_by(group, piece) %>%
  mutate(share = n / sum(n)) %>%               # What share does each answer have?
  summarize(max_share = max(share)) %>%        # What's the largest share among them?
  mutate(majority_agreement = if_else(max_share > 0.55, "good", "bad")) %>%
  ungroup() %>%
  right_join(dd, by = c("group", "piece"))     # Add the conclusion back to the original data
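The same count-then-share idea can be wrapped in a reusable function so that the 55% cutoff becomes an argument instead of a hard-coded number. This is a minimal sketch, assuming dplyr 1.0 or later; `label_agreement()` and its `threshold` argument are hypothetical names, not part of dplyr. It swaps `summarize()` plus `right_join()` for `add_count()`, so the original rows keep their order and no join is needed.

```r
library(dplyr)

# Example data from the question
dd <- read.table(text = "
group piece answer
group1 A noise
group1 A silence
group1 A silence
group1 B silence
group1 B loud_noise
group1 B noise
group1 B loud_noise
group1 B noise
group2 C silence
group2 C silence", header = TRUE)

# Hypothetical helper (not part of dplyr): a row is "good" when the most common
# answer for its group/piece makes up more than `threshold` of that piece's rows.
label_agreement <- function(data, threshold = 0.55) {
  data %>%
    add_count(group, piece, answer, name = "n_answer") %>%  # rows sharing this answer within the piece
    group_by(group, piece) %>%
    mutate(
      max_share = max(n_answer) / n(),                       # share held by the most common answer
      majority_agreement = if_else(max_share > threshold, "good", "bad")
    ) %>%
    ungroup() %>%
    select(-n_answer, -max_share)                            # drop the helper columns
}

label_agreement(dd)                   # A and C are "good", B is "bad"
label_agreement(dd, threshold = 0.7)  # stricter cutoff: A (2/3) becomes "bad"
```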
This seems to do what you want using dplyr:
library(dplyr)
dd %>%
group_by(piece) %>%
mutate(majority_agreement = if_else(max(table(answer)/n())>.55, "good", "bad"))
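Within each "piece", we use table() to count how many times each distinct response occurs and divide by n() to get the proportion of each response. We then check whether the largest proportion is greater than 0.55. If it is, we assign the label "good"; otherwise we assign the label "bad".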
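If you would rather not depend on dplyr, the same per-piece logic can be sketched in base R with `ave()`, which applies a function within each piece and recycles its single result to every row of that piece. Like the dplyr answer, this groups by `piece` only, so it assumes piece labels are unique across groups.

```r
# Example data from the question
dd <- read.table(text = "
group piece answer
group1 A noise
group1 A silence
group1 A silence
group1 B silence
group1 B loud_noise
group1 B noise
group1 B loud_noise
group1 B noise
group2 C silence
group2 C silence", header = TRUE)

# For each row: share of the most common answer within that row's piece.
# The row indices are only a vehicle for ave(); FUN receives the indices of one piece.
max_share <- ave(seq_len(nrow(dd)), dd$piece,
                 FUN = function(i) max(table(dd$answer[i])) / length(i))

dd$majority_agreement <- ifelse(max_share > 0.55, "good", "bad")
dd
```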