Create column based on ordering in another column in R
I have a data frame; this is a shortened version of it:
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
library(zoo)  # provides as.yearqtr()
df <- data.frame(council_name, quarter = as.yearqtr(quarter), treat)
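What I want is a column that holds, for each value of council_name, the value of quarter at which treat is first 1. If treat is never 1 for a particular council_name, the value should be 0.
It would look like this: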
library(zoo)
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
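# one entry per row: the first treated quarter of each council, "0" if never treated
first.treatment <- rep(c("2006 Q1", "2006 Q3", "0"), each = 3)
df.desired <- data.frame(council_name, quarter = as.yearqtr(quarter), treat, first.treatment)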
I have tried various things with group_by and sorting, but never quite got what I am looking for.
One example of what I tried:
merged2%>%
group_by(council_name, year_qtr)%>%
arrange(year_qtr)%>%
mutate(first.treatment = by(year_qtr, head, 1))
But I got:
Error: Problem with `mutate()` input `first.treatment`. x unique() applies only to vectors ℹ Input `first.treatment` is `by(year_qtr, head, 1)`. ℹ The error occured in group 1: council_name = "Adur", year_qtr = 2006 Q2.
Thanks a lot!
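I did adjust the example data a little, but I very much hope this is what you meant.
I don't like the idea of returning either a string or 0; a function should always return the same data type. That is why my answer returns either the quarter or NA. If you insist on returning 0, that is easy to "fix" with is.na.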
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
df <- data.frame(council_name, quarter, treat)
treat.one <- function(d){
line <- which(d$treat == 1)[1]
return(d$quarter[line])
}
by(df, council_name, treat.one)
This takes
  council_name quarter treat
1    Southwark 2006 Q1     1
2    Southwark 2006 Q2     0
3    Southwark 2006 Q3     1
4      Lambeth 2006 Q1     0
5      Lambeth 2006 Q2     0
6      Lambeth 2006 Q3     1
7    Yorkshire 2006 Q1     0
8    Yorkshire 2006 Q2     0
9    Yorkshire 2006 Q3     0
and returns
> by(df, council_name, treat.one)
council_name: Lambeth
[1] "2006 Q3"
-----------------------------------------
council_name: Southwark
[1] "2006 Q1"
-----------------------------------------
council_name: Yorkshire
[1] NA
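The is.na "fix" mentioned above, together with copying the per-council result back onto every row, could look roughly like the sketch below. It is only an illustration: first_quarter is a name made up here, and split() plus sapply() stand in for by() so that the result is a plain named vector. The data and treat.one are repeated so the block runs on its own.

```r
# Sketch only: first_quarter is a hypothetical name, not part of the answer above.
council_name <- rep(c("Southwark", "Lambeth", "Yorkshire"), each = 3)
quarter <- rep(c("2006 Q1", "2006 Q2", "2006 Q3"), times = 3)
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
df <- data.frame(council_name, quarter, treat, stringsAsFactors = FALSE)

treat.one <- function(d) {
  line <- which(d$treat == 1)[1]  # NA when the council is never treated
  d$quarter[line]
}

# One value per council, as a named character vector
first_quarter <- sapply(split(df, df$council_name), treat.one)

# Replace the NA placeholder with "0" (a string, so the column keeps one type)
first_quarter[is.na(first_quarter)] <- "0"

# Look up each row's council by name to build the new column
df$first.treatment <- unname(first_quarter[df$council_name])
```

Keeping "0" as a string avoids mixing types in one column, which is the concern raised above; leaving the NA in place is arguably the cleaner choice.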
When using group_by, the mutate call is evaluated on the variables of each group in turn.
So you can write:
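library(dplyr)  # tibble(), group_by(), arrange(), mutate(), %>%
library(zoo)    # as.yearqtr()
tibble(council_name, year_qtr = as.yearqtr(quarter), treat) %>%
  group_by(council_name) %>%
  arrange(year_qtr) %>%
  # first quarter with treat == 1 within each council (NA if never treated)
  mutate(first_treatment = year_qtr[treat == 1][1]) %>%
  arrange(council_name, year_qtr)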
or
tibble(council_name, year_qtr = as.yearqtr(quarter), treat) %>%
  group_by(council_name) %>%
  arrange(year_qtr) %>%
  summarise(first_treatment = year_qtr[treat == 1][1])
For each group, this takes the year_qtr values where treat==1 and picks the first element of the resulting vector. That is why sorting beforehand (arrange) matters.
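To illustrate why the sort matters, here is a small sketch on made-up data in which the Southwark rows are stored out of chronological order. The tibble shuffled and the result names unsorted and sorted are hypothetical and exist only for this example.

```r
library(dplyr)
library(zoo)

# Hypothetical data: Southwark's rows are deliberately not in date order
shuffled <- tibble(
  council_name = c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth"),
  year_qtr     = as.yearqtr(c("2006 Q3", "2006 Q2", "2006 Q1", "2006 Q1", "2006 Q2")),
  treat        = c(1, 0, 1, 0, 1)
)

# Without arrange(): "first" means first in storage order
unsorted <- shuffled %>%
  group_by(council_name) %>%
  summarise(first_treatment = year_qtr[treat == 1][1])

# With arrange(): "first" means earliest quarter
sorted <- shuffled %>%
  group_by(council_name) %>%
  arrange(year_qtr) %>%
  summarise(first_treatment = year_qtr[treat == 1][1])
```

For Southwark, unsorted picks 2006 Q3 because that row happens to come first, while sorted picks 2006 Q1, the earliest treated quarter.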