Creating a new variable based on prior history
I have data from which I need to create a variable based on prior history, for example:
created <- c(2009, 2010, 2010, 2011, 2012, 2011)
person <- c('A', 'A', 'A', 'A', 'B', 'B')
location <- c('London', 'Geneva', 'London', 'New York', 'London', 'London')
df <- data.frame(created, person, location)
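I want to create a variable called 'existing' that takes the previous years into account and checks whether he/she has lived in that place, i.e. whether the place is an old one (they lived there before). Any suggestions?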
library(dplyr)
df %>% group_by(person) %>% mutate(existing = 0)
Expected result:
existing <- c(1, 1, 0, 1, 0, 1)
Based on the OP's updated information, we need to first arrange the data by person and year (created), then use duplicated.
library(dplyr)
df %>%
arrange(person, created) %>%
group_by(person) %>%
mutate(existing = +(!duplicated(location)))
# created person location existing
# <dbl> <fct> <fct> <int>
#1 2009 A London 1
#2 2010 A Geneva 1
#3 2010 A London 0
#4 2011 A New York 1
#5 2011 B London 1
#6 2012 B London 0
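For readers without dplyr, here is a minimal base R sketch of the same sort-then-duplicated idea. It assumes character columns (stringsAsFactors = FALSE is written out explicitly) and, unlike the arrange() pipeline, writes the flags back in the original row order so the result can be compared with the expected vector from the question. The names o and first_seen are only illustrative.

```r
# Sample data from the question, with person quoted as character
df <- data.frame(
  created  = c(2009, 2010, 2010, 2011, 2012, 2011),
  person   = c("A", "A", "A", "A", "B", "B"),
  location = c("London", "Geneva", "London", "New York", "London", "London"),
  stringsAsFactors = FALSE
)

# Row order by person, then year (order() is stable, so ties keep their input order)
o <- order(df$person, df$created)

# A (person, location) pair is new the first time it appears in that order
first_seen <- !duplicated(df[o, c("person", "location")])

# Write the 1/0 flags back to the original row positions
df$existing <- integer(nrow(df))
df$existing[o] <- as.integer(first_seen)
df$existing
```

With the question's data this should give 1 1 0 1 0 1, matching the expected vector.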
You could try base R's ave():
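# ave() returns the type of its first argument (character here), so convert back to integer
with(df, as.integer(ave(location, person, FUN = function(i) as.integer(!duplicated(i)))))
#[1] 1 1 0 1 1 0
Note that ave() keeps the original row order and ignores created, so for person B the 2012 row gets 1 and the 2011 row gets 0.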
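Because ave() processes rows in their current order, one way to make it respect created (a sketch, again assuming character columns) is to sort the data frame first and then convert ave()'s character result back to integer:

```r
# Sample data from the question, with person quoted as character
df <- data.frame(
  created  = c(2009, 2010, 2010, 2011, 2012, 2011),
  person   = c("A", "A", "A", "A", "B", "B"),
  location = c("London", "Geneva", "London", "New York", "London", "London"),
  stringsAsFactors = FALSE
)

# ave() works in the current row order, so sort by person and year first
df <- df[order(df$person, df$created), ]

# Within each person, flag the first appearance of each location with 1;
# the outer as.integer() turns ave()'s "1"/"0" strings back into integers
df$existing <- with(df, as.integer(ave(location, person,
                                       FUN = function(i) as.integer(!duplicated(i)))))
df
```

The rows come back sorted, so person B's 2011 row now gets 1 and the 2012 row gets 0.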
Another dplyr option could be:
df %>%
group_by(person, location) %>%
mutate(existing = +(1:n() == 1))
created person location existing
<dbl> <fct> <fct> <int>
1 2009 A London 1
2 2010 A Geneva 1
3 2010 A London 0
4 2011 A New York 1
5 2012 B London 1
6 2011 B London 0
If the rows need to be sorted by created within each group:
df %>%
group_by(person, location) %>%
arrange(created, .by_group = TRUE) %>%
mutate(existing = +(1:n() == 1))
Another option using data.table:
library(data.table)
setDT(df)[order(person, created), existing := c(1L, rep(0L, .N - 1L)), by = .(person, location)]
Output:
created person location existing
1: 2009 A London 1
2: 2010 A Geneva 1
3: 2010 A London 0
4: 2011 A New York 1
5: 2012 B London 0
6: 2011 B London 1
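The data.table call marks the first row of each (person, location) group with 1L and the rest with 0L. A rough base R analogue of that counting idea, sketched here with a hypothetical helper column visit, numbers the visits to each place in date order, which also shows how many times someone returned:

```r
# Sample data from the question, with person quoted as character
df <- data.frame(
  created  = c(2009, 2010, 2010, 2011, 2012, 2011),
  person   = c("A", "A", "A", "A", "B", "B"),
  location = c("London", "Geneva", "London", "New York", "London", "London"),
  stringsAsFactors = FALSE
)

# Sort by person and year, as order(person, created) does in the data.table call
s <- df[order(df$person, df$created), ]

# Number the visits to each (person, location) pair in date order;
# seq_along() applied per group gives 1, 2, 3, ...
s$visit <- ave(seq_len(nrow(s)), s$person, s$location, FUN = seq_along)

# Only the first visit counts as a new location, like c(1L, rep(0L, .N - 1L))
s$existing <- as.integer(s$visit == 1L)
s
```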