Partitioning a data.frame according to a condition
I have a data.frame shaped like this:
c <- data.frame(name=c("a", "a", "b", "b", "c", "c","d","d"), value=c(1,3,2,4,5,3,4,5), address=c("rrrr","rrrr","zzzz","aaaa","ssss","jjjj","qqqq","qqqq"))
> c
  name value address
1    a     1    rrrr
2    a     3    rrrr
3    b     2    zzzz
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj
7    d     4    qqqq
8    d     5    qqqq
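I'm trying to split this data frame into two separate data frames according to a simple rule: group together the people who did not change address, and group together the people who did. Any hints on how to accomplish this?
So far I have been playing, to no avail, with: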
for(i in seq(1,8, by=2)){
print(i)
print(unlist(c[which(c[i,3]==c[(i+1),3]),]))
}
Using dplyr:
library(dplyr)
z<-c %>% group_by(name) %>%
mutate(changed = n_distinct(address))
split(z, z$changed)
Thanks to @akrun for reminding me of n_distinct.
@jeremycg's answer is good, and I'm trying to learn dplyr, but here is a non-dplyr version as well.
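# Count the distinct addresses for each name (result is named by name)
numAddresses <- sapply(split(c, c$name), function(x)
  length(unique(x$address)))
# Look up each row's count by its name, not its address;
# as.character() avoids indexing by factor codes
split(c, numAddresses[as.character(c$name)])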
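This counts the number of addresses per name and splits on that basis.
There was one hurdle to overcome: ave always returned <NA> until as.character was used. There is a warning message, and I'm copying its beginning here so that searchers might find this: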
Warning messages:
1: In `[<-.factor`(`*tmp*`, i, value = c(1L, 1L)) :
Successful version (using a data object named cc):
split(cc, ave(as.character(cc$address), cc$name, FUN=function(x) sum(!duplicated(x)) ) )
$`1`
  name value address
1    a     1    rrrr
2    a     3    rrrr
7    d     4    qqqq
8    d     5    qqqq

$`2`
  name value address
3    b     2    zzzz
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj
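
The <NA> hurdle can be reproduced in a few lines. This is a minimal sketch, assuming address is stored as a factor (the default for strings in data.frame before R 4.0.0). ave writes the results of FUN back into a copy of its first argument, so integer counts assigned into a factor are not valid levels and become <NA>. The toy vectors addr and name and the helper count_distinct are illustrative names, not part of the original answer.

```r
# Toy data: two people, two rows each; addr is deliberately a factor.
addr <- factor(c("rrrr", "rrrr", "zzzz", "aaaa"))
name <- c("a", "a", "b", "b")

# Hypothetical helper: number of distinct values in x.
count_distinct <- function(x) sum(!duplicated(x))

# Factor input: the counts are not levels of addr, so every entry is NA
# (R also emits an "invalid factor level" warning, suppressed here).
bad <- suppressWarnings(ave(addr, name, FUN = count_distinct))

# Character input: the counts are stored as strings, one per row.
good <- ave(as.character(addr), name, FUN = count_distinct)

print(bad)
print(good)
```

Note that good is a character vector, so the group labels produced by split are the strings "1" and "2".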
If you really want a dichotomy, convert the counts to a logical with > 1:
split(cc, ave(as.character(cc$address), cc$name, FUN=function(x) sum(!duplicated(x)) ) >1)
$`FALSE`
  name value address
1    a     1    rrrr
2    a     3    rrrr
7    d     4    qqqq
8    d     5    qqqq

$`TRUE`
  name value address
3    b     2    zzzz
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj
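
If the FALSE/TRUE list names feel opaque, the same two-way split can be labelled explicitly. The following is only a sketch in base R, not the approach used above: it swaps ave for tapply plus a lookup by name, and the labels "changed" and "unchanged" as well as the variables n_addr, moved and parts are made-up names for illustration.

```r
# Rebuild the example data under the name cc.
cc <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq")
)

# Distinct addresses per name; the result is indexed by name.
n_addr <- tapply(as.character(cc$address), cc$name,
                 function(x) length(unique(x)))

# Look up each row's count by its name; as.character() guards against
# indexing by factor codes, as.vector() drops the array attributes.
moved <- as.vector(n_addr[as.character(cc$name)]) > 1

# Split into a list with readable names.
parts <- split(cc, ifelse(moved, "changed", "unchanged"))

print(parts$unchanged)
print(parts$changed)
```

Because the counts stay numeric here, no as.character workaround is needed for the comparison with 1.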
I don't understand the comment. This is what I get from str(dat):
List of 2
$ FALSE:'data.frame': 4 obs. of 3 variables:
..$ name : Factor w/ 4 levels "a","b","c","d": 1 1 4 4
..$ value : num [1:4] 1 3 4 5
..$ address: Factor w/ 6 levels "aaaa","jjjj",..: 4 4 3 3
$ TRUE :'data.frame': 4 obs. of 3 variables:
..$ name : Factor w/ 4 levels "a","b","c","d": 2 2 3 3
..$ value : num [1:4] 2 4 5 3
..$ address: Factor w/ 6 levels "aaaa","jjjj",..: 6 1 5 2