Keep all rows for distinct names, instead of just one, in R
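I am trying to find all senders who have sent to 4 or more unique receivers (by name), where the total amount sent to those unique receivers exceeds $5000.00, and I am looking for a way to tell R to keep all rows that contain a given distinct name, rather than just one.
For example, using the following data.frame: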
sender <- c("tom", "tom", "kevin", "frank", "tom", "chris", "tom", "tom", "craig", "louis",
            "john", "tom", "brian", "tom", "George")
reciever <- c("ryan", "dave", "sarah", "kel", "eric", "ben", "wayne", "mike", "brenda", "christina",
              "brianna", "hal", "sam", "ryan", "van")
amount <- as.numeric(c("200", "100", "300", "3000", "100", "350", "100", "90", "670", "865", "600",
                       "300", "1300", "5200", "200"))
dF <- data.frame(sender, reciever, amount)
Applying the following with dplyr:
dF1 <- dF %>%
  distinct(reciever, .keep_all = TRUE) %>%
  group_by(sender) %>%
  summarise(
    count = n(),
    total = sum(amount)
  ) %>%
  filter(count >= 4 & total > 5000)
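You will notice that the target in the example sender vector is tom. tom has 2 transactions with ryan; however, because of how the distinct function works, R keeps the first row with ryan, with its amount of 200, and excludes the other row with ryan, whose amount is 5200. This exclusion is a problem because the excluded transaction, if included, is what would push tom over the $5000 threshold applied in the filter.
Is there a way to use the distinct function to tell R to keep all occurrences that involve the same distinct name? Or should I approach this from a completely different angle?
Thanks!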
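We can use
library(dplyr)
dF %>%
  group_by(sender) %>%
  # keep every row of a sender that has >= 4 unique receivers and a total of at least 5000
  filter(n_distinct(reciever) >= 4, sum(amount) >= 5000) %>%
  ungroup()
-output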
# A tibble: 7 x 3
  sender reciever amount
  <chr>  <chr>     <dbl>
1 tom    ryan        200
2 tom    dave        100
3 tom    eric        100
4 tom    wayne       100
5 tom    mike         90
6 tom    hal         300
7 tom    ryan       5200
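As a quick check on why the original attempt fell short, here is a minimal sketch, assuming the dF built in the question. The names kept, dropped and per_sender are introduced here for illustration only. It shows which row distinct() discards, and that n_distinct() counts unique receivers per sender without discarding anything.

```r
library(dplyr)

# Rebuild the example data from the question so the snippet is self-contained
sender <- c("tom", "tom", "kevin", "frank", "tom", "chris", "tom", "tom", "craig", "louis",
            "john", "tom", "brian", "tom", "George")
reciever <- c("ryan", "dave", "sarah", "kel", "eric", "ben", "wayne", "mike", "brenda", "christina",
              "brianna", "hal", "sam", "ryan", "van")
amount <- c(200, 100, 300, 3000, 100, 350, 100, 90, 670, 865, 600, 300, 1300, 5200, 200)
dF <- data.frame(sender, reciever, amount)

# distinct() keeps only the first row for each receiver name
kept <- dF %>% distinct(reciever, .keep_all = TRUE)

# Rows of dF that did not survive distinct(): the second tom/ryan transaction
dropped <- anti_join(dF, kept, by = c("sender", "reciever", "amount"))
print(dropped)

# n_distinct() counts unique receivers per sender while every row stays in dF
per_sender <- dF %>%
  group_by(sender) %>%
  summarise(n_receivers = n_distinct(reciever), n_rows = n())
print(per_sender)
```

For tom this gives 6 unique receivers across 7 rows, so the 5200 transaction still contributes to sum(amount).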
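If we only need the sender/receiver pairs that meet the condition
dF %>%
  group_by(sender) %>%
  filter(n_distinct(reciever) >= 4, sum(amount) >= 5000) %>%
  # regroup by pair and keep only pairs whose own total reaches 5000
  group_by(sender, reciever) %>%
  filter(sum(amount) >= 5000)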
# A tibble: 2 x 3
# Groups:   sender, reciever [1]
#   sender reciever amount
#   <chr>  <chr>     <dbl>
# 1 tom    ryan        200
# 2 tom    ryan       5200
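If a one-row-per-sender summary is wanted instead (closer to the shape of the original summarise attempt), something along these lines may work. This is a sketch under assumptions: it reuses the question's count and total column names, stores the result in a hypothetical object called result, and uses the strict total > 5000 from the question rather than >= 5000.

```r
library(dplyr)

# Rebuild the example data from the question so the snippet is self-contained
sender <- c("tom", "tom", "kevin", "frank", "tom", "chris", "tom", "tom", "craig", "louis",
            "john", "tom", "brian", "tom", "George")
reciever <- c("ryan", "dave", "sarah", "kel", "eric", "ben", "wayne", "mike", "brenda", "christina",
              "brianna", "hal", "sam", "ryan", "van")
amount <- c(200, 100, 300, 3000, 100, 350, 100, 90, 670, 865, 600, 300, 1300, 5200, 200)
dF <- data.frame(sender, reciever, amount)

result <- dF %>%
  group_by(sender) %>%
  summarise(
    count = n_distinct(reciever),  # unique receivers, no rows dropped beforehand
    total = sum(amount)            # total over all of the sender's transactions
  ) %>%
  filter(count >= 4, total > 5000)

print(result)
# tom is the only sender left: count = 6, total = 6090
```

The difference from the original attempt is that distinct() is not applied first, so repeated transactions to the same receiver still count toward the total while the receiver is counted once.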