Filter multiple occurrences based on group
I have a dataset as described below:
df=data.frame(Supplier_id=c("1","2","7","7","7","4","5","8","12","7"), Supplier=c("Tian","Yan","Goldy","Goldy","Goldy","Amy","Lauren","Cassy","Shaan","Goldy"),Date=c("1/17/2019","4/30/2019","11/29/2018","11/29/2018","11/29/2018","5/21/2018","5/23/2018","5/24/2018","6/15/2018","6/20/2018"),Buyer=c("Unclassified","Unclassified","Kelly","Kelly","Kelly","Kelly","Amanda","Echo","Shao","Shao"))
df$Supplier_id=as.numeric(as.character(df$Supplier_id))
So df looks like this:
| Supplier_id | Supplier | Date | Buyer |
|-------------|----------|------------|--------------|
| 1 | Tian | 1/17/2019 | Unclassified |
| 2 | Yan | 4/30/2019 | Unclassified |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 4 | Amy | 5/21/2018 | Kelly |
| 5 | Lauren | 5/23/2018 | Amanda |
| 8 | Cassy | 5/24/2018 | Echo |
| 12 | Shaan | 6/15/2018 | Shao |
| 7 | Goldy | 6/20/2018 | Shao |
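Now I want to filter out the Supplier_id values that occur only once for each unique buyer. For example, in the dataset above, Supplier_id '1' and '2' belong to the 'Unclassified' buyer, but because they have different IDs, I don't want them in my final output. However, the buyer 'Kelly' has two Supplier_id values, '7' and '4', where '7' occurs 3 times and '4' occurs only once. So the output table should contain the records with Supplier_id = '7'. The grouping should be based on 'Buyer'. Note that Supplier_id '7' is present for both 'Kelly' and 'Shao', but the two buyers should be grouped separately rather than considered together.
The expected output should be: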
| Supplier_id | Supplier | Date | Buyer |
|-------------|----------|------------|-------|
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
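I tried using group_by and filter, but that doesn't work because every Supplier_id will have a different buyer. I also tried using duplicated, but I'm not sure how to group Supplier_id for each buyer.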
df <-df %>% group_by(Buyer) %>% filter(Supplier_id>1)
And this:
df2=df[duplicated(df[1]) | duplicated(df[1], fromLast=TRUE),]
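Edit: The original dataset has many such instances, where different Supplier_id values occur n times for each buyer.
Is there any other way to get the desired output?
I think you need: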
df %>% group_by(Supplier_id, Buyer) %>% filter(n() > 1)
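To check this against the sample data, here is a minimal self-contained sketch. It assumes the dplyr package is installed; the names `result`, `pair_count`, and `result_base` are only illustrative and are not from the question. Grouping by both Supplier_id and Buyer makes `n()` count rows per (supplier, buyer) pair, so the '7' under 'Shao' is counted separately from the '7' under 'Kelly'. A base R variant using `ave()` is included as one possible alternative.

```r
library(dplyr)

# Sample data from the question (Supplier_id entered directly as numeric)
df <- data.frame(
  Supplier_id = c(1, 2, 7, 7, 7, 4, 5, 8, 12, 7),
  Supplier = c("Tian", "Yan", "Goldy", "Goldy", "Goldy",
               "Amy", "Lauren", "Cassy", "Shaan", "Goldy"),
  Date = c("1/17/2019", "4/30/2019", "11/29/2018", "11/29/2018", "11/29/2018",
           "5/21/2018", "5/23/2018", "5/24/2018", "6/15/2018", "6/20/2018"),
  Buyer = c("Unclassified", "Unclassified", "Kelly", "Kelly", "Kelly",
            "Kelly", "Amanda", "Echo", "Shao", "Shao"),
  stringsAsFactors = FALSE
)

# Keep only rows whose (Supplier_id, Buyer) pair occurs more than once
result <- df %>%
  group_by(Supplier_id, Buyer) %>%
  filter(n() > 1) %>%
  ungroup()

# Base R alternative: count rows per (Supplier_id, Buyer) pair with ave()
pair_count <- ave(seq_len(nrow(df)), df$Supplier_id, df$Buyer, FUN = length)
result_base <- df[pair_count > 1, ]

print(result)
```

On this data both approaches keep only the three Goldy rows for buyer 'Kelly'. To require a different minimum number of occurrences, change the `> 1` threshold.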