Using dplyr, is there a way to keep rows but add blanks to selected repeating values?

Question

I have a dataset that looks like this:

Col1 Col2 Col3 Col4
101  Dog  Sep  Grooming
101  Dog  Sep  Birthday
303  Cat  Oct  Birthday
404  Dog  Sep  Grooming

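I would like to create a dplyr script that identifies rows 1 and 2 as duplicates on the first three columns, and then blanks out the second row (except for column 4). I do not want to delete the row.

Col1 Col2 Col3 Col4
101  Dog  Sep  Grooming
               Birthday
303  Cat  Oct  Birthday
404  Dog  Sep  Grooming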

Answer 1

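Not the cleanest answer, but it works: group by the combination of the first three columns, then blank out the repeated values within each group.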

library(tidyverse)

example_data <- read_table(r"(
                           Col1 Col2 Col3 Col4
                           101 Dog Sep Grooming 
                           101 Dog Sep Birthday
                           303 Cat Oct Birthday
                           404 Dog Sep Grooming)")
#> Warning: 1 parsing failure.
#> row col  expected    actual         file
#>   1  -- 4 columns 5 columns literal data


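example_data %>%
  # Build a temporary grouping key from the first three columns
  group_by(sequence = str_c(Col1, Col2, Col3)) %>%
  # Within each group, replace every value after the first occurrence with ""
  mutate(across(c(Col1, Col2, Col3), .fns = ~ replace(.x, duplicated(.x), ""))) %>%
  ungroup() %>%
  # Drop the temporary key
  select(-sequence)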
#> # A tibble: 4 x 4
#>   Col1  Col2  Col3  Col4    
#>   <chr> <chr> <chr> <chr>   
#> 1 "101" "Dog" "Sep" Grooming
#> 2 ""    ""    ""    Birthday
#> 3 "303" "Cat" "Oct" Birthday
#> 4 "404" "Dog" "Sep" Grooming

Created on 2021-08-17 by the reprex package (v2.0.1)
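
As a possible variation on the approach above, the duplicate check can be done once on the three key columns together, which avoids the temporary grouping column. This is only a sketch, assuming dplyr 1.0.0 or later (for `across()` and `all_of()`); the names `dat`, `key_cols`, `is_repeat`, and `result` are made up for illustration and do not come from the original answer.

```r
library(dplyr)

# Sample data matching the question (rebuilt here so the block is self-contained)
dat <- data.frame(
  Col1 = c(101L, 101L, 303L, 404L),
  Col2 = c("Dog", "Dog", "Cat", "Dog"),
  Col3 = c("Sep", "Sep", "Oct", "Sep"),
  Col4 = c("Grooming", "Birthday", "Birthday", "Grooming"),
  stringsAsFactors = FALSE
)

# Hypothetical helper names: the columns that define a "repeat"
key_cols <- c("Col1", "Col2", "Col3")

# TRUE for every row whose key columns already appeared in an earlier row
is_repeat <- duplicated(dat[key_cols])

# Convert the key columns to character and blank out the repeated rows;
# Col4 is left untouched and no rows are dropped
result <- dat %>%
  mutate(across(all_of(key_cols), ~ replace(as.character(.x), is_repeat, "")))

print(result)
```

Note that the key columns become character columns, since an empty string cannot be stored in a numeric column.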

Answer 2

We can use base R:

dat1[duplicated(dat1[1:3]), 1:3] <- ""

Output

> dat1
  Col1 Col2 Col3     Col4
1  101  Dog  Sep Grooming
2                Birthday
3  303  Cat  Oct Birthday
4  404  Dog  Sep Grooming

Data

dat1 <- structure(list(Col1 = c(101L, 101L, 303L, 404L), Col2 = c("Dog", 
"Dog", "Cat", "Dog"), Col3 = c("Sep", "Sep", "Oct", "Sep"), Col4 = c("Grooming", 
"Birthday", "Birthday", "Grooming")), class = "data.frame", row.names = c(NA, 
-4L))
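
To see why the one-line base R replacement works, it can help to look at what `duplicated()` returns for the first three columns and what the assignment does to the column types. The following is a rough sketch; the variable names `dup`, `blanked`, and `masked` are hypothetical, and the `NA` variant is an alternative I am adding for comparison, not something from the answer.

```r
# Same data as in the answer, rebuilt so this block runs on its own
dat1 <- data.frame(
  Col1 = c(101L, 101L, 303L, 404L),
  Col2 = c("Dog", "Dog", "Cat", "Dog"),
  Col3 = c("Sep", "Sep", "Oct", "Sep"),
  Col4 = c("Grooming", "Birthday", "Birthday", "Grooming"),
  stringsAsFactors = FALSE
)

# Row-wise duplicate check on columns 1 to 3 only
dup <- duplicated(dat1[1:3])
print(dup)
#> [1] FALSE  TRUE FALSE FALSE

# Assigning "" blanks the repeated row but turns Col1 into character
blanked <- dat1
blanked[dup, 1:3] <- ""
print(class(blanked$Col1))

# Alternative (assumption: missing values are acceptable instead of blanks):
# assigning NA keeps Col1 as an integer column
masked <- dat1
masked[dup, 1:3] <- NA
print(class(masked$Col1))
```

Whether blanks or `NA` are preferable depends on what happens to the data next: blanks suit display or export, while `NA` keeps the numeric column usable for further computation.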
