Using dplyr, is there a way to keep rows but blank out selected repeating values?
I have a dataset that looks like this:
Col1  Col2  Col3  Col4
101   Dog   Sep   Grooming
101   Dog   Sep   Birthday
303   Cat   Oct   Birthday
404   Dog   Sep   Grooming
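I would like to create a dplyr script that identifies rows 1 and 2 as duplicates on the first three columns and then blanks out the second row, except for column 4. I don't want to remove the row.
Col1  Col2  Col3  Col4
101   Dog   Sep   Grooming
                  Birthday
303   Cat   Oct   Birthday
404   Dog   Sep   Grooming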
Not the cleanest answer:
library(tidyverse)
example_data <- read_table(r"(
Col1 Col2 Col3 Col4
101 Dog Sep Grooming
101 Dog Sep Birthday
303 Cat Oct Birthday
404 Dog Sep Grooming)")
#> Warning: 1 parsing failure.
#> row col expected actual file
#> 1 -- 4 columns 5 columns literal data
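example_data %>%
  # group rows that share the same Col1/Col2/Col3 combination
  group_by(sequence = str_c(Col1, Col2, Col3)) %>%
  # within each group, blank every occurrence after the first
  mutate(across(c(Col1, Col2, Col3), .fns = ~ replace(.x, duplicated(.x), ""))) %>%
  ungroup() %>%
  # drop the helper grouping column
  select(-sequence)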
#> # A tibble: 4 x 4
#> Col1 Col2 Col3 Col4
#> <chr> <chr> <chr> <chr>
#> 1 "101" "Dog" "Sep" Grooming
#> 2 "" "" "" Birthday
#> 3 "303" "Cat" "Oct" Birthday
#> 4 "404" "Dog" "Sep" Grooming
Created on 2021-08-17 by the reprex package (v2.0.1)
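For comparison, here is a minimal sketch of the same idea that groups on the three columns directly instead of pasting them into a key. It assumes the data are in a plain data frame called `dat1` (rebuilt here so the snippet runs on its own), and `is_repeat` is a hypothetical helper column name, not something from the original answer. Converting with `as.character()` before blanking keeps the column type consistent, since `Col1` starts out as an integer.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained
dat1 <- data.frame(
  Col1 = c(101L, 101L, 303L, 404L),
  Col2 = c("Dog", "Dog", "Cat", "Dog"),
  Col3 = c("Sep", "Sep", "Oct", "Sep"),
  Col4 = c("Grooming", "Birthday", "Birthday", "Grooming")
)

result <- dat1 %>%
  # rows sharing Col1/Col2/Col3 fall into the same group
  group_by(Col1, Col2, Col3) %>%
  # flag every row after the first within its group (hypothetical helper column)
  mutate(is_repeat = row_number() > 1) %>%
  ungroup() %>%
  # blank the key columns on flagged rows; all three columns become character
  mutate(across(c(Col1, Col2, Col3), ~ if_else(is_repeat, "", as.character(.x)))) %>%
  select(-is_repeat)

result
```

All four rows are kept and `Col4` is left untouched; only the key columns of the repeated row are blanked.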
We can use base R:
dat1[duplicated(dat1[1:3]), 1:3] <- ""
Output:
> dat1
  Col1 Col2 Col3     Col4
1  101  Dog  Sep Grooming
2                Birthday
3  303  Cat  Oct Birthday
4  404  Dog  Sep Grooming
Data:
dat1 <- structure(list(Col1 = c(101L, 101L, 303L, 404L), Col2 = c("Dog",
"Dog", "Cat", "Dog"), Col3 = c("Sep", "Sep", "Oct", "Sep"), Col4 = c("Grooming",
"Birthday", "Birthday", "Grooming")), class = "data.frame", row.names = c(NA,
-4L))
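To make the base R one-liner easier to follow, the sketch below splits it into steps. `duplicated()` applied to a data frame compares whole rows, so it flags row 2 only. Note that assigning `""` turns the integer column `Col1` into character. The `NA` variant at the end is an assumption about what may be acceptable for your use case, not part of the original answer; it keeps the column types intact. The names `dup_rows`, `blanked` and `with_na` are illustrative.

```r
# Same data as above, rebuilt so the example is self-contained
dat1 <- data.frame(
  Col1 = c(101L, 101L, 303L, 404L),
  Col2 = c("Dog", "Dog", "Cat", "Dog"),
  Col3 = c("Sep", "Sep", "Oct", "Sep"),
  Col4 = c("Grooming", "Birthday", "Birthday", "Grooming")
)

# TRUE for every row whose first three columns repeat an earlier row
dup_rows <- duplicated(dat1[1:3])

# Variant 1: blank with "" (Col1 is coerced from integer to character)
blanked <- dat1
blanked[dup_rows, 1:3] <- ""
class(blanked$Col1)

# Variant 2 (assumption: missing values are acceptable instead of blanks):
# NA keeps Col1 as integer
with_na <- dat1
with_na[dup_rows, 1:3] <- NA
class(with_na$Col1)
```

If the blanks are only needed for display, the `NA` variant may be easier to work with downstream, because numeric columns stay numeric.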