How to delete one of two duplicates in each column and merge them in R
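I have data with two rows in which some columns contain duplicates. I want to remove the duplicates within each column and then collect all the unique values while keeping the column names.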
data<-structure(c(10L, 10L, 11L, 11L, 5L, 5L, 3L, 5L), .Dim = c(2L,
4L), .Dimnames = list(c("d1", "m1"), c("year2036", "year2037",
"year2038", "year2039")))
year2036 year2037 year2038 year2039
d1 10 11 5 3
m1 10 11 5 5
The output would look like this:
year2036 year2037 year2038 year2039 year2039
10 11 5 3 5
out<-structure(c(10, 11, 5, 3, 5), .Names = c("year2036", "year2037",
"year2038", "year2039", "year2039"))
I tried unique(r[c(1:8)]), but it only returns the unique numbers and drops the column names.
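library(dplyr)
library(tidyr)
# reshape to long form (one row per cell), then keep distinct values per column name
data %>%
as_tibble() %>%
pivot_longer(everything()) %>%
group_by(name) %>%
distinct(value)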
# A tibble: 5 x 2
# Groups: name [4]
name value
<chr> <int>
1 year2036 10
2 year2037 11
3 year2038 5
4 year2039 3
5 year2039 5
You can use unique inside apply and stack the result.
stack(apply(data, 2, unique))
# values ind
#1 10 year2036
#2 11 year2037
#3 5 year2038
#4 3 year2039
#5 5 year2039
Or, in the format you want:
x <- stack(apply(data, 2, unique))
setNames(x$values, x$ind)
#year2036 year2037 year2038 year2039 year2039
# 10 11 5 3 5
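One caveat with the apply approach is that apply() simplifies its result to a matrix whenever every column yields the same number of unique values, so stack() may no longer receive the named list it gets here. The sketch below is one way to sidestep that in base R by always building a list with lapply(). The helper name unique_by_column is hypothetical and not part of the original answers.

```r
# Example data from the question: 2 rows, 4 named columns
data <- structure(c(10L, 10L, 11L, 11L, 5L, 5L, 3L, 5L), .Dim = c(2L, 4L),
                  .Dimnames = list(c("d1", "m1"),
                                   c("year2036", "year2037", "year2038", "year2039")))

# Hypothetical helper (an assumption for illustration, not from the answers)
unique_by_column <- function(m) {
  # lapply() always returns a list, one element per column, never a matrix
  vals <- lapply(seq_len(ncol(m)), function(j) unique(m[, j]))
  # repeat each column name once per value that survives in that column
  setNames(unlist(vals, use.names = FALSE), rep(colnames(m), lengths(vals)))
}

out <- unique_by_column(data)
print(out)
```

Because the names are rebuilt from lengths(vals), the result keeps the repeated year2039 name requested in the question, whether or not the columns have equal numbers of unique values.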
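Having data with identical column names is not good practice. Here is a solution that gives the same structure as your expected output, but with modified column names.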
library(dplyr)
library(tidyr)
data %>%
as.data.frame() %>%
pivot_longer(cols = everything()) %>%
distinct() %>%
mutate(row = data.table::rowid(name)) %>%
pivot_wider(names_from = c(name, row), values_from = value)
# year2036_1 year2037_1 year2038_1 year2039_1 year2039_2
# <int> <int> <int> <int> <int>
#1 10 11 5 3 5
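If pulling in data.table only for rowid() is undesirable, the same suffixed names can probably be produced in base R. This is a minimal sketch that assumes a named vector is acceptable instead of a one-row tibble. The variable names vals, nm, idx and out_unique are illustrative only.

```r
# Example data from the question: 2 rows, 4 named columns
data <- structure(c(10L, 10L, 11L, 11L, 5L, 5L, 3L, 5L), .Dim = c(2L, 4L),
                  .Dimnames = list(c("d1", "m1"),
                                   c("year2036", "year2037", "year2038", "year2039")))

# unique values per column, kept as a list
vals <- lapply(seq_len(ncol(data)), function(j) unique(data[, j]))
# one column name per surviving value
nm <- rep(colnames(data), lengths(vals))
# running index within each column name, playing the role of rowid(name)
idx <- ave(seq_along(nm), nm, FUN = seq_along)

# values with de-duplicated names of the form <column>_<index>
out_unique <- setNames(unlist(vals, use.names = FALSE), paste(nm, idx, sep = "_"))
print(out_unique)
```

The ave() call numbers the entries inside each group of equal names, so every name in the result is unique.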
Using dapply from collapse:
library(collapse)
stack(dapply(data, MARGIN = 2, FUN = funique))
values ind
1 10 year2036
2 11 year2037
3 5 year2038
4 3 year2039
5 5 <NA>