How to combine duplicate rows in R?
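I am using R to create a data frame that contains several duplicated columns. I want to combine all of the duplicated columns into a single column. How can I do this in R?
Note 1: When I build a data frame with duplicated column names, R by default appends a number to the names of the duplicates.
Note 2: I am looking for code that works with the columns regardless of their order.
Code: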
# Create the data frame.
emp.data <- data.frame(
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
salary = c(700.3,600.2,721.0,730.5,845.4),
emp_name = c("Kevin","Tracy","Thompson","Peter","Bevan"),
stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)
Current result
emp_name salary salary.1 emp_name.1
Rick 623.3 700.3 Kevin
Dan 515.20 600.2 Tracy
Michelle 611.00 721.0 Thompson
Ryan 729.00 730.5 Peter
Gary 843.25 845.4 Bevan
Expected output
emp_name salary
Rick 623.3
Dan 515.20
Michelle 611.00
Ryan 729.00
Gary 843.25
Kevin 700.3
Tracy 600.2
Thompson 721.0
Peter 730.5
Bevan 845.4
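I'm not a huge fan of this solution, but it works and should scale.
library(dplyr)   # %>%, filter, select
library(tidyr)   # pivot_longer, replace_na
library(readr)   # parse_number

emp.data %>%
  pivot_longer(cols = contains("emp_name"), names_to = "ename", values_to = "emp_name") %>%
  pivot_longer(cols = contains("salary"), names_to = "sname", values_to = "salary") %>%
  filter(replace_na(parse_number(ename), 0) == replace_na(parse_number(sname), 0)) %>%
  select(-ename, -sname)
Note that parse_number comes from readr, while pivot_longer and replace_na come from tidyr.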
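You can use split.default to split the data into a list of data frames based on the column names. unlist then turns each data frame into a vector, from which you can create a single column.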
data.frame(lapply(split.default(emp.data, names(emp.data)), unlist), row.names = NULL)
# emp_name salary
#1 Rick 623.30
#2 Dan 515.20
#3 Michelle 611.00
#4 Ryan 729.00
#5 Gary 843.25
#6 Kevin 700.30
#7 Tracy 600.20
#8 Thompson 721.00
#9 Peter 730.50
#10 Bevan 845.40
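The call above groups columns by their exact names, so it relies on the duplicated names being kept as-is (see the Data note below). As a rough sketch of how the same idea could be applied to the data frame exactly as built in the question, where the duplicates have been renamed salary.1 and emp_name.1, the numeric suffix can be stripped before splitting. The names base_names, groups and combined are illustrative, and the sketch assumes that no genuine column name ends in a dot followed by digits.

```r
# Data frame as built in the question: data.frame() renames the
# duplicated columns to salary.1 and emp_name.1.
emp.data <- data.frame(
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary   = c(623.3, 515.2, 611.0, 729.0, 843.25),
  salary   = c(700.3, 600.2, 721.0, 730.5, 845.4),
  emp_name = c("Kevin", "Tracy", "Thompson", "Peter", "Bevan"),
  stringsAsFactors = FALSE
)

# Strip a trailing ".<digits>" suffix to recover the base column names
# (assumption: no real column name ends in ".<digits>").
base_names <- sub("\\.[0-9]+$", "", names(emp.data))

# Group the columns by base name; each element is a small data frame.
groups <- split.default(emp.data, base_names)

# Stack each group into one vector and rebuild a data frame from them.
combined <- data.frame(lapply(groups, unlist),
                       row.names = NULL, stringsAsFactors = FALSE)
print(combined)
```

Because the grouping is done by name rather than by position, the order of the columns in the input should not matter, which is what Note 2 in the question asks for.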
Another option, using pivot_longer from tidyr:
tidyr::pivot_longer(emp.data, cols = everything(), names_to = '.value')
Data
To create a data frame with identical column names, you can add check.names = FALSE to the data.frame call.
emp.data <- data.frame(
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
salary = c(700.3,600.2,721.0,730.5,845.4),
emp_name = c("Kevin","Tracy","Thompson","Peter","Bevan"),
stringsAsFactors = FALSE, check.names = FALSE
)
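To see the effect of check.names in isolation, here is a minimal sketch on a hypothetical two-column data frame (the column name a and the variable names renamed and kept are illustrative, not from the question).

```r
# Default (check.names = TRUE): duplicated names are made unique.
renamed <- data.frame(a = 1:2, a = 3:4)
names(renamed)
# "a" "a.1"

# check.names = FALSE keeps the duplicated names as written.
kept <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
names(kept)
# "a" "a"
```

Note that with duplicated names, selecting by name (for example kept$a) only returns the first matching column, so name-based grouping as in split.default is the safer way to reach all of them.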
Using melt from data.table:
library(data.table)
melt(setDT(emp.data), measure = patterns("^emp_name", "salary"),
value.name = c("emp_name", "salary"))[, variable := NULL][]
emp_name salary
1: Rick 623.30
2: Dan 515.20
3: Michelle 611.00
4: Ryan 729.00
5: Gary 843.25
6: Kevin 700.30
7: Tracy 600.20
8: Thompson 721.00
9: Peter 730.50
10: Bevan 845.40
Data
emp.data <- structure(list(emp_name = c("Rick", "Dan", "Michelle", "Ryan",
"Gary"), salary = c(623.3, 515.2, 611, 729, 843.25), salary = c(700.3,
600.2, 721, 730.5, 845.4), emp_name = c("Kevin", "Tracy", "Thompson",
"Peter", "Bevan")), class = "data.frame", row.names = c(NA, -5L
))