How to combine duplicate rows in R?

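I am using R to create a data frame that contains several duplicated columns. I want to merge all the duplicated columns into a single column. How can I do this in R?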

Note 1: When I build a data frame with multiple columns, R by default appends a number to the names of the duplicated columns.

Note 2: I am looking for code that works on the columns regardless of their order.

Code:

# Create the data frame.
emp.data <- data.frame(
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25), 
  salary = c(700.3,600.2,721.0,730.5,845.4), 
  emp_name = c("Kevin","Tracy","Thompson","Peter","Bevan"),
  stringsAsFactors = FALSE
)
# Print the data frame.         
print(emp.data)

Current result

         emp_name   salary   salary.1 emp_name.1
         Rick       623.3    700.3    Kevin
         Dan        515.20   600.2    Tracy
         Michelle   611.00   721.0    Thompson
         Ryan       729.00   730.5    Peter
         Gary       843.25   845.4    Bevan

Expected output

       emp_name   salary   
         Rick       623.3    
         Dan        515.20   
         Michelle   611.00   
         Ryan       729.00   
         Gary       843.25   
         Kevin      700.3
         Tracy      600.2
         Thompson   721.0
         Peter      730.5
         Bevan      845.4

I'm not a huge fan of this solution, but it works and should scale.

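library(dplyr)  # %>%, filter, select
library(tidyr)  # pivot_longer, replace_na
library(readr)  # parse_number

# Pivot both groups of columns, then keep only the rows where the numeric
# suffix of the name column matches the suffix of the salary column.
emp.data %>%
  pivot_longer(cols = contains("emp_name"), names_to = "ename", values_to = "emp_name") %>%
  pivot_longer(cols = contains("salary"), names_to = "sname", values_to  = "salary") %>%
  filter(replace_na(parse_number(ename),0) == replace_na(parse_number(sname), 0)) %>%
  select(-ename, -sname)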

Note that parse_number comes from readr; pivot_longer and replace_na require tidyr, and filter and select require dplyr.

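You can use split.default to split the data into a list of data frames based on the column names. unlist turns each data frame into a vector, and you can then build one column from each vector.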

data.frame(lapply(split.default(emp.data, names(emp.data)), unlist), row.names = NULL)

#   emp_name salary
#1      Rick 623.30
#2       Dan 515.20
#3  Michelle 611.00
#4      Ryan 729.00
#5      Gary 843.25
#6     Kevin 700.30
#7     Tracy 600.20
#8  Thompson 721.00
#9     Peter 730.50
#10    Bevan 845.40
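
If the data frame was built with the default check.names = TRUE, as in the question, the names are emp_name, salary, salary.1 and emp_name.1, so splitting on names(emp.data) would leave every column in its own group. A minimal sketch of one way around that, assuming the duplicate suffix is always a dot followed by digits at the end of the name (base_names and stacked are illustrative names, not part of the original answer):

```r
# Data as in the question: R renames the repeats to salary.1 and emp_name.1.
emp.data <- data.frame(
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  salary = c(700.3, 600.2, 721.0, 730.5, 845.4),
  emp_name = c("Kevin", "Tracy", "Thompson", "Peter", "Bevan"),
  stringsAsFactors = FALSE
)

# Assumption: duplicate suffixes look like ".1", ".2", ... at the end of a name.
base_names <- sub("\\.[0-9]+$", "", names(emp.data))

# Group the columns by base name, then stack each group into a single vector.
stacked <- data.frame(
  lapply(split.default(emp.data, base_names), unlist, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(stacked)
```

This would also strip a genuine column name that happens to end in a dot and digits, so the pattern may need tightening for other data.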

Another option using pivot_longer from tidyr:

tidyr::pivot_longer(emp.data, cols = everything(), names_to = '.value')

Data

To create a data frame with identical column names, you can add check.names = FALSE to the data.frame call:

emp.data <- data.frame(
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25), 
  salary = c(700.3,600.2,721.0,730.5,845.4), 
  emp_name = c("Kevin","Tracy","Thompson","Peter","Bevan"),
  stringsAsFactors = FALSE, check.names = FALSE
)
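
For contrast, the renaming that happens without check.names = FALSE can be reproduced with make.names. The short sketch below is only illustrative (supplied, renamed and df are made-up names); it shows the suffixed names R generates by default and confirms that check.names = FALSE keeps the duplicates.

```r
# Names as supplied in the data.frame() call.
supplied <- c("emp_name", "salary", "salary", "emp_name")

# With the default check.names = TRUE, repeated names get numeric suffixes.
renamed <- make.names(supplied, unique = TRUE)
print(renamed)

# With check.names = FALSE the supplied names are kept, duplicates included.
df <- data.frame(a = 1, a = 2, check.names = FALSE)
print(names(df))
print(anyDuplicated(names(df)) > 0)
```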

Using melt from data.table:

library(data.table)
melt(setDT(emp.data), measure = patterns("^emp_name", "salary"),
     value.name = c("emp_name", "salary"))[, variable := NULL][]
    emp_name salary
 1:     Rick 623.30
 2:      Dan 515.20
 3: Michelle 611.00
 4:     Ryan 729.00
 5:     Gary 843.25
 6:    Kevin 700.30
 7:    Tracy 600.20
 8: Thompson 721.00
 9:    Peter 730.50
10:    Bevan 845.40

Data

emp.data <- structure(list(emp_name = c("Rick", "Dan", "Michelle", "Ryan", 
"Gary"), salary = c(623.3, 515.2, 611, 729, 843.25), salary = c(700.3, 
600.2, 721, 730.5, 845.4), emp_name = c("Kevin", "Tracy", "Thompson", 
"Peter", "Bevan")), class = "data.frame", row.names = c(NA, -5L
))