Reshaping data from an XML-converted table

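I have a data frame that was converted from XML to CSV. The problem is that I need to produce an Excel sheet from this data, but the data is a complete mess. I was wondering if you could help me find R code that solves this.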

Let me explain the problem in detail. Imagine the dataset looks like this:

student.data <- data.frame(id = c(1:17),
                       student_id = c(1111,"","","","","","","","","2222","","","","","","",""),
                       exam_id =c("",10,10,20,20,20,30,40,40,"",10,10,10,20,30,40,40), 
                       status = c("","AAA","BBB","CCC","DDD","FFF","GGG","AAA","GGG","","BBB","HHH","MMM","FFF","DDD","GGG","GGG"))

The result must be:

I know this is a bit complicated, but thanks in advance for your help.

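We can convert the blank ("") elements of 'student_id' to NA (na_if), then use fill to replace each NA with the preceding non-NA element. Grouped by 'student_id' and 'exam_id', we get the unique elements of 'status' that are not blank ("") and paste them into a single string (toString), filter out the rows where both 'exam_id' and 'status' are blank, and use pivot_wider to reshape the output into 'wide' format.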
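library(dplyr)
library(tidyr)

student.data %>%
  # turn blank student ids into NA so they can be filled
  mutate(student_id = na_if(student_id, "")) %>%
  # carry each student id down to the rows that follow it
  fill(student_id) %>%
  group_by(student_id, exam_id) %>%
  # collapse the distinct non-blank statuses of each exam into one string
  summarise(status = toString(unique(status[status != ""])), .groups = "drop_last") %>%
  # drop the rows where both exam_id and status are blank
  filter(if_any(c(exam_id, status), ~ .x != "")) %>%
  # spread the exams into one column each
  pivot_wider(names_from = exam_id, values_from = status)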
# A tibble: 2 x 5
# Groups:   student_id [3]
#  student_id `10`          `20`          `30`  `40`    
#  <fct>      <chr>         <chr>         <chr> <chr>   
#1 1111       AAA, BBB      CCC, DDD, FFF GGG   AAA, GGG
#2 2222       BBB, HHH, MMM FFF           DDD   GGG
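
To see what the individual steps do, here is a minimal sketch on a small made-up data frame. The `toy` data and the names `filled` and `collapsed` are hypothetical and only for illustration. As a small variation on the pipeline above, it drops the blank rows before grouping rather than after.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: a row with a student_id starts a new student,
# the rows below it (blank student_id) hold that student's exams.
toy <- data.frame(
  student_id = c("1111", "", "", "", "2222", ""),
  exam_id    = c("", "10", "10", "20", "", "10"),
  status     = c("", "AAA", "BBB", "CCC", "", "HHH"),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: blank ids become NA, then each id is carried downward
filled <- toy %>%
  mutate(student_id = na_if(student_id, "")) %>%
  fill(student_id)

filled$student_id
# "1111" "1111" "1111" "1111" "2222" "2222"

# Step 3: one row per student/exam, with the distinct statuses pasted together
collapsed <- filled %>%
  filter(exam_id != "") %>%
  group_by(student_id, exam_id) %>%
  summarise(status = toString(unique(status)), .groups = "drop")

collapsed$status
# "AAA, BBB" "CCC" "HHH"
```

Inspecting `filled` and `collapsed` like this before calling pivot_wider makes it easier to check that every exam row ended up with the right student.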