Reshaping data from an XML-converted table
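I have a data frame that was converted from XML to CSV. The problem is that I need to produce an Excel sheet from this data, but the data is a complete mess. I was wondering if you could help me find R code that solves this.
Let me explain the problem in detail. Imagine the dataset looks like this: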
student.data <- data.frame(id = c(1:17),
student_id = c(1111,"","","","","","","","","2222","","","","","","",""),
exam_id =c("",10,10,20,20,20,30,40,40,"",10,10,10,20,30,40,40),
status = c("","AAA","BBB","CCC","DDD","FFF","GGG","AAA","GGG","","BBB","HHH","MMM","FFF","DDD","GGG","GGG"))
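The result must have one row per student_id and one column per exam_id, with that student's statuses for the exam listed in each cell.
I know this is a bit complicated, but thanks in advance for your help.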
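We can convert the blank ("") elements of 'student_id' to NA (na_if), then use fill to replace each NA with the preceding non-NA value. Grouped by 'student_id' and 'exam_id', we take the unique elements of 'status' that are not blank ("") and paste them into a single string (toString), filter out the rows that are entirely blank, and use pivot_wider to reshape the output into 'wide' format.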
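library(dplyr)
library(tidyr)
student.data %>%
  # turn blank student_id values into NA
  mutate(student_id = na_if(student_id, "")) %>%
  # carry the last non-NA student_id down
  fill(student_id) %>%
  group_by(student_id, exam_id) %>%
  # collapse the unique non-blank statuses into one string
  summarise(status = toString(unique(status[status != '']))) %>%
  # drop rows where both exam_id and status are blank
  filter_at(vars(exam_id, status), any_vars(. != '')) %>%
  # one column per exam_id
  pivot_wider(names_from = exam_id, values_from = status)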
# A tibble: 2 x 5
# Groups: student_id [3]
# student_id `10` `20` `30` `40`
# <fct> <chr> <chr> <chr> <chr>
#1 1111 AAA, BBB CCC, DDD, FFF GGG AAA, GGG
#2 2222 BBB, HHH, MMM FFF DDD GGG
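For readers on a recent dplyr (1.0 or later, which is an assumption about your setup), the same steps could be sketched without the superseded filter_at. This variant drops the blank header rows before grouping rather than after, which gives the same result on the sample data but is not guaranteed to be equivalent on messier input. The name `wide` and the `stringsAsFactors = FALSE` argument are additions for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, with strings kept as character vectors
student.data <- data.frame(
  id = 1:17,
  student_id = c("1111", rep("", 8), "2222", rep("", 7)),
  exam_id = c("", 10, 10, 20, 20, 20, 30, 40, 40, "", 10, 10, 10, 20, 30, 40, 40),
  status = c("", "AAA", "BBB", "CCC", "DDD", "FFF", "GGG", "AAA", "GGG",
             "", "BBB", "HHH", "MMM", "FFF", "DDD", "GGG", "GGG"),
  stringsAsFactors = FALSE
)

wide <- student.data %>%
  # blank student_id -> NA, then carry the last seen id downwards
  mutate(student_id = na_if(student_id, "")) %>%
  fill(student_id) %>%
  # drop the header rows that only carry the student id
  filter(exam_id != "", status != "") %>%
  group_by(student_id, exam_id) %>%
  # one comma-separated string of unique statuses per student and exam
  summarise(status = toString(unique(status)), .groups = "drop") %>%
  # one column per exam_id
  pivot_wider(names_from = exam_id, values_from = status)

print(wide)
```

Because `.groups = "drop"` removes the grouping, `wide` is a plain tibble that can be handed to whatever export function you prefer, for example base R's write.csv, whose output Excel can open.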