Unnesting related list-columns of different size
After parsing XML files, my data looks like this:
example_df <-
  tibble(id = "ABC",
         wage_type = "salary",
         name = c("Description", "Code", "Base",
                  "Description", "Code", "Base",
                  "Description", "Code"),
         value = c("wage_element_1", "51B", "600",
                   "wage_element_2", "51C", "740",
                   "wage_element_3", "51D"))
example_df
# A tibble: 8 x 4
  id    wage_type name        value
  <chr> <chr>     <chr>       <chr>
1 ABC   salary    Description wage_element_1
2 ABC   salary    Code        51B
3 ABC   salary    Base        600
4 ABC   salary    Description wage_element_2
5 ABC   salary    Code        51C
6 ABC   salary    Base        740
7 ABC   salary    Description wage_element_3
8 ABC   salary    Code        51D
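There are about 1000 different id values, and each wage_type has three possible values. I want to turn the values in the name column into columns of their own. I tried pivot_wider, but I am struggling with the resulting list-columns: because not every salary has a Base, the list-columns end up with different sizes, as shown below: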
example_df <- example_df %>%
  pivot_wider(id_cols = c(id, wage_type),
              names_from = name,
              values_from = value)
example_df
# A tibble: 1 x 5
  id    wage_type Description Code      Base
  <chr> <chr>     <list>      <list>    <list>
1 ABC   salary    <chr [3]>   <chr [3]> <chr [2]>
So when I try to unnest the columns, it throws an error:
example_df %>%
  unnest(cols = c(Description, Code, Base))
Error: Can't recycle `Description` (size 3) to match `Base` (size 2).
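I understand this happens because tidyr functions do not recycle, but I could not find a way around it, nor a base R approach to my problem. I tried
unlist(strsplit(as.character(x)))
following another solution, but that also ran into column length issues.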
The desired output is as follows:
desired_df <-
  tibble(
    id = c("ABC", "ABC", "ABC"),
    wage_type = c("salary", "salary", "salary"),
    Description = c("wage_element_1", "wage_element_2", "wage_element_3"),
    Code = c("51B", "51C", "51D"),
    Base = c("600", "740", NA))
desired_df
  id    wage_type Description    Code  Base
  <chr> <chr>     <chr>          <chr> <chr>
1 ABC   salary    wage_element_1 51B   600
2 ABC   salary    wage_element_2 51C   740
3 ABC   salary    wage_element_3 51D   NA
I would prefer a tidyr solution, but any help would be appreciated. Thanks.
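I would suggest this approach using tidyverse functions. The issue arises because pivot_wider returns list-columns whenever id, wage_type, and name together do not uniquely identify a value. By creating an index variable such as id2, you give each value its own row and avoid list output in the final reshaped data: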
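library(tidyverse)
# example_df here is the original long tibble, before the pivot_wider
# call in the question overwrote it.
example_df %>%
  arrange(name) %>%
  group_by(id, wage_type, name) %>%
  # Number each occurrence of a name so every value is uniquely identified
  mutate(id2 = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value) %>%
  select(-id2)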
Output:
# A tibble: 3 x 5
  id    wage_type Base  Code  Description
  <chr> <chr>     <chr> <chr> <chr>
1 ABC   salary    600   51B   wage_element_1
2 ABC   salary    740   51C   wage_element_2
3 ABC   salary    NA    51D   wage_element_3
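One caveat with numbering occurrences per name: if a Base is missing from an earlier record rather than the last one, the remaining Base values shift up and pair with the wrong Description. A possible variant, sketched below, numbers the records instead of the occurrences. It assumes that every record starts with a "Description" row, which holds in the sample data but is not confirmed for the full data set. The column name record is a hypothetical helper, not part of the original data.

```r
library(dplyr)
library(tidyr)

# Original long data from the question
example_df <- tibble(
  id = "ABC",
  wage_type = "salary",
  name = c("Description", "Code", "Base",
           "Description", "Code", "Base",
           "Description", "Code"),
  value = c("wage_element_1", "51B", "600",
            "wage_element_2", "51C", "740",
            "wage_element_3", "51D")
)

result <- example_df %>%
  group_by(id, wage_type) %>%
  # Assumption: each record begins with a "Description" row, so a running
  # count of those rows labels every row with its record number.
  mutate(record = cumsum(name == "Description")) %>%
  ungroup() %>%
  # id, wage_type and record now uniquely identify each value per name,
  # so pivot_wider fills missing combinations with NA instead of nesting.
  pivot_wider(names_from = name, values_from = value) %>%
  select(-record)

result
```

Because the rows are not sorted by name first, the columns come out in order of first appearance (Description, Code, Base), which matches the desired layout in the question.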