How to reshape one part of the data in R
I have a dataset like the one below:
df <- tibble::tribble(
~subcateg, ~names,
"A00", "Kidney failure",
"A001", "Kidney failure reason1",
"A002", "Kidney failure reason2",
"A003", "Kidney failure reason3",
"B00", "Heart failure",
"B001", "Heart failure reason1",
"B002", "Heart failure reason2",
"B003", "Heart failure reason3",
"B00", "Lung failure",
"B001", "Lung failure reason1",
"B002", "Lung failure reason2",
"B003", "Lung failure reason3",
)
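It has categories (3 characters) and subcategories (4 characters) in the same variable, and I need another variable that holds the 3-character category. I would like it to look like this: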
df2 <- tibble::tribble(
~subcateg, ~names, ~categ, ~names2,
"A001", "Kidney failure reason1", "A00", "Kidney failure",
"A002", "Kidney failure reason2","A00", "Kidney failure",
"A003", "Kidney failure reason3","A00", "Kidney failure",
"B001", "Heart failure reason1", "B00", "Heart failure",
"B002", "Heart failure reason2", "B00", "Heart failure",
"B003", "Heart failure reason3", "B00", "Heart failure",
"B001", "Lung failure reason1", "B00", "Lung failure",
"B002", "Lung failure reason2", "B00", "Lung failure",
"B003", "Lung failure reason3", "B00", "Lung failure",
)
Any ideas?
Thanks a lot!
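We create a grouping variable based on where 'subcateg' has 3 characters (nchar), create 'categ' as the first element of each group, remove the first row of each group (slice), and then create 'names2' by removing the substring reason followed by digits from the 'names' column.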
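library(dplyr)
library(stringr)
df %>%
  # start a new group at every 3-character (category) code
  group_by(grp = cumsum(nchar(subcateg) == 3)) %>%
  mutate(categ = first(subcateg)) %>%
  # drop the category row itself; keep it if the group has no subcategory rows
  slice(if (n() == 1) 1 else -1) %>%
  ungroup() %>%
  select(-grp) %>%
  # backslashes must be doubled inside an R string literal
  mutate(names2 = str_remove(names, "\\s+reason\\d+"))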
-output
# A tibble: 9 × 4
  subcateg names                  categ names2
  <chr>    <chr>                  <chr> <chr>
1 A001     Kidney failure reason1 A00   Kidney failure
2 A002     Kidney failure reason2 A00   Kidney failure
3 A003     Kidney failure reason3 A00   Kidney failure
4 B001     Heart failure reason1  B00   Heart failure
5 B002     Heart failure reason2  B00   Heart failure
6 B003     Heart failure reason3  B00   Heart failure
7 B001     Lung failure reason1   B00   Lung failure
8 B002     Lung failure reason2   B00   Lung failure
9 B003     Lung failure reason3   B00   Lung failure
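To see why the grouping step works, it can help to look at the group index on its own. The following is a minimal base-R sketch; the vector subcateg is retyped by hand from the question's data, is_categ is a name made up for illustration, and grp mirrors the name used in the pipeline above.

```r
# Codes retyped from the question's data (categories have 3 characters,
# subcategories have 4).
subcateg <- c("A00", "A001", "A002", "A003",
              "B00", "B001", "B002", "B003",
              "B00", "B001", "B002", "B003")

# TRUE marks a category row; cumsum() increments the counter at each one,
# so every category row starts a new group.
is_categ <- nchar(subcateg) == 3
grp <- cumsum(is_categ)

grp
# [1] 1 1 1 1 2 2 2 2 3 3 3 3
```

Because the index depends on row order rather than on the code itself, the two blocks that both start with B00 end up in separate groups (2 and 3).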
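If the Lung failure category is meant to start with C rather than B (is the repeated B00 a typo?), another solution is the one below. With the data as posted, the code B00 appears twice, so the join would match each B subcategory to both Heart failure and Lung failure.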
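library(tidyr)
library(dplyr)
df %>%
  # categ = the part of the code before its first non-zero digit
  separate(subcateg, "categ", sep = "[1-9]", extra = "drop", remove = FALSE) %>%
  # look up the category name by its code
  inner_join(df, by = c("categ" = "subcateg"), suffix = c("", "2")) %>%
  # drop the category rows themselves
  filter(!stringr::str_ends(subcateg, "00")) %>%
  relocate(categ, .after = names)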
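Whether the join-based approach is safe depends on the category codes being unique. One quick way to check this before joining is sketched below in base R; the codes are retyped by hand from the question, and categ_codes is a name made up for illustration.

```r
# Codes retyped from the question's data.
subcateg <- c("A00", "A001", "A002", "A003",
              "B00", "B001", "B002", "B003",
              "B00", "B001", "B002", "B003")

# Keep only the 3-character category codes.
categ_codes <- subcateg[nchar(subcateg) == 3]
categ_codes
# [1] "A00" "B00" "B00"

# TRUE means at least one code repeats, so a join keyed on the code
# would match some subcategory rows to more than one category.
anyDuplicated(categ_codes) > 0
# [1] TRUE
```

With the data as posted the check returns TRUE, so the order-based grouping approach is probably the safer choice unless the codes are corrected first.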