How to split column names, drop parts of the names, and convert data from wide to long format in R
I have data in the following format:
dataset <- data.frame(taxa = c("k__Archaea| p__Crenarchaeota", "k__Archaea| p__Euryarchaeota", "k__Bacteria| p__[Thermi]"),
"11908.MM.0008.Inf.6m.Stool" =c(0,1760,0),
"11908.MM.01115.Inf.6m.Stool" =c(0,1517,0),
"11908.MM.0044.Inf.6m.Stool" =c(0,10815,0),
"11908.MM.0125.Mom.6m.Stool" = c(0,4719,0))
view(dataset)
I would like to convert it to the following format:
fix_dataset <- data.frame(study_id = c("0008", "0115", "0044", "0125"),
individual = c("Inf", "Inf", "Inf", "Mom"),
`k__Archaea| p__Crenarchaeota` = c(0,0,0,0),
`k__Archaea| p__Euryarchaeota` = c(1760, 1517, 10815, 4719),
`k__Bacteria| p__[Thermi]` = c(0,0,0,0),
timept1 = c("6m", "6m", "6m", "6m"),
check.names = FALSE)
view(fix_dataset)
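I am trying to remove the leading number series 11908 and "Stool" from each column name, split out the remaining parts of the column name into separate columns, and convert the data from wide to long format.
I am using the following code: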
library(tidyverse)
dataset %>%
pivot_longer(cols = -taxa) %>%
separate(col = name, into = c("info1", "info2", "study_id", "individual", "timept1", "info3"), sep = "[.]") %>%
pivot_wider(names_from = taxa,
values_from = value) %>%
select(study_id, individual, starts_with("k_"), timept1)
When I apply this to my real data, I get the following error and warning messages:
Error in select(., study_id, individual, timept1, starts_with("k_")) :
unused arguments (study_id, individual, timept1, starts_with("k_"))
In addition: Warning messages:
1: Expected 6 pieces. Additional pieces discarded in 44 rows [242, 243, 903, 904, 1564, 1565, 2225, 2226, 2886, 2887, 3547, 3548, 4208, 4209, 4869, 4870, 5530, 5531, 6191, 6192, ...].
2: Expected 6 pieces. Missing pieces filled with `NA` in 1012 rows [74, 93, 94, 223, 224, 225, 226, 227, 228, 229, 230, 469, 470, 532, 533, 535, 536, 540, 580, 593, ...].
3: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
Does anyone have suggestions about these error messages?
You can do this with the following code:
library(tidyverse)
dataset %>%
pivot_longer(cols = -taxa) %>%
separate(col = name, into = c("info1", "info2", "study_id", "individual", "timept1", "info3"), sep = "[.]") %>%
pivot_wider(names_from = taxa,
values_from = value) %>%
select(study_id, individual, starts_with("taxa"), timept1)
This gives:
# A tibble: 4 x 6
  study_id individual taxa1 taxa2 taxa3 timept1
  <chr>    <chr>      <dbl> <dbl> <dbl> <chr>
1 0008     Inf            0  1760     0 6m
2 01115    Inf            0  1517     0 6m
3 0044     Inf            0 10815     0 6m
4 0125     Mom            0  4719     0 6m
Note that your study IDs are inconsistent: in your original dataset one of the IDs is "01115", whereas in your desired output it is "0115".
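As for the messages themselves, here is a minimal sketch of how each one might be tracked down. It rests on assumptions the question does not confirm. An "unused arguments" error from select() usually means another loaded package (MASS is a common one) is masking dplyr's select(), so the call is namespaced below. Warnings 1 and 2 suggest that some column names in the full data do not split into exactly six dot-separated pieces. Warning 3 suggests that some taxa/sample combinations occur more than once. The helper names sample_cols, n_pieces, bad_cols and dupes are made up for illustration, and check.names = FALSE is added so the sample column names keep their leading digits.

```r
library(tidyr)
library(dplyr)

# Sample data from the question. check.names = FALSE (an assumption here)
# stops data.frame() from prefixing the numeric-looking names with "X".
dataset <- data.frame(
  taxa = c("k__Archaea| p__Crenarchaeota",
           "k__Archaea| p__Euryarchaeota",
           "k__Bacteria| p__[Thermi]"),
  "11908.MM.0008.Inf.6m.Stool"  = c(0, 1760, 0),
  "11908.MM.01115.Inf.6m.Stool" = c(0, 1517, 0),
  "11908.MM.0044.Inf.6m.Stool"  = c(0, 10815, 0),
  "11908.MM.0125.Mom.6m.Stool"  = c(0, 4719, 0),
  check.names = FALSE
)

# Warnings 1 and 2: list the column names that do not split into six pieces.
sample_cols <- setdiff(names(dataset), "taxa")
n_pieces    <- lengths(strsplit(sample_cols, ".", fixed = TRUE))
bad_cols    <- sample_cols[n_pieces != 6]

# Warning 3: list the taxa/sample combinations that occur more than once.
dupes <- dataset %>%
  pivot_longer(cols = -taxa) %>%
  count(taxa, name) %>%
  dplyr::filter(n > 1)

# The error: call dplyr::select() explicitly so a masking package cannot
# intercept it, and match the taxa columns by their actual "k__" prefix.
result <- dataset %>%
  pivot_longer(cols = -taxa) %>%
  separate(col = name,
           into = c("info1", "info2", "study_id",
                    "individual", "timept1", "info3"),
           sep = "[.]") %>%
  pivot_wider(names_from = taxa, values_from = value) %>%
  dplyr::select(study_id, individual, starts_with("k__"), timept1)
```

On the sample data both bad_cols and dupes come back empty. On the full data, whatever they list would be the names to clean up or summarise before reshaping.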