How to pull column names into a new column based on column contents in R
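I need to do three things:
1. Count the non-NA values in each row of the table and store the total in a single column, "na_check".
[My solution is below. I would be interested if someone can work out how to do this with map. I have already checked for existing answers.]
2. For the values that are not NA, create a new column, "block_detail", that concatenates the unique values.
[I don't know how to do this.]
3. If "na_check" has a non-zero value, pull in the names of the columns that hold those values and concatenate them into a new column, "block_type".
[I don't know how to do this.]
This is what the final product should look like.
Note that in row 2, even though "b" appears twice, it appears only once in "block_detail", while the columns that contain it are listed separately in "block_type" as "y|z".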
w x y z na_check block_detail block_type
<dbl> <chr> <chr> <chr> <int> <chr> <chr>
1 NA a NA NA 1 a x
2 NA NA b b 2 b y|z
3 NA NA b c 2 b|c y|z
4 NA NA NA NA 0 NA NA
5 NA NA NA b 1 b z
Below are the sample data and my solution to part 1:
library(tidyverse)

# sample data
df <- tibble(w = rep(NA_real_, 5),
             x = c(1, rep(NA_real_, 4)),
             y = c(NA_real_, 1, rep(NA_real_, 3)),
             z = c(NA_real_, 1, rep(NA_real_, 2), 1)
)

# My solution to the first part. I am interested if someone can do this more
# efficiently, or with map, as I have 100s of columns that need this.
df_na_check <- df %>%
  # one logical helper column per original column: TRUE where a value is present
  mutate(across(everything(),
                list(na_check = ~ !is.na(.)),
                .names = "{.col}_{.fn}")) %>%
  rowwise() %>%
  # sum the helper columns row by row
  mutate(na_check = sum(c_across(contains("na_check")))) %>%
  select(w:z, na_check)
Thanks for your help. Ideally the solution would use the tidyverse, but I am open to other approaches (data.table or base R).
How about the following:
library(tidyverse)

df <- tibble(w = rep(NA_real_, 5),
             x = c("a", rep(NA_character_, 4)),
             y = c(NA_character_, "b", "b", NA_character_, NA_character_),
             z = c(NA_character_, "b", "c", NA_character_, "b"))

# Collapse the unique non-NA elements of x into one "|"-separated string
paste_no_na <- function(x) {
  x <- unique(x)
  paste0(x[!is.na(x)], collapse = "|")
}

df_summary <- df %>%
  mutate(row = row_number(),
         w = as.character(w)) %>%   # all pivoted columns must share one type
  pivot_longer(cols = -row) %>%
  # keep the column name only where the cell holds a value
  mutate(name = if_else(is.na(value), NA_character_, name)) %>%
  group_by(row) %>%
  summarize(na_check = sum(!is.na(value)),
            block_detail = paste_no_na(value),
            block_type = paste_no_na(name))

cbind(df, df_summary %>% select(-row))
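The basic idea is to pivot the data into long format so that each original row becomes a group we can analyze, and then summarize per group. If you have a lot of rows, performance may suffer, so check whether that is a problem in your case.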
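To make the pivoting step more concrete, here is a minimal sketch of what the long format looks like for a single original row. It assumes the same example data as above, with `w` already stored as character; the name `long` is only used for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as above, with w already stored as character
df <- tibble(
  w = rep(NA_character_, 5),
  x = c("a", rep(NA_character_, 4)),
  y = c(NA, "b", "b", NA, NA),
  z = c(NA, "b", "c", NA, "b")
)

long <- df %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row)   # default output columns are `name` and `value`

# Original row 3 is now four rows, one per column (w, x, y, z)
long %>% filter(row == 3)
```

Each group of four rows can then be summarized independently, which is what the `group_by(row)` and `summarize()` calls in the answer do.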
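We can first use rowSums to get the number of columns that are not NA. Then we can use purrr to collapse the unique non-NA values into block_detail. Finally, we can use apply to go over each row and get the names of the columns that are not NA for block_type.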
library(tidyverse)

df %>%
  mutate(na_check = rowSums(!is.na(.), na.rm = TRUE),
         block_detail = pmap_chr(select(., -na_check),
                                 ~ paste0(unique(na.omit(c(...))), collapse = "|")),
         block_type = apply(df, 1, \(x) paste0(names(df)[which(!is.na(x))], collapse = "|")))
Output
w x y z na_check block_detail block_type
1 NA a <NA> <NA> 1 a x
2 NA <NA> b b 2 b y|z
3 NA <NA> b c 2 b|c y|z
4 NA <NA> <NA> <NA> 0
5 NA <NA> <NA> b 1 b z
Or using purrr instead of apply:
df %>%
  mutate(na_check = rowSums(!is.na(.), na.rm = TRUE),
         block_detail = pmap_chr(select(., -na_check),
                                 ~ str_c(unique(na.omit(c(...))), collapse = "|"))) %>%
  mutate(block_type = pmap_chr(select(., -c(na_check, block_detail)),
                               ~ str_c(names(c(...))[!is.na(c(...))], collapse = "|")))
Data
df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA,
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA,
"b")), class = "data.frame", row.names = c(NA, -5L))
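The output above prints blanks in row 4, whereas the desired output in the question shows NA there. If NA is preferred, one possible follow-up step is to convert empty strings with `dplyr::na_if()`. The sketch below uses a small hand-written data frame, `res`, as a stand-in for the result (an assumption for illustration, not the answer's actual object).

```r
library(dplyr)

# Hypothetical stand-in for the result above; row 4 holds empty strings
res <- data.frame(
  na_check     = c(1, 2, 2, 0, 1),
  block_detail = c("a", "b", "b|c", "", "b"),
  block_type   = c("x", "y|z", "y|z", "", "z")
)

# Replace "" with NA in both summary columns
res_na <- res %>%
  mutate(across(c(block_detail, block_type), ~ na_if(.x, "")))

res_na
```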
This answer mainly uses apply to go through the data frame row-wise, and finally uses unite to combine the non-NA column names.
library(tidyverse)

df %>%
  mutate(na_check = apply(df, 1, function(x) sum(!is.na(x))),
         # helper columns that hold the column name wherever a value is present
         across(-na_check, ~ ifelse(is.na(.x), NA, cur_column()), .names = "{.col}_colname"),
         block_detail = apply(df, 1, function(x) paste(unique(na.omit(x)), collapse = "|"))) %>%
  unite(col = "block_type", ends_with("_colname"), na.rm = TRUE, sep = "|")
w x y z na_check block_type block_detail
1 NA a <NA> <NA> 1 x a
2 NA <NA> b b 2 y|z b
3 NA <NA> b c 2 y|z b|c
4 NA <NA> <NA> <NA> 0
5 NA <NA> <NA> b 1 z b
Data
Thanks to @AndrewGB for the dput, +1!
df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA,
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA,
"b")), class = "data.frame", row.names = c(NA, -5L))
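Since the question is also open to base R, the same row-wise idea can be sketched without any packages. This is only one possible variant, not part of the answer above; the helper names `cols` and `not_na` are illustrative.

```r
# Same example data as in the answers above
df <- data.frame(
  w = rep(NA_character_, 5),
  x = c("a", NA, NA, NA, NA),
  y = c(NA, "b", "b", NA, NA),
  z = c(NA, "b", "c", NA, "b")
)

cols   <- names(df)          # remember the original columns before adding new ones
not_na <- !is.na(df[cols])   # logical matrix: TRUE where a value is present

df$na_check <- rowSums(not_na)

# Unique non-NA values per row, joined with "|"
df$block_detail <- apply(df[cols], 1, function(x) {
  paste(unique(x[!is.na(x)]), collapse = "|")
})

# Names of the columns that hold a value in each row
df$block_type <- apply(not_na, 1, function(keep) {
  paste(cols[keep], collapse = "|")
})

df
```

Computing the logical matrix once and reusing it for both the count and the column names avoids calling `is.na()` separately for each new column.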
Here is a combination of mutate and unite:
library(dplyr)
library(tidyr)

df %>%
  mutate(na_check = rowSums(!is.na(.), na.rm = TRUE)) %>%
  unite("block_detail", w:z, na.rm = TRUE, sep = "|", remove = FALSE) %>%
  # helper columns new_w ... new_z hold the column name wherever a value is present
  mutate(across(-c(block_detail, na_check),
                ~ case_when(!is.na(.) ~ cur_column()),
                .names = "new_{.col}")) %>%
  unite(block_type, starts_with("new"), na.rm = TRUE, sep = "|") %>%
  relocate(block_detail, .after = na_check)
w x y z na_check block_detail block_type
1 <NA> a <NA> <NA> 1 a x
2 <NA> <NA> b b 2 b|b y|z
3 <NA> <NA> b c 2 b|c y|z
4 <NA> <NA> <NA> <NA> 0
5 <NA> <NA> <NA> b 1 b z
Data:
df <- structure(list(w = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), x = c("a", NA, NA, NA, NA), y = c(NA,
"b", "b", NA, NA), z = c(NA, "b", "c", NA, "b")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
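In the output above, row 2 of block_detail reads "b|b", because unite joins the values without removing duplicates, while the question asks for each value to appear once. One way to get there is to deduplicate the joined strings afterwards. The helper below, `dedupe_pipe`, is a hypothetical name and a base R sketch only; it assumes the values themselves never contain "|".

```r
# Split each string on "|", drop repeated pieces, and join again
dedupe_pipe <- function(x) {
  vapply(
    strsplit(x, "|", fixed = TRUE),
    function(parts) paste(unique(parts), collapse = "|"),
    character(1)
  )
}

# The block_detail values from the output above
block_detail <- c("a", "b|b", "b|c", "", "b")
deduped <- dedupe_pipe(block_detail)
deduped
```

In the pipeline it could be applied with something like `mutate(block_detail = dedupe_pipe(block_detail))` after the first unite call.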