How to pull column name into a new column based on column contents in R

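I need to do three things:

1. Count the non-NA values in each row of the table and store the total in a single column, "na_check".

[My solution is below; I'd be interested if anyone can work out how to do this with map. I have already looked elsewhere for an answer.]

2. For the values that are not NA, create a new column, "block_detail", that concatenates the unique values.

[I don't know how to do this.]

3. If "na_check" is nonzero, pull in the names of the columns that hold values and concatenate them into a new column, "block_type".

[I don't know how to do this.]

This is what the final product should look like. Note that in row 2, even though "b" appears twice, it appears only once in "block_detail", while the columns that contain it are listed separately as "y|z".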

      w x     y     z     na_check block_detail block_type
  <dbl> <chr> <chr> <chr>    <int> <chr>        <chr>     
1    NA a     NA    NA           1 a            x         
2    NA NA    b     b            2 b            y|z       
3    NA NA    b     c            2 b|c          y|z       
4    NA NA    NA    NA           0 NA           NA        
5    NA NA    NA    b            1 b            z 

Below are the sample data and my solution to part 1:


library(tidyverse)

# sample data
df <- tibble(w=rep(NA_real_,5),
       x=c(1,rep(NA_real_,4)),
       y=c(NA_real_,1,rep(NA_real_,3)),
       z=c(NA_real_,1,rep(NA_real_,2),1)
       )

# my solution to the first part; I'd be interested if someone can do this more
# efficiently, or with map, as I have hundreds of columns to process

df_na_check <- df %>% 
  mutate(across(everything(),
                list(na_check=~!is.na(.)),
                .names="{.col}_{.fn}")) %>% 
  rowwise() %>% 
  mutate(na_check=sum(c_across(contains("na_check")))) %>% 
  select(w:z,na_check)

Thanks for your help. Ideally the solution would use the tidyverse, but I am open to other approaches (data.table or base R).

How about the following:

library(tidyverse)

df <- tibble(w=rep(NA_real_,5),
             x=c("a",rep(NA_character_,4)),
             y=c(NA_character_, "b", "b", NA_character_, NA_character_),
             z=c(NA_character_, "b", "c", NA_character_, "b"))

paste_no_na <- function(x){
    x <- unique(x)
    paste0(x[!is.na(x)], collapse = "|")
}

df_summary <- df %>% 
    mutate(row = row_number(),
           w = as.character(w)) %>% 
    pivot_longer(cols = -row) %>% 
    mutate(name = if_else(is.na(value), NA_character_, name)) %>% 
    group_by(row) %>% 
    summarize(na_check = sum(!is.na(value)),
              block_detail = paste_no_na(value),
              block_type = paste_no_na(name))

cbind(df, df_summary %>% select(-row))

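The basic idea is to pivot the data into long format so that we can analyze each group (i.e. each original row) and then summarize. If you have a lot of rows, performance may suffer, so check whether that applies to you.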
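One difference from the expected output in the question: paste_no_na returns an empty string for a row with no values, whereas the question shows NA for row 4. If NA is preferred, a variant along these lines might work. This is a minimal base R sketch; the name paste_or_na is hypothetical and not part of the answer above.

```r
# Hypothetical variant of paste_no_na: returns NA instead of "" when
# nothing is left after dropping NA values.
paste_or_na <- function(x, sep = "|") {
  x <- unique(x[!is.na(x)])          # drop NA, then de-duplicate
  if (length(x) == 0) {
    NA_character_                    # nothing to paste: keep it missing
  } else {
    paste0(x, collapse = sep)
  }
}

paste_or_na(c(NA, "b", "b"))   # "b"
paste_or_na(c(NA, "b", "c"))   # "b|c"
paste_or_na(c(NA, NA))         # NA
```

It could be swapped in for paste_no_na inside summarize() without other changes.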

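We can first use rowSums to get the number of columns that are not NA. Then we can use purrr to collapse the unique non-NA values into block_detail. Finally, we can use apply to go through each row and get the names of the non-NA columns for block_type.
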
library(tidyverse)

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = TRUE),
         block_detail = pmap_chr(select(., -na_check), ~paste0(unique(na.omit(c(...))), collapse = "|")),
         block_type = apply(df, 1, \(x) paste0(names(df)[which(!is.na(x))], collapse = "|")))

Output

   w    x    y    z na_check block_detail block_type
1 NA    a <NA> <NA>        1            a          x
2 NA <NA>    b    b        2            b        y|z
3 NA <NA>    b    c        2          b|c        y|z
4 NA <NA> <NA> <NA>        0                        
5 NA <NA> <NA>    b        1            b          z

Or using purrr instead of apply:

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = TRUE),
         block_detail = pmap_chr(select(., -na_check), ~str_c(unique(na.omit(c(...))), collapse = "|"))) %>% 
  mutate(block_type = pmap_chr(select(., -c(na_check, block_detail)), ~str_c(names(c(...))[!is.na(c(...))], collapse="|")))

Data

df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA, 
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA, 
"b")), class = "data.frame", row.names = c(NA, -5L))
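Since the question is also open to base R, the same three steps can be sketched without any packages. This is only an illustration under the assumption that every column of interest is listed in cols (a name introduced here, not in the answer); with hundreds of columns, cols could be any character vector of column names.

```r
# Sample data matching the answer above.
df <- data.frame(
  w = NA,
  x = c("a", NA, NA, NA, NA),
  y = c(NA, "b", "b", NA, NA),
  z = c(NA, "b", "c", NA, "b"),
  stringsAsFactors = FALSE
)

cols   <- names(df)            # columns to inspect, captured before adding new ones
not_na <- !is.na(df[cols])     # logical matrix: TRUE where a value is present
vals   <- as.matrix(df[cols])  # character matrix of the raw values

# Step 1: number of non-NA values per row.
df$na_check <- rowSums(not_na)

# Step 2: unique non-NA values per row, joined with "|".
df$block_detail <- unname(apply(vals, 1, function(r) {
  paste(unique(r[!is.na(r)]), collapse = "|")
}))

# Step 3: names of the columns that hold a value, joined with "|".
df$block_type <- unname(apply(not_na, 1, function(r) {
  paste(cols[r], collapse = "|")
}))

df
```

As in the output above, a row with no values gets an empty string rather than NA in block_detail and block_type.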

This answer mainly uses apply to iterate over the data frame row-wise, and finally uses unite to combine the non-NA column names.

library(tidyverse)

df %>% 
  mutate(na_check = apply(df, 1, function(x) sum(!is.na(x))),
         across(-na_check, ~ifelse(is.na(.x), NA, cur_column()), .names = "{.col}_colname"),
         block_detail = apply(df, 1, function(x) paste(unique(na.omit(x)), collapse = '|'))) %>% 
  unite(col = "block_type", ends_with("_colname"), na.rm = TRUE, sep = "|")

   w    x    y    z na_check block_type block_detail
1 NA    a <NA> <NA>        1          x            a
2 NA <NA>    b    b        2        y|z            b
3 NA <NA>    b    c        2        y|z          b|c
4 NA <NA> <NA> <NA>        0                        
5 NA <NA> <NA>    b        1          z            b

Data

Thanks to @AndrewGB for the dput, +1!

df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA, 
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA, 
"b")), class = "data.frame", row.names = c(NA, -5L))

Here is a combination of mutate and unite:

library(dplyr)
library(tidyr)

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = TRUE)) %>% 
  unite("block_detail", w:z, na.rm = TRUE, sep = "|", remove = FALSE) %>% 
  mutate(across(-c(block_detail, na_check), ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}')) %>%
  unite(block_type, starts_with('new'), na.rm = TRUE, sep = '|') %>% 
  relocate(block_detail, .after = na_check)

Output

     w    x    y    z na_check block_detail block_type
1 <NA>    a <NA> <NA>        1            a          x
2 <NA> <NA>    b    b        2          b|b        y|z
3 <NA> <NA>    b    c        2          b|c        y|z
4 <NA> <NA> <NA> <NA>        0                        
5 <NA> <NA> <NA>    b        1            b          z

Data:

df <- structure(list(w = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), x = c("a", NA, NA, NA, NA), y = c(NA, 
"b", "b", NA, NA), z = c(NA, "b", "c", NA, "b")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
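Note that unite keeps repeated values, so row 2 of this output shows "b|b" in block_detail where the question asks for "b". One possible fix is to de-duplicate the joined strings afterwards. The helper below is a hypothetical base R sketch (dedupe_pipe is not part of the answer) and assumes the input contains no NA strings.

```r
# Hypothetical helper: remove repeated tokens from "|"-delimited strings.
# Assumes no element of x is NA.
dedupe_pipe <- function(x, sep = "|") {
  vapply(
    strsplit(x, sep, fixed = TRUE),                       # split each string into tokens
    function(parts) paste(unique(parts), collapse = sep), # keep first occurrence of each
    character(1)
  )
}

dedupe_pipe(c("a", "b|b", "b|c", "", "b"))
# "a" "b" "b|c" "" "b"
```

It could be applied as an extra step, for example mutate(block_detail = dedupe_pipe(block_detail)), after the first unite call.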