Selecting rows based on row and column values using dplyr in R

When I get data out of the very old software I use, it looks as follows:

library(dplyr)
library(magrittr)

data <- structure(list(`1` = c("agatston", "0", "0", "0", "0", "0", "0", 
"0", "0", "0"), ...3 = c("area", "0", "0", "0", "0", "0", "0", 
"0", "0", "0"), ...4 = c("volume", "0", "0", "0", "0", "0", "0", 
"0", "0", "0"), ...5 = c("density", "0", "0", "0", "0", "0", 
"0", "0", "0", "0"), ...6 = c("mass", "0", "0", "0", "0", "0", 
"0", "0", "0", "0"), `10` = c("agatston", "0", "0", "0", "0", 
"0", "0", "0", "0", "0"), ...8 = c("area", "0", "0", "0", "0", 
"0", "0", "0", "0", "0"), ...9 = c("volume", "0", "0", "0", "0", 
"0", "0", "0", "0", "0"), ...10 = c("density", "0", "0", "0", 
"0", "0", "0", "0", "0", "0"), ...11 = c("mass", "0", "0", "0", 
"0", "0", "0", "0", "0", "0"), `11` = c("agatston", "0", "0", 
"0", "0", "0", "0", "0", "0", "0"), ...13 = c("area", "0", "0", 
"0", "0", "0", "0", "0", "0", "0"), ...14 = c("volume", "0", 
"0", "0", "0", "0", "0", "0", "0", "0"), ...15 = c("density", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), ...16 = c("mass", 
"0", "0", "0", "0", "0", "0", "0", "0", "0")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

This corresponds to the following tibble:

# A tibble: 10 x 15
   `1`      ...3  ...4   ...5    ...6  `10`     ...8  ...9   ...10   ...11 `11`     ...13 ...14  ...15   ...16
   <chr>    <chr> <chr>  <chr>   <chr> <chr>    <chr> <chr>  <chr>   <chr> <chr>    <chr> <chr>  <chr>   <chr>
 1 agatston area  volume density mass  agatston area  volume density mass  agatston area  volume density mass 
 2 0        0     0      0       0     0        0     0      0       0     0        0     0      0       0    
 3 0        0     0      0       0     0        0     0      0       0     0        0     0      0       0    
 4 0        0     0      0       0     0        0     0      0       0     0        0     0      0       0    
 5 0        0     0      0       0     0        0     0      0       0     0        0     0      0       0    
 6 0        0     0      0       0     0        0     0      0       0     0        0     0      0       0    
 7 0        0     0      0       0     0        0     0      0       0     0        0     0      0       0    
 8 0        0     0      0       0     0        0     0      0       0     0        0     0      0       0    
 9 0        0     0      0       0     0        0     0      0       0     0        0     0      0       0    
10 0        0     0      0       0     0        0     0      0       0     0        0     0      0       0  

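Column '1' and the four columns starting with '...' that follow it are the scores for artery 1. Column '10' and the four columns after it are the scores for artery 10. Column '11' and the four columns starting with '...' that follow it are the assessments of artery 11. Each artery has 5 assessments (agatston, area, volume, density and mass), and their names are in the second row.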
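
To make this layout concrete, here is a minimal sketch that turns the header and the second row into a lookup of which artery and which assessment each column belongs to. It rebuilds a tibble with the same layout as data rather than repeating the dput() output. It assumes the first column always carries an artery id. The names lookup and is_artery are only illustrative.

```r
# Rebuild a tibble with the same layout as the question's `data`:
# 15 character columns, readxl-style names, assessment names in row 1,
# followed by nine rows of "0" scores
assessments <- c("agatston", "area", "volume", "density", "mass")
arteries <- c("1", "10", "11")
cols <- lapply(rep(assessments, times = length(arteries)),
               function(a) c(a, rep("0", 9)))
nms <- paste0("...", seq_along(cols) + 1)
nms[seq(1, length(cols), by = length(assessments))] <- arteries
names(cols) <- nms
data <- tibble::as_tibble(cols, .name_repair = "minimal")

hdr <- names(data)
# Headers not starting with "..." carry an artery id; the "..." ones are
# placeholders, so each column inherits the most recent artery id.
# Assumption: the first column is always an artery header.
is_artery <- !startsWith(hdr, "...")
lookup <- tibble::tibble(
  column     = hdr,
  artery     = hdr[is_artery][cumsum(is_artery)],
  assessment = unlist(data[1, ], use.names = FALSE)
)
print(lookup, n = Inf)
```

Once every column has an (artery, assessment) pair, both conditions in the question become simple comparisons on that lookup.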

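The columns I need are determined by 2 conditions:

  1. I only need the columns labelled 'mass' in the second row.
  2. Of those, I only need the mass columns for certain arteries.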

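For example, if I am only interested in the mass of arteries 1 and 11, I need the fifth column ('...6') and the fifteenth column ('...16'). The two-step process would scan the first row for occurrences of '1' or '11'. It would then select the first 'mass' that appears in the second row after each of them. In the current example, the preferred output looks like this: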

# A tibble: 10 x 2
   ...6  ...16
   <chr> <chr>
 1 mass  mass 
 2 0     0    
 3 0     0    
 4 0     0    
 5 0     0    
 6 0     0    
 7 0     0    
 8 0     0    
 9 0     0    
10 0     0 
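
The two-step process described above can be sketched as follows. It uses base R for the scanning and dplyr for the final select(). The vector wanted is a hypothetical parameter holding the artery ids of interest. As in the question, the sketch assumes each artery's 'mass' column is the first 'mass' at or after that artery's header.

```r
library(dplyr)

# Rebuild a tibble with the same layout as the question's `data`
assessments <- c("agatston", "area", "volume", "density", "mass")
arteries <- c("1", "10", "11")
cols <- lapply(rep(assessments, times = length(arteries)),
               function(a) c(a, rep("0", 9)))
nms <- paste0("...", seq_along(cols) + 1)
nms[seq(1, length(cols), by = length(assessments))] <- arteries
names(cols) <- nms
data <- tibble::as_tibble(cols, .name_repair = "minimal")

# Hypothetical parameter: the artery ids of interest
wanted <- c("1", "11")

hdr  <- names(data)                           # first row: artery ids
row1 <- unlist(data[1, ], use.names = FALSE)  # second row: assessment names

# Step 1: position of each wanted artery in the header
# (match() is exact, so "1" does not also pick up "10" or "11")
starts <- match(wanted, hdr)
stopifnot(!anyNA(starts))

# Step 2: for each start, the first "mass" column at or after it
mass_pos <- which(row1 == "mass")
keep <- vapply(starts, function(s) mass_pos[mass_pos >= s][1], integer(1))

result <- data %>% select(all_of(hdr[keep]))
result
```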

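I have tried reshaping into a longer format with pivot_longer, and filtering with dplyr's select and filter, but to no avail. I would like to do this with dplyr if possible, but any help is appreciated!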
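
For reference, here is one way the long-format route could work. This is a sketch using tidyr::pivot_longer(), not necessarily what the asker tried.

- Each column is first renamed to "<artery>_<assessment>", so the header and the second row are combined.
- The data is then pivoted so that both conditions become ordinary filter() calls.
- The "_" separator is an assumption; it only works because no assessment name contains an underscore.

```r
library(dplyr)
library(tidyr)

# Rebuild a tibble with the same layout as the question's `data`
assessments <- c("agatston", "area", "volume", "density", "mass")
arteries <- c("1", "10", "11")
cols <- lapply(rep(assessments, times = length(arteries)),
               function(a) c(a, rep("0", 9)))
nms <- paste0("...", seq_along(cols) + 1)
nms[seq(1, length(cols), by = length(assessments))] <- arteries
names(cols) <- nms
data <- tibble::as_tibble(cols, .name_repair = "minimal")

# Label every column "<artery>_<assessment>", e.g. "11_mass";
# placeholder headers ("...n") inherit the most recent artery id
hdr <- names(data)
is_artery <- !startsWith(hdr, "...")
new_names <- paste(hdr[is_artery][cumsum(is_artery)],
                   unlist(data[1, ], use.names = FALSE), sep = "_")

long <- data %>%
  slice(-1) %>%                      # drop the row holding assessment names
  setNames(new_names) %>%
  mutate(row = row_number()) %>%     # remember which score row a value came from
  pivot_longer(-row,
               names_to = c("artery", "assessment"),
               names_sep = "_",
               values_to = "score")

# Both conditions are now plain filters
mass_long <- long %>% filter(assessment == "mass", artery %in% c("1", "11"))
mass_long
```

If a wide result is needed afterwards, pivot_wider() with names_from = artery could spread mass_long back out.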

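I'm not entirely sure what final format you want, but I think this is how I would approach it.

The janitor package is very handy, and its other functions are worth a look too.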

library(janitor)

data %>%
  row_to_names(row_number = 1) %>% # use the first row (assessment names) as column names
  clean_names() %>% # make the duplicated names unique
  select(starts_with("mass")) # keep the columns whose name starts with "mass"
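
If janitor isn't available, the same three steps can be roughly approximated with base R and dplyr. This is a sketch, not a drop-in replacement. It only mimics the de-duplication part of clean_names(), and base make.unique() produces suffixes like "mass.1" and "mass.2" rather than janitor's. As with the pipeline above, the artery ids from the original header are lost.

```r
library(dplyr)

# Rebuild a tibble with the same layout as the question's `data`
assessments <- c("agatston", "area", "volume", "density", "mass")
arteries <- c("1", "10", "11")
cols <- lapply(rep(assessments, times = length(arteries)),
               function(a) c(a, rep("0", 9)))
nms <- paste0("...", seq_along(cols) + 1)
nms[seq(1, length(cols), by = length(assessments))] <- arteries
names(cols) <- nms
data <- tibble::as_tibble(cols, .name_repair = "minimal")

# Like row_to_names(row_number = 1): use row 1 as the names, then drop it
first_row <- unlist(data[1, ], use.names = FALSE)
renamed <- data %>% slice(-1)
# Only the de-duplication part of clean_names() is mimicked here
names(renamed) <- make.unique(first_row)

# Like select(starts_with("mass"))
mass_only <- renamed %>% select(starts_with("mass"))
mass_only
```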

To select the 'mass' columns you can do:

library(dplyr)

data %>% select(where(~.[1] == 'mass'))

#  ...6  ...11 ...16
#   <chr> <chr> <chr>
# 1 mass  mass  mass 
# 2 0     0     0    
# 3 0     0     0    
# 4 0     0     0    
# 5 0     0     0    
# 6 0     0     0    
# 7 0     0     0    
# 8 0     0     0    
# 9 0     0     0    
#10 0     0     0    

I'm not clear on how you want to select the columns further, but adding %>% select(2:3) to the answer above gives you columns ...11 and ...16.
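
Instead of hard-coding positions, the 'mass' columns could be relabelled with their artery ids and then selected by name. This is a sketch that builds on the where() answer above. It assumes there is exactly one 'mass' column per artery, in the same order as the artery headers. The "artery_" prefix is just an illustrative naming choice.

```r
library(dplyr)

# Rebuild a tibble with the same layout as the question's `data`
assessments <- c("agatston", "area", "volume", "density", "mass")
arteries <- c("1", "10", "11")
cols <- lapply(rep(assessments, times = length(arteries)),
               function(a) c(a, rep("0", 9)))
nms <- paste0("...", seq_along(cols) + 1)
nms[seq(1, length(cols), by = length(assessments))] <- arteries
names(cols) <- nms
data <- tibble::as_tibble(cols, .name_repair = "minimal")

# Artery ids are the headers that are not "..." placeholders
artery_ids <- names(data)[!startsWith(names(data), "...")]

# Same selection as the answer above: columns whose first value is "mass"
mass_cols <- data %>% select(where(~ .[1] == "mass"))

# Assumption: exactly one "mass" column per artery, in header order
stopifnot(ncol(mass_cols) == length(artery_ids))
names(mass_cols) <- paste0("artery_", artery_ids)

picked <- mass_cols %>% select(artery_1, artery_11)
picked
```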