Selecting rows based on row ánd column values using dplyr in R
When I get data out of the very old software I use, it looks as follows:
library(dplyr)
library(magrittr)
data <- structure(list(`1` = c("agatston", "0", "0", "0", "0", "0", "0",
"0", "0", "0"), ...3 = c("area", "0", "0", "0", "0", "0", "0",
"0", "0", "0"), ...4 = c("volume", "0", "0", "0", "0", "0", "0",
"0", "0", "0"), ...5 = c("density", "0", "0", "0", "0", "0",
"0", "0", "0", "0"), ...6 = c("mass", "0", "0", "0", "0", "0",
"0", "0", "0", "0"), `10` = c("agatston", "0", "0", "0", "0",
"0", "0", "0", "0", "0"), ...8 = c("area", "0", "0", "0", "0",
"0", "0", "0", "0", "0"), ...9 = c("volume", "0", "0", "0", "0",
"0", "0", "0", "0", "0"), ...10 = c("density", "0", "0", "0",
"0", "0", "0", "0", "0", "0"), ...11 = c("mass", "0", "0", "0",
"0", "0", "0", "0", "0", "0"), `11` = c("agatston", "0", "0",
"0", "0", "0", "0", "0", "0", "0"), ...13 = c("area", "0", "0",
"0", "0", "0", "0", "0", "0", "0"), ...14 = c("volume", "0",
"0", "0", "0", "0", "0", "0", "0", "0"), ...15 = c("density",
"0", "0", "0", "0", "0", "0", "0", "0", "0"), ...16 = c("mass",
"0", "0", "0", "0", "0", "0", "0", "0", "0")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
This corresponds to the following tibble:
# A tibble: 10 x 15
`1` ...3 ...4 ...5 ...6 `10` ...8 ...9 ...10 ...11 `11` ...13 ...14 ...15 ...16
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 agatston area volume density mass agatston area volume density mass agatston area volume density mass
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
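The column `1` and the four columns starting with '...' that follow it are the scores for artery 1; `10` and the four columns after it are the scores for artery 10; `11` and the four '...' columns after it are the assessments for artery 11. Each artery has 5 assessments (agatston, area, volume, density and mass; second row).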
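The layout described above can be made explicit with a small lookup table, one row per column. This is a minimal base-R sketch; `column_map()` is a hypothetical helper, and it assumes that every artery block starts at a column whose name does not begin with "..." and that the "...N" columns following it belong to the same artery. The first lines rebuild the example data compactly instead of repeating the `structure()` call.

```r
# Rebuild the example: 3 arteries x 5 assessments, with the assessment
# names in the first row followed by nine rows of "0".
assessments <- c("agatston", "area", "volume", "density", "mass")
cols <- lapply(rep(assessments, 3), function(a) c(a, rep("0", 9)))
names(cols) <- c("1", "...3", "...4", "...5", "...6",
                 "10", "...8", "...9", "...10", "...11",
                 "11", "...13", "...14", "...15", "...16")
data <- structure(cols, row.names = c(NA, -10L),
                  class = c("tbl_df", "tbl", "data.frame"))

# Hypothetical helper: one row per column with its artery and assessment.
column_map <- function(data) {
  nm <- names(data)
  is_start <- !startsWith(nm, "...")               # columns that open an artery block
  data.frame(
    column     = nm,
    artery     = nm[is_start][cumsum(is_start)],   # carry the artery number forward
    assessment = unname(vapply(data, function(x) x[1], character(1))),
    stringsAsFactors = FALSE
  )
}

column_map(data)
```

Filtering this table (for example `assessment == "mass"` and `artery %in% c("1", "11")`) yields the column names that can then be passed to `select()`.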
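The columns I need are based on 2 conditions:
- I only want the columns labelled 'mass' in the second row.
- Of those, I only want the mass columns of certain arteries.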
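For example, I am only interested in the mass of arteries 1 and 11, which means I need the fifth column ('...6') and the fifteenth column ('...16'). The two-step process would scan the first row (the column names) for occurrences of '1' or '11', and then select the first occurrence of 'mass' in the second row after that. In the current example, the preferred output looks like this: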
# A tibble: 10 x 2
...11 ...16
<chr> <chr>
1 mass mass
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
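The two-step rule can be written directly with dplyr. This is a minimal sketch, assuming the artery numbers appear only as column names and that an artery's 'mass' column is the first 'mass' column at or after its numbered column; `mass_columns()` is a hypothetical helper. It follows the text (arteries 1 and 11, i.e. `...6` and `...16`), whereas the preferred output printed above shows `...11` in place of `...6`.

```r
library(dplyr)

# Rebuild the example data (same layout as the structure() call above).
assessments <- c("agatston", "area", "volume", "density", "mass")
cols <- lapply(rep(assessments, 3), function(a) c(a, rep("0", 9)))
names(cols) <- c("1", "...3", "...4", "...5", "...6",
                 "10", "...8", "...9", "...10", "...11",
                 "11", "...13", "...14", "...15", "...16")
data <- structure(cols, row.names = c(NA, -10L),
                  class = c("tbl_df", "tbl", "data.frame"))

# Hypothetical helper implementing the two-step rule.
mass_columns <- function(data, arteries) {
  nm <- names(data)
  is_mass <- vapply(data, function(x) identical(x[1], "mass"), logical(1))
  picked <- vapply(arteries, function(a) {
    start <- match(a, nm)                              # step 1: find the artery in the header
    if (is.na(start)) stop("artery not found: ", a)
    offset <- match(TRUE, is_mass[start:length(nm)])   # step 2: first "mass" from there on
    nm[start + offset - 1]
  }, character(1))
  unname(picked)  # unnamed, so select() does not rename the columns
}

data %>% select(all_of(mass_columns(data, c("1", "11"))))
```

Passing `c("10", "11")` instead selects `...11` and `...16`, the two columns shown in the preferred output.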
I have tried reshaping the data with pivot_wider, and filtering with dplyr's select and filter, but to no avail. If possible I would like to do this with dplyr, but any help is appreciated!
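I'm not entirely sure what format you want in the end, but I think this is how I would approach it. The janitor package is very handy, and its other functions are worth a look.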
library(dplyr)
library(janitor)
data %>%
  row_to_names(row_number = 1) %>% # use the first row as column names
  clean_names() %>%                # make the duplicated names unique
  select(starts_with("mass"))      # keep the columns whose name starts with "mass"
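To see what the janitor pipeline does, here is a rough base-R analogue (a sketch, not janitor's implementation): the first row becomes the column names and is dropped, duplicated names are made unique, and the 'mass' columns are kept. `make.unique()` suffixes the duplicates as `mass_1`, `mass_2`, while `clean_names()` uses its own suffix scheme, so the exact names differ.

```r
# Rebuild the example data (same layout as the structure() call above).
assessments <- c("agatston", "area", "volume", "density", "mass")
cols <- lapply(rep(assessments, 3), function(a) c(a, rep("0", 9)))
names(cols) <- c("1", "...3", "...4", "...5", "...6",
                 "10", "...8", "...9", "...10", "...11",
                 "11", "...13", "...14", "...15", "...16")
data <- structure(cols, row.names = c(NA, -10L),
                  class = c("tbl_df", "tbl", "data.frame"))

header <- unlist(data[1, ], use.names = FALSE)   # "agatston", "area", ...
body <- data[-1, ]                                # drop the header row
names(body) <- make.unique(header, sep = "_")     # mass, mass_1, mass_2, ...
mass_only <- body[, startsWith(names(body), "mass")]
mass_only
```

In both versions the artery numbers are no longer part of the names, so the resulting mass columns can only be told apart by their position (first, second, third artery).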
To select the 'mass' columns you can do:
library(dplyr)
# keep the columns whose first value is 'mass'
data %>% select(where(~.[1] == 'mass'))
# ...6 ...11 ...16
# <chr> <chr> <chr>
# 1 mass mass mass
# 2 0 0 0
# 3 0 0 0
# 4 0 0 0
# 5 0 0 0
# 6 0 0 0
# 7 0 0 0
# 8 0 0 0
# 9 0 0 0
#10 0 0 0
I'm not clear on how you want to select the columns further, but adding %>% select(2:3) to the code above gives you the columns ...11 and ...16.