Extracting data by row number after a set condition
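I have a data.frame imported from an Excel file that uses an irregular layout, designed to look good rather than to be usable as data. The data sit in repeating blocks, and the word "Week" marks the start of a new entry.
I am writing code to extract the relevant data. Here is an MWE: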
df = data.frame(x1 = c("Week", "Day", "Exercise", NA, NA, "Walk","Week", "Day", "Exercise", NA, NA, "Run"),
x2 = c("1", "1",NA, "Advice", NA,NA,"1", "2",NA, "Advice", NA,NA) )
df
         x1     x2
1      Week      1
2       Day      1
3  Exercise   <NA>
4      <NA> Advice
5      <NA>   <NA>
6      Walk   <NA>
7      Week      1
8       Day      2
9  Exercise   <NA>
10     <NA> Advice
11     <NA>   <NA>
12      Run   <NA>
First I want to create "Week" and "Day" variables that apply to the corresponding entries:
library(dplyr)
library(tidyr)
df = df %>%
  mutate(Week = case_when(x1 == "Week" ~ x2),
         Day = case_when(x1 == "Day" ~ x2)) %>%
  fill(c(Week, Day), .direction = "downup") # fill NAs downward from the previous value, then upward for any leading NAs
df
         x1     x2 Week Day
1      Week      1    1   1
2       Day      1    1   1
3  Exercise   <NA>    1   1
4      <NA> Advice    1   1
5      <NA>   <NA>    1   1
6      Walk   <NA>    1   1
7      Week      1    1   1
8       Day      2    1   2
9  Exercise   <NA>    1   2
10     <NA> Advice    1   2
11     <NA>   <NA>    1   2
12      Run   <NA>    1   2
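As a side note on the fill step, here is a minimal sketch of how "down" and "downup" differ. It uses a toy column `toy$v` that is not part of the original data and assumes tidyr is installed. In the MWE, row 1 comes before the first "Day" value, which is why the upward pass is needed there.

```r
library(tidyr)

# Toy column (hypothetical): a leading NA, then values separated by NAs.
toy <- data.frame(v = c(NA, "a", NA, NA, "b", NA))

down   <- fill(toy, v, .direction = "down")$v
downup <- fill(toy, v, .direction = "downup")$v

print(down)    # the leading NA is left in place
print(downup)  # the leading NA is filled from the first value below it
```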
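Then I want to extract the exercise that was performed, which always appears 3 rows below the word "Exercise" in x1.
The result should look like this: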
   x1       x2     Week  Day   Exercise
   <fct>    <fct>  <fct> <fct> <fct>
 1 Week     1      1     1     Walk
 2 Day      1      1     1     Walk
 3 Exercise NA     1     1     Walk
 4 NA       Advice 1     1     Walk
 5 NA       NA     1     1     Walk
 6 Walk     NA     1     1     Walk
 7 Week     1      1     1     Walk
 8 Day      2      1     2     Run
 9 Exercise NA     1     2     Run
10 NA       Advice 1     2     Run
11 NA       NA     1     2     Run
12 Run      NA     1     2     Run
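How can I specify a row number relative to a condition and extract the data from a given column in that row?
I would prefer a dplyr solution. After some searching, I found the function nth: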
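library(stringr)
df = df %>%
  group_by(Week, Day) %>%
  # row of "Exercise" within the group, plus 3, gives the position to read from x1
  mutate(Exercise = nth(x1, which(str_detect(x1, "Exercise")) + 3))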
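str_detect finds the rows of x1 that contain "Exercise", and which returns their row numbers within the group. Adding 3 moves three rows further down, and nth then retrieves the value of x1 at that row number.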
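To make the intermediate values visible, here is a rough sketch of the same steps on a plain vector. The vector is the x1 content of the first (Week 1, Day 1) group from the MWE, and the names `hits`, `pos`, `target` and `value` are only illustrative.

```r
library(dplyr)
library(stringr)

# x1 values of the first (Week 1, Day 1) group from the MWE
x1 <- c("Week", "Day", "Exercise", NA, NA, "Walk", "Week")

hits   <- str_detect(x1, "Exercise")  # logical vector; NA where x1 is NA
pos    <- which(hits)                 # which() drops NA and keeps TRUE positions
target <- pos + 3                     # three rows below the marker
value  <- nth(x1, target)             # value of x1 at that position

print(pos)
print(value)
```

Note that nth expects a single position, so this sketch assumes each group contains exactly one "Exercise" row.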
Here is another dplyr option: replace NA with 0, then use cumsum:
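library(dplyr)
df %>%
  # replace NA with 0 so the comparison below never returns NA
  mutate(across(everything(), ~replace(., is.na(.), 0))) %>%
  # running count of "Week" markers: one id per block (stored in Day)
  mutate(Day = cumsum(x1 == "Week")) %>%
  group_by(Day) %>%
  # the exercise is the last row of each block
  mutate(Exercise = last(x1))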
Output:
   x1       x2       Day Exercise
   <chr>    <chr>  <int> <chr>
 1 Week     1          1 Walk
 2 Day      1          1 Walk
 3 Exercise 0          1 Walk
 4 0        Advice     1 Walk
 5 0        0          1 Walk
 6 Walk     0          1 Walk
 7 Week     1          2 Run
 8 Day      2          2 Run
 9 Exercise 0          2 Run
10 0        Advice     2 Run
11 0        0          2 Run
12 Run      0          2 Run
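If overwriting the NA values is not wanted, a possible variant (a sketch only, not part of either answer above) keeps them and combines the block counter with the "3 rows below the marker" rule from the question. The column name `Block` is a hypothetical choice, and the code assumes each block starts with a "Week" row.

```r
library(dplyr)

df <- data.frame(
  x1 = c("Week", "Day", "Exercise", NA, NA, "Walk",
         "Week", "Day", "Exercise", NA, NA, "Run"),
  x2 = c("1", "1", NA, "Advice", NA, NA,
         "1", "2", NA, "Advice", NA, NA)
)

res <- df %>%
  # %in% returns FALSE (not NA) for missing values, so cumsum() stays defined
  mutate(Block = cumsum(x1 %in% "Week")) %>%
  group_by(Block) %>%
  # position of the first "Exercise" marker in the block, then three rows below it
  mutate(Exercise = x1[match("Exercise", x1) + 3]) %>%
  ungroup()

print(res)
```

As in the cumsum output above, row 7 (the second "Week" row) is assigned to the second block here and therefore gets "Run", whereas grouping by Week and Day leaves it with "Walk".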