Extracting data by row number after a set condition

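I have a data.frame imported from an Excel file that uses an irregular layout to look visually appealing, which leaves the data unusable as-is. It is organised in repeating blocks of grouped data, with the word "Week" marking a new entry. I am writing code to extract the relevant data. Here is an MWE: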

df = data.frame(x1 = c("Week", "Day", "Exercise", NA, NA, "Walk","Week", "Day", "Exercise", NA, NA, "Run"),
                x2 = c("1", "1",NA, "Advice", NA,NA,"1", "2",NA, "Advice", NA,NA) )
df
         x1     x2
1      Week      1
2       Day      1
3  Exercise   <NA>
4      <NA> Advice
5      <NA>   <NA>
6      Walk   <NA>
7      Week      1
8       Day      2
9  Exercise   <NA>
10     <NA> Advice
11     <NA>   <NA>
12      Run   <NA>

First I want to create "Week" and "Day" variables that apply to the corresponding entries:

library(dplyr)
library(tidyr)

df = df %>%
  mutate(Week = case_when(x1 == "Week" ~ x2),
         Day  = case_when(x1 == "Day" ~ x2)) %>%
  # fill NAs with the preceding non-missing value, then back-fill any leading NAs
  fill(c(Week, Day), .direction = "downup")

df
         x1     x2 Week Day
1      Week      1    1   1
2       Day      1    1   1
3  Exercise   <NA>    1   1
4      <NA> Advice    1   1
5      <NA>   <NA>    1   1
6      Walk   <NA>    1   1
7      Week      1    1   1
8       Day      2    1   2
9  Exercise   <NA>    1   2
10     <NA> Advice    1   2
11     <NA>   <NA>    1   2
12      Run   <NA>    1   2
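The choice of `.direction` matters here because "Day" first appears on the second row of a block, so the "Week" row above it has nothing to inherit when filling downward only. A minimal sketch of the difference, using a hypothetical three-row slice named `toy` (not part of the original data):

```r
library(dplyr)
library(tidyr)

# Hypothetical three-row slice: "Day" only appears on the second row
toy <- data.frame(x1 = c("Week", "Day", "Exercise"),
                  x2 = c("1", "1", NA))

toy <- toy %>%
  mutate(Day = case_when(x1 == "Day" ~ x2))

# Filling down only leaves the first row as NA
down_only <- fill(toy, Day, .direction = "down")

# Filling down and then up also back-fills the first row
down_up <- fill(toy, Day, .direction = "downup")
```

With "downup", a leading NA is filled from the first value below it, which is why row 1 ends up with Day = 1 in the output above.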

Then I want to extract the exercise that was done, which is always 3 rows below the word "Exercise" in x1.

The result should look like this:

   x1       x2     Week  Day   Exercise
   <fct>    <fct>  <fct> <fct> <fct>   
 1 Week     1      1     1     Walk    
 2 Day      1      1     1     Walk    
 3 Exercise NA     1     1     Walk    
 4 NA       Advice 1     1     Walk    
 5 NA       NA     1     1     Walk    
 6 Walk     NA     1     1     Walk    
 7 Week     1      1     1     Walk    
 8 Day      2      1     2     Run     
 9 Exercise NA     1     2     Run     
10 NA       Advice 1     2     Run     
11 NA       NA     1     2     Run     
12 Run      NA     1     2     Run  

How can I specify a row number relative to a condition and extract the data from a given column in that row?

I prefer a dplyr solution, and after searching I found the function nth:

library(stringr)

df = df %>%
  group_by(Week, Day) %>%
  mutate(Exercise = nth(x1, which(str_detect(x1, "Exercise")) + 3))

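which(str_detect(...)) finds the row number of "Exercise" within each group. Adding 3 moves 3 rows further down. nth can then be used to look up the value at that row number in x1.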
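To see what each piece contributes, here is a minimal sketch that applies the same steps to a single block outside of a grouped mutate. The vector `block` is a hypothetical stand-in for the x1 values of one Week/Day group.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for x1 within one group
block <- c("Week", "Day", "Exercise", NA, NA, "Walk")

# str_detect() returns NA for NA inputs; which() keeps only the TRUE positions
pos <- which(str_detect(block, "Exercise"))

# Move 3 rows down and pull the value at that position
exercise <- nth(block, pos + 3)
```

This assumes each group contains exactly one "Exercise" row. As far as I know, recent dplyr versions require the position passed to nth() to be a single number, so a group with zero or several matches would need separate handling.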

Here is another dplyr option: replace NA with 0, then use cumsum:

library(dplyr)

df %>% 
  mutate(across(everything(), ~replace(., is.na(.), 0))) %>% 
  mutate(Day = cumsum(x1=="Week")) %>% 
  group_by(Day) %>%
  mutate(Exercise = last(x1))

Output:

   x1       x2       Day Exercise
   <chr>    <chr>  <int> <chr>   
 1 Week     1          1 Walk    
 2 Day      1          1 Walk    
 3 Exercise 0          1 Walk    
 4 0        Advice     1 Walk    
 5 0        0          1 Walk    
 6 Walk     0          1 Walk    
 7 Week     1          2 Run     
 8 Day      2          2 Run     
 9 Exercise 0          2 Run     
10 0        Advice     2 Run     
11 0        0          2 Run     
12 Run      0          2 Run
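Note that last(x1) relies on the exercise being the final row of each block. If that cannot be guaranteed, a possible variant is to keep the cumsum grouping but look up the value 3 rows below "Exercise", as the question describes. The sketch below is one way to do that under those assumptions; the column name `Block` and the use of match() are my own choices, not part of the answer above.

```r
library(dplyr)

df <- data.frame(
  x1 = c("Week", "Day", "Exercise", NA, NA, "Walk",
         "Week", "Day", "Exercise", NA, NA, "Run"),
  x2 = c("1", "1", NA, "Advice", NA, NA,
         "1", "2", NA, "Advice", NA, NA)
)

res <- df %>%
  # Each "Week" row starts a new block; %in% treats NA as FALSE,
  # so the NAs do not need to be replaced before cumsum()
  mutate(Block = cumsum(x1 %in% "Week")) %>%
  group_by(Block) %>%
  # Position of the first "Exercise" in the block, plus 3 rows
  mutate(Exercise = x1[match("Exercise", x1) + 3]) %>%
  ungroup()
```

Like the cumsum answer, this assigns row 7 (the second "Week" row) to the second block, so it gets "Run" rather than the "Walk" shown in the question's expected output.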