Tidy multiple observations per row in tidyverse

Question

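My data has multiple observations per row, and I would like to tidy it.

The data are the relative spectral response (RSR) of a set of satellite sensors. I have made a toy dataset here that represents the actual data. Each sensor/band has two columns: one for the wavelength tested ("Wvln(nm)") and one for the sensor's response at that wavelength ("RSR"). In my toy data, band 1 responds strongly to 500 nm and band 2 responds strongly to 600 nm.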

library(tidyverse)

rsr_toy <- tibble::tribble(
  ~`Band 1`, ~`...2`, ~`Band 2`, ~`...4`,
  "Wvln(nm)", "RSR", "Wvln(nm)", "RSR",
  "500", "0.9", "500", "0.01",
  "600", "0.12", "600", "0.8"
)

# remove the first row containing metadata
rsr1 <- rsr_toy %>% 
  slice(-1) %>% 
  janitor::clean_names("small_camel")
rsr1
# # A tibble: 2 x 4
#   band1 x2    band2 x4   
#   <chr> <chr> <chr> <chr>
# 1 500   0.9   500   0.01 
# 2 600   0.12  600   0.8

I would like to tidy the data so that each observation has its own row, like this:

# desired outcome:
tibble::tribble(
  ~sensor, ~wavelength, ~rsr,
  "band1", 500, 0.9,
  "band1", 600, 0.12,
  "band2", 500, 0.01,
  "band2", 600, 0.8
)
# # A tibble: 4 x 3
#   sensor wavelength   rsr
#   <chr>       <dbl> <dbl>
# 1 band1         500  0.9 
# 2 band1         600  0.12
# 3 band2         500  0.01
# 4 band2         600  0.8

How can I do this simply with tidyr?

Answer 1

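I split the tbl into a list with one tibble per sensor, ran some basic dplyr commands on each one, and then bound the list back together.

Split by band so that each band gets its own table (following this solution).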

splits <- seq(2,ncol(rsr1),2) %>% 
  map(~ select(rsr1, (.-1):all_of(.)))
splits
# [[1]]
# # A tibble: 2 x 2
#   band1 x2   
#   <chr> <chr>
# 1 500   0.9  
# 2 600   0.12 
# 
# [[2]]
# # A tibble: 2 x 2
#   band2 x4   
#   <chr> <chr>
# 1 500   0.01 
# 2 600   0.8

Then apply a custom function to each tibble in the list:

my_rename_tbl <- function(tbl){
  tbl %>% 
    # add a column with the band name
    add_column(sensor = colnames(tbl)[1], .before = 1) %>% 
    # rename the other two columns "wvln" and "rsr" respectively
    rename("wvln" = 2, "rsr" = 3)
}

splits %>% 
  map(my_rename_tbl) %>% 
  bind_rows()
# # A tibble: 4 x 3
#   sensor wvln  rsr  
#   <chr>  <chr> <chr>
# 1 band1  500   0.9  
# 2 band1  600   0.12 
# 3 band2  500   0.01 
# 4 band2  600   0.8
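
The result above still has character columns and uses `wvln` rather than the `wavelength` name from the desired outcome. A minimal sketch of one way to finish the job, assuming the bound result looks exactly like the printed tibble (the names `tidy_chr` and `tidy_num` are made up for this illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the bound result above: all columns are character
tidy_chr <- tribble(
  ~sensor, ~wvln, ~rsr,
  "band1", "500", "0.9",
  "band1", "600", "0.12",
  "band2", "500", "0.01",
  "band2", "600", "0.8"
)

tidy_num <- tidy_chr %>%
  # use the column name from the desired outcome
  rename(wavelength = wvln) %>%
  # parse the two measurement columns as numbers
  mutate(across(c(wavelength, rsr), as.numeric))

tidy_num
```

In the real pipeline the same two steps could presumably be appended after `bind_rows()`.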

Answer 2

Get the data in long format, rename the columns, and create the sensor column.

library(tidyverse)

rsr1 %>%
  pivot_longer(cols = everything(), 
               names_to = '.value', 
               names_pattern = '(.*?)\\d') %>%
  rename(wavelength = band, rsr = x) %>%
  mutate(sensor = rep(str_subset(names(rsr1), 'band'), length.out = n()),
         .before = 1)

#  sensor wavelength rsr  
#  <chr>  <chr>      <chr>
#1 band1  500        0.9  
#2 band2  500        0.01 
#3 band1  600        0.12 
#4 band2  600        0.8
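
To see why the pivot yields columns called `band` and `x`, it can help to run the pattern on the column names by themselves. A small sketch, assuming the names are the ones produced by `clean_names("small_camel")` in the question (`col_names` and `stems` are illustrative names only):

```r
library(stringr)

# Column names of rsr1 as shown in the question
col_names <- c("band1", "x2", "band2", "x4")

# Same pattern as in the pivot_longer() call: the lazy capture group keeps
# the text before the first digit
stems <- str_match(col_names, "(.*?)\\d")[, 2]
stems
# [1] "band" "x"    "band" "x"
```

Because `names_to = '.value'`, each captured stem becomes an output column name, so both `band*` columns are stacked into `band` and both `x*` columns into `x`.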
