当列名表示信息时如何重塑单行数据框

How to reshape a single row data frame when column names denote information

我有模型预测的结果,包括估计值和每个估计值的 upper/lower CIs -- 所有这些都在一行中。我怎样才能旋转更长的时间(使用 tidyr),以便我在一列中获得每个 var 名称,并在各自的列中获得相应的估计值和较低的 CI 和较高的 CI?

数据

library(tidyverse)

prediction <- structure(list(prob.no_vacation = 0.117514519600163, prob.camping = 0.143492608263017, 
    prob.day_trip = 0.111421926419948, prob.hotel = 0.317703454494376, 
    prob.other = 0.046127755158774, prob.zimmmer = 0.263739736063722, 
    L.prob.no_vacation = 0.0862080033692849, L.prob.camping = 0.108591033069218, 
    L.prob.day_trip = 0.0824426383991041, L.prob.hotel = 0.269819723528852, 
    L.prob.other = 0.0280805399319794, L.prob.zimmmer = 0.21869871196767, 
    U.prob.no_vacation = 0.158221505149101, U.prob.camping = 0.187255261510882, 
    U.prob.day_trip = 0.148934253891266, U.prob.hotel = 0.369781447354612, 
    U.prob.other = 0.0748802031049477, U.prob.zimmmer = 0.314325057616515), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))


> prediction
## # A tibble: 1 x 18
##   prob.no_vacation prob.camping prob.day_trip prob.hotel prob.other prob.zimmmer L.prob.no_vacat~ ## L.prob.camping L.prob.day_trip L.prob.hotel L.prob.other L.prob.zimmmer
##              <dbl>        <dbl>         <dbl>      <dbl>      <dbl>        <dbl>            <dbl>          ## <dbl>           <dbl>        <dbl>        <dbl>          <dbl>
## 1            0.118        0.143         0.111      0.318     0.0461        0.264           0.0862          ## 0.109          0.0824        0.270       0.0281          0.219
## # ... with 6 more variables: U.prob.no_vacation <dbl>, U.prob.camping <dbl>, U.prob.day_trip <dbl>, ## U.prob.hotel <dbl>, U.prob.other <dbl>, U.prob.zimmmer <dbl>

所需的重塑输出

有 6 种不同的假期类型:no_vacationcampingday_triphotelzimmmerother。在原始列名称中,每个假期类型的名称前面都有我希望它去的列的种类。

最终,我希望输出如下所示:

我知道这种类型的重塑问题经常出现,但我真的无法理解如何去做,即使我通读了 pivot_longer 文档。我设法简单地转动了更长的时间 pivot_longer(cols = prob.no_vacation:U.prob.zimmmer) 得到:

##    name                value
##    <chr>               <dbl>
##  1 prob.no_vacation   0.118 
##  2 prob.camping       0.143 
##  3 prob.day_trip      0.111 
##  4 prob.hotel         0.318 
##  5 prob.other         0.0461
##  6 prob.zimmmer       0.264 
##  7 L.prob.no_vacation 0.0862
##  8 L.prob.camping     0.109 
##  9 L.prob.day_trip    0.0824
## 10 L.prob.hotel       0.270 
## 11 L.prob.other       0.0281
## 12 L.prob.zimmmer     0.219 
## 13 U.prob.no_vacation 0.158 
## 14 U.prob.camping     0.187 
## 15 U.prob.day_trip    0.149 
## 16 U.prob.hotel       0.370 
## 17 U.prob.other       0.0749
## 18 U.prob.zimmmer     0.314 

但这不是想要的输出,我卡住了。

尝试使用此选项重塑您的数据。我还格式化了名称,以便于管理到数据透视公式中:

library(tidyverse)
#Format names
names(prediction) <- gsub('L.prob','LowerCI',names(prediction))
names(prediction) <- gsub('U.prob','UpperCI',names(prediction))
#Reshape
prediction %>% pivot_longer(cols = names(prediction)) %>% 
  separate(col = name,into = c('var1','var2'),sep = '\.') %>%
  pivot_wider(names_from = var1,values_from = value)

输出:

# A tibble: 6 x 4
  var2          prob LowerCI UpperCI
  <chr>        <dbl>   <dbl>   <dbl>
1 no_vacation 0.118   0.0862  0.158 
2 camping     0.143   0.109   0.187 
3 day_trip    0.111   0.0824  0.149 
4 hotel       0.318   0.270   0.370 
5 other       0.0461  0.0281  0.0749
6 zimmmer     0.264   0.219   0.314 

使用正确的正则表达式分隔列名,然后使用特殊动词 .value

tidyr::pivot_longer(prediction, cols=everything(), 
                                names_to = c(".value", "vacation_type"), 
                                names_pattern = "(.*)\.(.*$)")
# A tibble: 6 x 4
  vacation_type prob L.prob U.prob
  <chr>        <dbl>  <dbl>  <dbl>
1 no_vacation 0.118  0.0862 0.158 
2 camping     0.143  0.109  0.187 
3 day_trip    0.111  0.0824 0.149 
4 hotel       0.318  0.270  0.370 
5 other       0.0461 0.0281 0.0749
6 zimmmer     0.264  0.219  0.314 

您希望将 unlisted 数据放入 6x3 matrix

res <- as.data.frame(
  matrix(unlist(prediction), 6, 
         dimnames=list(substring(names(prediction)[1:6], 6),
                       c("estimate", paste0(c("lower", "upper"), ".CI")))))
res
#               estimate   lower.CI  upper.CI
# no_vacation 0.11751452 0.08620800 0.1582215
# camping     0.14349261 0.10859103 0.1872553
# day_trip    0.11142193 0.08244264 0.1489343
# hotel       0.31770345 0.26981972 0.3697814
# other       0.04612776 0.02808054 0.0748802
# zimmmer     0.26373974 0.21869871 0.3143251