Tidy data: gather multiple rows into columns using a pattern

Question

My data frame is not tidy:

id                            16 
pol_pup1.irf_pol1_pub1          0.0186380741
pol_pup1.lower_pol1_pub1        0.0092071786
pol_pup1.upper_pol1_pub1        0.0289460145
pol_pup10.irf_pol10_pub10       0.0061496499
pol_pup10.lower_pol10_pub10     0.0030948510
pol_pup10.upper_pol10_pub10     0.0080107893
pol_pup105.irf_pol105_pub105    0.0377057491
pol_pup105.lower_pol105_pub105  0.0157756274
pol_pup105.upper_pol105_pub105  0.0610782151
pol_pup111.irf_pol111_pub111    0.0169799646
pol_pup111.lower_pol111_pub111  0.0111885580
pol_pup111.upper_pol111_pub111  0.0217701354
pol_pup112.irf_pol112_pub112    0.0156278416
pol_pup112.lower_pol112_pub112  -0.0043273923
pol_pup112.upper_pol112_pub112  0.0342078865
pol_pup113.irf_pol113_pub113    0.0280868673
pol_pup113.lower_pol113_pub113  0.0203300863
pol_pup113.upper_pol113_pub113  0.0366594965
pol_pup114.irf_pol114_pub114    0.0086282368

and so on with different numbers

How can I build a data frame that has separate columns for 'IRF', 'lower' and 'upper', where each number in the 'id' column is a single observation, like this:

Observation IRF      Lower   Upper 
1           0.018    0.009   0.028 
10          0.006    0.003   0.008
105         0.037    0.015   0.061
111         0.016    0.011   0.021

Answer 1

I'm not sure how consistent your data frame is, so this may need some adjusting. I'm assuming the numeric column is named "16".

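library(dplyr)
library(stringr)
library(tidyr)

df %>% 
  mutate(
    obs = str_extract(id, '[0-9]+'),            # first run of digits in id
    group = str_extract(id, 'irf|lower|upper')  # which statistic the row holds
  ) %>% 
  select(-id) %>% 
  pivot_wider(
    names_from = group,
    values_from = `16`
  )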
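
To try this end to end, here is a minimal sketch that rebuilds the first six rows of the question's data and runs the same pipeline. The data frame name `df` and the column names `id` and `16` are assumptions carried over from this answer, not something the question confirms.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical reconstruction of the first rows of the question's data;
# the column names `id` and `16` are assumed.
df <- tibble(
  id = c(
    "pol_pup1.irf_pol1_pub1", "pol_pup1.lower_pol1_pub1",
    "pol_pup1.upper_pol1_pub1", "pol_pup10.irf_pol10_pub10",
    "pol_pup10.lower_pol10_pub10", "pol_pup10.upper_pol10_pub10"
  ),
  `16` = c(0.0186380741, 0.0092071786, 0.0289460145,
           0.0061496499, 0.0030948510, 0.0080107893)
)

wide <- df %>%
  mutate(
    obs = str_extract(id, "[0-9]+"),            # first run of digits
    group = str_extract(id, "irf|lower|upper")  # statistic name
  ) %>%
  select(-id) %>%
  pivot_wider(names_from = group, values_from = `16`)

print(wide)
```

The result should have one row per observation ("1" and "10") and the columns obs, irf, lower and upper.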

Answer 2

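Here is an approach using separate from tidyr:

Once the first column has been split into several columns, we can use regular expressions with str_extract from stringr to pull out the values. The pattern "[a-z]+$" matches one or more lowercase letters followed by the end of the string.

Then we can use pivot_wider from tidyr.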
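
To see what the split and the two patterns do, here is a small sketch on a single id from the question. It uses str_split from stringr to imitate, for one string, the split on "_" that separate performs per row; the variable names are only illustrative.

```r
library(stringr)

# One id taken from the question; splitting on "_" gives four pieces.
id <- "pol_pup105.lower_pol105_pub105"
pieces <- str_split(id, "_")[[1]]
# pieces is c("pol", "pup105.lower", "pol105", "pub105")

# Trailing lowercase letters of the second piece: the statistic name.
value <- str_extract(pieces[2], "[a-z]+$")
# Trailing digits of the third piece: the observation number.
observation <- str_extract(pieces[3], "[0-9]+$")

print(c(value = value, observation = observation))
```

Here value is "lower" and observation is "105".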

library(tidyr)
library(dplyr)
library(stringr)

# `data` is the data frame from the question
data %>% 
  # split id on "_" into four pieces, e.g. "pol", "pup1.irf", "pol1", "pub1"
  separate(id, sep = "_", into = c("Pol", "Value", "Observation", "Pub")) %>%
  # keep the trailing letters (irf/lower/upper) and the trailing digits
  mutate(Value = str_extract(Value, "[a-z]+$"),
         Observation = str_extract(Observation, "[0-9]+$")) %>%
  dplyr::select(-Pol, -Pub) %>%
  # last_col() picks the remaining value column
  pivot_wider(names_from = Value, values_from = last_col())
# A tibble: 7 x 4
#   Observation     irf    lower    upper
#   <chr>         <dbl>    <dbl>    <dbl>
# 1 1           0.0186   0.00921  0.0289 
# 2 10          0.00615  0.00309  0.00801
# 3 105         0.0377   0.0158   0.0611 
# 4 111         0.0170   0.0112   0.0218 
# 5 112         0.0156  -0.00433  0.0342 
# 6 113         0.0281   0.0203   0.0367 
# 7 114         0.00863 NA       NA
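
In the output above, Observation is a character column and the statistic columns are lowercase. If the layout requested in the question is wanted, one possible follow-up step is sketched below. The name `result` and its contents are a hypothetical stand-in for the pivoted tibble, with values copied from the first three rows of the output.

```r
library(dplyr)

# Hypothetical stand-in for the pivoted result shown above.
result <- tibble(
  Observation = c("1", "10", "105"),
  irf   = c(0.0186, 0.00615, 0.0377),
  lower = c(0.00921, 0.00309, 0.0158),
  upper = c(0.0289, 0.00801, 0.0611)
)

tidy_result <- result %>%
  mutate(Observation = as.integer(Observation)) %>%  # numeric ids sort numerically
  rename(IRF = irf, Lower = lower, Upper = upper) %>%
  arrange(Observation)

print(tidy_result)
```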

Tags: r, tidy, dplyr, tidyr