Long to wide change in panel data, but only for certain values in rows

I have searched the web extensively, but so far I haven't found an answer that fits my problem in this particular case.

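In R, I am looking to *partially* restructure a panel dataset from long to wide format, but only for specific values in the rows, identified by their respective names/characters.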

Consider this original format:

          SERIES       ECONOMY      YEAR     Value
246        CPI         Panama       1960     0.05
247        CPI         Peru         1960     0.05
248        CPI         XXXXXX       1960     0.05
249        CPI         Panama       1961     0.06
250        CPI         Peru         1961     0.06
251        CPI         XXXXXX       1961     0.06
252   % Gross savings  Panama       1960     5
253   % Gross savings  Peru         1960     6
254   % Gross savings  XXXXXX       1960     7
255   % Gross savings  Panama       1961     20
256   % Gross savings  Peru         1961     21
257   % Gross savings  XXXXXX       1961     22

(And so on: different countries, different indicators in the "SERIES" column, and each country and indicator covering the period 1960-2020.)

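I want to keep "ECONOMY" as a separate column identifying the country, and keep "YEAR" as a column as well, but move each individual indicator under "SERIES" (e.g. CPI / % Gross savings) into its own column, like this: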

          ECONOMY       YEAR      CPI      %_GROSS_SAVINGS
1         Panama        1960      0.05     5
2         Peru          1960      0.05     6
3         XXXXXX        1960      0.05     7
4         Panama        1961      0.06     20
5         Peru          1961      0.06     21
6         XXXXXX        1961      0.06     22

Any ideas? Thanks for your answers.

Not sure I follow; to me this looks like a typical use of pivot_wider():

library(tidyr)
dat |> pivot_wider(names_from = "SERIES",
                   values_from = "Value")

#> # A tibble: 6 x 4
#>   ECONOMY  YEAR   CPI `% Gross savings`
#>   <chr>   <dbl> <dbl>             <dbl>
#> 1 Panama   1960  0.05                 5
#> 2 Peru     1960  0.05                 6
#> 3 XXXXXX   1960  0.05                 7
#> 4 Panama   1961  0.06                20
#> 5 Peru     1961  0.06                21
#> 6 XXXXXX   1961  0.06                22

Created on 2022-04-08 by the reprex package (v2.0.0)
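The question's target layout uses the header %_GROSS_SAVINGS rather than "% Gross savings". A minimal sketch, assuming upper-case, underscore-separated indicator names are what is wanted: normalise the SERIES labels before pivoting so that pivot_wider() creates those column names directly. The gsub()/toupper() clean-up is an illustrative choice, not part of the answer above.

```r
library(tidyr)

# Small long-format sample mirroring the question's data
dat <- data.frame(
  SERIES  = rep(c("CPI", "% Gross savings"), each = 6),
  ECONOMY = rep(c("Panama", "Peru", "XXXXXX"), times = 4),
  YEAR    = rep(rep(c(1960, 1961), each = 3), times = 2),
  Value   = c(0.05, 0.05, 0.05, 0.06, 0.06, 0.06, 5, 6, 7, 20, 21, 22)
)

# Turn the indicator labels into the desired headers,
# e.g. "% Gross savings" -> "%_GROSS_SAVINGS"
dat$SERIES <- toupper(gsub(" ", "_", dat$SERIES, fixed = TRUE))

# ECONOMY and YEAR stay as identifier columns; each SERIES value becomes a column
wide <- pivot_wider(dat, names_from = SERIES, values_from = Value)
print(wide)
```

Because the new names start with %, they are non-syntactic in R and have to be back-quoted when accessed, e.g. wide$`%_GROSS_SAVINGS`.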

Reproducible data:

dat <- structure(list(SERIES = c("CPI", "CPI", "CPI", "CPI", "CPI", 
"CPI", "% Gross savings", "% Gross savings", "% Gross savings", 
"% Gross savings", "% Gross savings", "% Gross savings"), ECONOMY = c("Panama", 
"Peru", "XXXXXX", "Panama", "Peru", "XXXXXX", "Panama", "Peru", 
"XXXXXX", "Panama", "Peru", "XXXXXX"), YEAR = c(1960, 1960, 1960, 
1961, 1961, 1961, 1960, 1960, 1960, 1961, 1961, 1961), Value = c(0.05, 
0.05, 0.05, 0.06, 0.06, 0.06, 5, 6, 7, 20, 21, 22)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))
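Since the full data covers many countries and indicators over 1960-2020, it may be worth checking before pivoting that each ECONOMY/YEAR/SERIES combination occurs only once. If a combination repeats, pivot_wider() cannot place a single value in that cell; tidyr then warns and returns list-columns unless a values_fn is supplied. A minimal base-R check; the duplicated Peru row below is made-up sample data for illustration:

```r
# Hypothetical sample: the Peru / 1960 / CPI observation appears twice
dat <- data.frame(
  SERIES  = c("CPI", "CPI", "CPI", "% Gross savings"),
  ECONOMY = c("Panama", "Peru", "Peru", "Panama"),
  YEAR    = c(1960, 1960, 1960, 1960),
  Value   = c(0.05, 0.05, 0.07, 5)
)

# Columns that should jointly identify exactly one value after the reshape
key_cols <- c("ECONOMY", "YEAR", "SERIES")

# Flag every row whose key occurs more than once (first and later copies alike)
dup_rows <- duplicated(dat[key_cols]) | duplicated(dat[key_cols], fromLast = TRUE)

# An empty result means the data can be widened without any aggregation
print(dat[dup_rows, ])
```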

reshape2

reshape2::dcast(ECONOMY + YEAR ~ SERIES, data = zz)
# Using Value as value column: use value.var to override.
#   ECONOMY YEAR %_Gross_savings  CPI
# 1  Panama 1960               5 0.05
# 2  Panama 1961              20 0.06
# 3    Peru 1960               6 0.05
# 4    Peru 1961              21 0.06
# 5  XXXXXX 1960               7 0.05
# 6  XXXXXX 1961              22 0.06

Data:

zz <- structure(list(SERIES = c("CPI", "CPI", "CPI", "CPI", "CPI", "CPI", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings"), ECONOMY = c("Panama", "Peru", "XXXXXX", "Panama", "Peru", "XXXXXX", "Panama", "Peru", "XXXXXX", "Panama", "Peru", "XXXXXX"), YEAR = c(1960L, 1960L, 1960L, 1961L, 1961L, 1961L, 1960L, 1960L, 1960L, 1961L, 1961L, 1961L), Value = c(0.05, 0.05, 0.05, 0.06, 0.06, 0.06, 5, 6, 7, 20, 21, 22)), class = "data.frame", row.names = c("246",  "247", "248", "249", "250", "251", "252", "253", "254", "255", "256", "257"))
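If an extra package dependency is not wanted, base R's stats::reshape() can perform the same long-to-wide step. A minimal sketch on a plain data.frame like zz; note that reshape() prefixes the new columns with the value column's name and a separator (here "Value."), so the prefix is stripped afterwards.

```r
# Long-format sample mirroring `zz` above
zz <- data.frame(
  SERIES  = rep(c("CPI", "%_Gross_savings"), each = 6),
  ECONOMY = rep(c("Panama", "Peru", "XXXXXX"), times = 4),
  YEAR    = rep(rep(c(1960L, 1961L), each = 3), times = 2),
  Value   = c(0.05, 0.05, 0.05, 0.06, 0.06, 0.06, 5, 6, 7, 20, 21, 22)
)

# ECONOMY + YEAR identify a row; each distinct SERIES value becomes a column
wide <- reshape(zz,
                idvar     = c("ECONOMY", "YEAR"),
                timevar   = "SERIES",
                direction = "wide")

# reshape() names the new columns "Value.CPI" and "Value.%_Gross_savings";
# drop that prefix and reset the row names inherited from the long data
names(wide) <- sub("^Value\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

Unlike dcast(), which sorts rows by the formula's identifier columns, reshape() keeps rows in order of first appearance of each ECONOMY/YEAR pair.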