Long to wide change in panel data, but only for certain values in rows
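I have searched the web extensively, but so far I have not found an answer that fits my problem in this particular case.
I am looking to partially restructure a panel data set in R from long to wide format, but only for specific values in the rows, identified by their respective names/characters.
Consider this original format: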
             SERIES ECONOMY YEAR Value
246             CPI  Panama 1960  0.05
247             CPI    Peru 1960  0.05
248             CPI  XXXXXX 1960  0.05
249             CPI  Panama 1961  0.06
250             CPI    Peru 1961  0.06
251             CPI  XXXXXX 1961  0.06
252 % Gross savings  Panama 1960     5
253 % Gross savings    Peru 1960     6
254 % Gross savings  XXXXXX 1960     7
255 % Gross savings  Panama 1961    20
256 % Gross savings    Peru 1961    21
257 % Gross savings  XXXXXX 1961    22
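(And so on, with different countries and different indicators in the SERIES column, for each country and indicator over the period 1960-2020.)
I would like to keep ECONOMY as a separate column listing the countries as they originally appear, and also keep YEAR as a column, but move each individual indicator under SERIES (e.g. CPI / % Gross savings) into its own column, like this: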
  ECONOMY YEAR  CPI %_GROSS_SAVINGS
1  Panama 1960 0.05               5
2    Peru 1960 0.05               6
3  XXXXXX 1960 0.05               7
4  Panama 1961 0.06              20
5    Peru 1961 0.06              21
6  XXXXXX 1961 0.06              22
Any ideas? Thanks for your answers.
Not sure I follow; to me this looks like a typical use of pivot_wider:
library(tidyr)
dat |> pivot_wider(names_from = "SERIES",
                   values_from = "Value")
#> # A tibble: 6 x 4
#>   ECONOMY  YEAR   CPI `% Gross savings`
#>   <chr>   <dbl> <dbl>             <dbl>
#> 1 Panama   1960  0.05                 5
#> 2 Peru     1960  0.05                 6
#> 3 XXXXXX   1960  0.05                 7
#> 4 Panama   1961  0.06                20
#> 5 Peru     1961  0.06                21
#> 6 XXXXXX   1961  0.06                22
Created on 2022-04-08 by the reprex package (v2.0.0)
Reproducible data:
dat <- structure(list(SERIES = c("CPI", "CPI", "CPI", "CPI", "CPI",
"CPI", "% Gross savings", "% Gross savings", "% Gross savings",
"% Gross savings", "% Gross savings", "% Gross savings"), ECONOMY = c("Panama",
"Peru", "XXXXXX", "Panama", "Peru", "XXXXXX", "Panama", "Peru",
"XXXXXX", "Panama", "Peru", "XXXXXX"), YEAR = c(1960, 1960, 1960,
1961, 1961, 1961, 1960, 1960, 1960, 1961, 1961, 1961), Value = c(0.05,
0.05, 0.05, 0.06, 0.06, 0.06, 5, 6, 7, 20, 21, 22)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
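The desired output in the question uses a header like %_GROSS_SAVINGS rather than `% Gross savings`. If that matters, one possible follow-up step is to clean the column names after pivoting. This is only a sketch: `long` and `wide` are illustrative names, and `long` is a small stand-in for `dat`, not the real data.

```r
library(tidyr)

# Small stand-in for the long panel (same columns as `dat`); illustrative only
long <- data.frame(
  SERIES  = rep(c("CPI", "% Gross savings"), each = 2),
  ECONOMY = rep(c("Panama", "Peru"), times = 2),
  YEAR    = 1960,
  Value   = c(0.05, 0.05, 5, 6)
)

# One column per indicator, ECONOMY and YEAR stay as identifier columns
wide <- pivot_wider(long, names_from = "SERIES", values_from = "Value")

# Upper-case every name and replace spaces with underscores
names(wide) <- toupper(gsub(" ", "_", names(wide), fixed = TRUE))
names(wide)
```

Names that start with % are still non-syntactic in R, so they need backticks when referenced, e.g. wide$`%_GROSS_SAVINGS`.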
reshape2
reshape2::dcast(ECONOMY + YEAR ~ SERIES, data = zz)
# Using Value as value column: use value.var to override.
#   ECONOMY YEAR %_Gross_savings  CPI
# 1  Panama 1960               5 0.05
# 2  Panama 1961              20 0.06
# 3    Peru 1960               6 0.05
# 4    Peru 1961              21 0.06
# 5  XXXXXX 1960               7 0.05
# 6  XXXXXX 1961              22 0.06
Data
zz <- structure(list(SERIES = c("CPI", "CPI", "CPI", "CPI", "CPI", "CPI", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings"), ECONOMY = c("Panama", "Peru", "XXXXXX", "Panama", "Peru", "XXXXXX", "Panama", "Peru", "XXXXXX", "Panama", "Peru", "XXXXXX"), YEAR = c(1960L, 1960L, 1960L, 1961L, 1961L, 1961L, 1960L, 1960L, 1960L, 1961L, 1961L, 1961L), Value = c(0.05, 0.05, 0.05, 0.06, 0.06, 0.06, 5, 6, 7, 20, 21, 22)), class = "data.frame", row.names = c("246", "247", "248", "249", "250", "251", "252", "253", "254", "255", "256", "257"))
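The message "Using Value as value column" appears because dcast had to guess which column holds the values. A minimal sketch of the same call with value.var stated explicitly, which should avoid the guess; `long` and `wide` are illustrative names and `long` is a cut-down stand-in for `zz`, assuming one Value per ECONOMY/YEAR/SERIES combination:

```r
library(reshape2)

# Small stand-in for `zz`; illustrative only
long <- data.frame(
  SERIES  = rep(c("CPI", "%_Gross_savings"), each = 2),
  ECONOMY = rep(c("Panama", "Peru"), times = 2),
  YEAR    = 1960L,
  Value   = c(0.05, 0.05, 5, 6)
)

# Left of ~ : columns that identify a row; right of ~ : column whose values become headers.
# value.var names the column that fills the new indicator columns.
wide <- dcast(long, ECONOMY + YEAR ~ SERIES, value.var = "Value")
wide
```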