Reshaping Dataframe in R (melt?)

My current data frame looks like this:

      country   continent year lifeExp   pop     gdpPercap
       <fctr>    <fctr> <int>   <dbl>    <int>     <dbl>
1 Afghanistan      Asia  1952  28.801  8425333  779.4453
2 Afghanistan      Asia  1957  30.332  9240934  820.8530
3 Afghanistan      Asia  1962  31.997 10267083  853.1007
4 Afghanistan      Asia  1967  34.020 11537966  836.1971
5 Afghanistan      Asia  1972  36.088 13079460  739.9811
6 Afghanistan      Asia  1977  38.438 14880372  786.1134

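There are 140+ countries, and the years run from 1952 to 2007 at 5-year intervals. I want to reshape my data frame so that I get one row per country and one gdpPercap column per year: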

     Country   gdpPercap(1952)     gdpPercap(1957)   ...   gdpPercap(2007)
      <fctr>      <dbl>
1  Afghanistan   974.5803           ....                      ...
2      Albania  5937.0295           ...                       ...
3      Algeria  6223.3675           ...                       ...
4       Angola  4797.2313
5    Argentina 12779.3796
6    Australia 34435.3674
7      Austria 36126.4927
8      Bahrain 29796.0483
9   Bangladesh  1391.2538
10     Belgium 33692.6051

My attempt was this:

gapminder %>% #my dataframe
  filter(year >= 1952) %>%
  group_by(country) %>%
  summarise(gdpPercap = mean(gdpPercap))

Output:

        country  gdpPercap <- but this takes the mean of gdpPercap from 1952-2007
        <fctr>      <dbl>
1  Afghanistan   802.6746
2      Albania  3255.3666
3      Algeria  4426.0260
4       Angola  3607.1005
5    Argentina  8955.5538
6    Australia 19980.5956
7      Austria 20411.9163
8      Bahrain 18077.6639
9   Bangladesh   817.5588
10     Belgium 19900.7581
# ... with 132 more rows

Any ideas? PS: I am new to R. I have also been looking at melt(). Any help would be appreciated!

You should use year in group_by as well, and after summarising, simply use dcast from reshape2 to reshape the data the way you want.

Here is a sample solution:

library(dplyr)
library(reshape2)
# Synthetic stand-in data: 3 countries x 12 years, 250 random values per pair
n <- 9000
gapminder <- data.frame(
  country = rep(c("India", "US", "UK"), each = n / 3),
  year = rep(seq(from = 1952L, to = 2007L, by = 5L), length.out = n),
  gdpPercap = runif(n)
)
gapminder %>% #my dataframe
  filter(year >= 1952) %>%
  group_by(country, year) %>%
  summarise(gdpPercap = mean(gdpPercap)) %>%
   dcast(country ~ year, value.var="gdpPercap")

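I had to generate new data because your example was not reproducible. Have a look at the link How to make a great R reproducible example?. A reproducible example makes the question easier to understand and answer, and gets you answers faster.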
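If the real data has exactly one row per country and year, as the table in the question suggests, the group_by/summarise step is probably not needed and dcast can be applied directly. The sketch below assumes that, and uses a small made-up data frame (`toy`, with hypothetical countries "A" and "B"). It also shows that melt(), which the question mentions, goes in the opposite direction (wide to long).

```r
library(reshape2)

# Hypothetical toy data: one row per country/year, as in the question's table
toy <- data.frame(
  country   = rep(c("A", "B"), each = 3),
  year      = rep(c(1952L, 1957L, 1962L), times = 2),
  gdpPercap = c(10, 11, 12, 20, 21, 22)
)

# Long -> wide: one column per year; no aggregation needed here
wide <- dcast(toy, country ~ year, value.var = "gdpPercap")

# Wide -> long again: melt() reverses the reshape
long <- melt(wide, id.vars = "country",
             variable.name = "year", value.name = "gdpPercap")
```

So for this question dcast is the function to reach for, and melt would only be needed to get back to the original long layout.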

tidyr::spread() will solve your problem:

library(dplyr); library(tidyr)

gapminder %>% 
  select(country, year, gdpPercap) %>% 
  spread(year, gdpPercap)
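In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same reshape, assuming a recent tidyr and using a small made-up data frame (`toy` is hypothetical, not the asker's data); the `names_prefix` argument gives column names close to the ones requested in the question:

```r
library(tidyr)

# Hypothetical toy data with the same three columns as the selected gapminder subset
toy <- data.frame(
  country   = rep(c("A", "B"), each = 3),
  year      = rep(c(1952L, 1957L, 1962L), times = 2),
  gdpPercap = c(10, 11, 12, 20, 21, 22)
)

# One row per country, one gdpPercap_<year> column per year
wide <- pivot_wider(
  toy,
  id_cols      = country,
  names_from   = year,
  values_from  = gdpPercap,
  names_prefix = "gdpPercap_"
)
```

With the real data, the same call should work after `select(country, year, gdpPercap)`, as in the spread() version.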

The built-in reshape function can do this too.

foo.data.frame <- data.frame(
    Country=rep(c("Here", "There"), each=3),
    year=rep(c(1952, 1957, 1962),2),
    gdpPercap=779:784
    # ... other variables
)

reshape(foo.data.frame[, c("Country", "year", "gdpPercap")], 
    timevar="year", idvar="Country", direction="wide", sep=" ")

#   Country gdpPercap 1952 gdpPercap 1957 gdpPercap 1962
# 1    Here            779            780            781
# 4   There            782            783            784