在 R 中重塑 Dataframe(融化?)
Reshaping Dataframe in R (melt?)
所以,我目前的数据框如下所示:
country continent year lifeExp pop gdpPercap
<fctr> <fctr> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.801 8425333 779.4453
2 Afghanistan Asia 1957 30.332 9240934 820.8530
3 Afghanistan Asia 1962 31.997 10267083 853.1007
4 Afghanistan Asia 1967 34.020 11537966 836.1971
5 Afghanistan Asia 1972 36.088 13079460 739.9811
6 Afghanistan Asia 1977 38.438 14880372 786.1134
有 140 多个国家/地区。年份以 5 年为间隔。从 1952 年到 2007 年,我想重塑我的数据框,以便我得到。
Country gdpPercap(1952) gdpPercap(1957) ... gdpPercap(2007)
<fctr> <dbl>
1 Afghanistan 974.5803 .... ...
2 Albania 5937.0295 ... ...
3 Algeria 6223.3675 ... ...
4 Angola 4797.2313
5 Argentina 12779.3796
6 Australia 34435.3674
7 Austria 36126.4927
8 Bahrain 29796.0483
9 Bangladesh 1391.2538
10 Belgium 33692.6051
我的尝试是这样的:
gapminder %>% #my dataframe
filter(year >= 1952) %>%
group_by(country) %>%
summarise(gdpPercap = mean(gdpPercap))
输出:
country gdpPercap <- but this takes the mean of gdpPercap from 1952-2007
<fctr> <dbl>
1 Afghanistan 802.6746
2 Albania 3255.3666
3 Algeria 4426.0260
4 Angola 3607.1005
5 Argentina 8955.5538
6 Australia 19980.5956
7 Austria 20411.9163
8 Bahrain 18077.6639
9 Bangladesh 817.5588
10 Belgium 19900.7581
# ... with 132 more rows
有什么想法吗? PS:我是 R 的新手。我也在看 melt()。任何帮助将不胜感激!
您也应该在 group_by 中使用年份,并且在总结之后,只需使用 dcast
或 rehape
以您想要的方式重塑数据
这是一个示例解决方案:
library(dplyr)
library(reshape2)
gapminder <- data.frame(cbind(gdpPercap=runif(10000), year =as.integer(seq(from=1952, to=2007, by=5)), country = c("India", "US", "UK")))
gapminder$gdpPercap <- as.numeric(as.character(gapminder$gdpPercap))
gapminder$year <- as.integer(as.character(gapminder$year))
gapminder %>% #my dataframe
filter(year >= 1952) %>%
group_by(country, year) %>%
summarise(gdpPercap = mean(gdpPercap)) %>%
dcast(country ~ year, value.var="gdpPercap")
我必须生成新数据,因为您的示例不可重现。通过 link How to make a great R reproducible example?。它有助于回答和理解问题,以及更快的答案。
tidyr::spread()
会解决您的问题
library(dplyr); library(tidyr)
gapminder %>%
select(country, year, gdpPercap) %>%
spread(year, gdpPercap)
内置 reshape
可以做到这一点。
foo.data.frame <- data.frame(
Country=rep(c("Here", "There"), each=3),
year=rep(c(1952, 1957, 1962),2),
gdpPercap=779:784
# ... other variables
)
reshape(foo.data.frame[, c("Country", "year", "gdpPercap")],
timevar="year", idvar="Country", direction="wide", sep=" ")
# Country gdpPercap 1952 gdpPercap 1957 gdpPercap 1962
# 1 Here 779 780 781
# 4 There 782 783 784
所以,我目前的数据框如下所示:
country continent year lifeExp pop gdpPercap
<fctr> <fctr> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.801 8425333 779.4453
2 Afghanistan Asia 1957 30.332 9240934 820.8530
3 Afghanistan Asia 1962 31.997 10267083 853.1007
4 Afghanistan Asia 1967 34.020 11537966 836.1971
5 Afghanistan Asia 1972 36.088 13079460 739.9811
6 Afghanistan Asia 1977 38.438 14880372 786.1134
有 140 多个国家/地区。年份以 5 年为间隔。从 1952 年到 2007 年,我想重塑我的数据框,以便我得到。
Country gdpPercap(1952) gdpPercap(1957) ... gdpPercap(2007)
<fctr> <dbl>
1 Afghanistan 974.5803 .... ...
2 Albania 5937.0295 ... ...
3 Algeria 6223.3675 ... ...
4 Angola 4797.2313
5 Argentina 12779.3796
6 Australia 34435.3674
7 Austria 36126.4927
8 Bahrain 29796.0483
9 Bangladesh 1391.2538
10 Belgium 33692.6051
我的尝试是这样的:
gapminder %>% #my dataframe
filter(year >= 1952) %>%
group_by(country) %>%
summarise(gdpPercap = mean(gdpPercap))
输出:
country gdpPercap <- but this takes the mean of gdpPercap from 1952-2007
<fctr> <dbl>
1 Afghanistan 802.6746
2 Albania 3255.3666
3 Algeria 4426.0260
4 Angola 3607.1005
5 Argentina 8955.5538
6 Australia 19980.5956
7 Austria 20411.9163
8 Bahrain 18077.6639
9 Bangladesh 817.5588
10 Belgium 19900.7581
# ... with 132 more rows
有什么想法吗? PS:我是 R 的新手。我也在看 melt()。任何帮助将不胜感激!
您也应该在 group_by 中使用年份,并且在总结之后,只需使用 dcast
或 rehape
这是一个示例解决方案:
library(dplyr)
library(reshape2)
gapminder <- data.frame(cbind(gdpPercap=runif(10000), year =as.integer(seq(from=1952, to=2007, by=5)), country = c("India", "US", "UK")))
gapminder$gdpPercap <- as.numeric(as.character(gapminder$gdpPercap))
gapminder$year <- as.integer(as.character(gapminder$year))
gapminder %>% #my dataframe
filter(year >= 1952) %>%
group_by(country, year) %>%
summarise(gdpPercap = mean(gdpPercap)) %>%
dcast(country ~ year, value.var="gdpPercap")
我必须生成新数据,因为您的示例不可重现。通过 link How to make a great R reproducible example?。它有助于回答和理解问题,以及更快的答案。
tidyr::spread()
会解决您的问题
library(dplyr); library(tidyr)
gapminder %>%
select(country, year, gdpPercap) %>%
spread(year, gdpPercap)
内置 reshape
可以做到这一点。
foo.data.frame <- data.frame(
Country=rep(c("Here", "There"), each=3),
year=rep(c(1952, 1957, 1962),2),
gdpPercap=779:784
# ... other variables
)
reshape(foo.data.frame[, c("Country", "year", "gdpPercap")],
timevar="year", idvar="Country", direction="wide", sep=" ")
# Country gdpPercap 1952 gdpPercap 1957 gdpPercap 1962
# 1 Here 779 780 781
# 4 There 782 783 784