从数据框中匹配和添加列
Matching and adding column from data frame
问题
我如何添加由两个因素选择值的列,其中一个是另一个数据框中的行名称?
示例数据
我有以下数据框:
Country, Year, Q1
Portugal, 2004, Yes
Portugal, 2005, Yes
Portugal, 2008, No
UK, 2006, Yes
UK, 2008, No
France, 2004, No
我想从其他数据框中添加一列;
Country, 2004,2005,2006,...
Portugal,0.15,0.25,0.14,0.21,...
Spain,0.16,0.26,0.14,0.16,...
UK,0.17,0.23,0.18,0.17,...
France,0.15,0.26,0.11,0.20,...
期望的结果
并获得如下所示的数据框;
Country, Year, Q1 , Q2
Portugal, 2004, Yes, 0.15
Portugal, 2005, Yes, 0.25
Portugal, 2006, No , 0.14
UK, 2005, Yes , 0.23
UK, 2006, No , 0.18
France, 2004, No, 0.15
非常感谢!
您可以使用 merge() 函数
merge(x, y, by = intersect(names(x), names(y))
例如:
l <- c(1,2,3); d <- c("a","b","c"); a <- data.frame(col1=l, col2=d)
l <- c(1,2,3); d <- c("a2","b2","c2"); b <- data.frame(col1=l, col2=d)
然后
合并(a,b,by="col1")
您将获得:
col1 col2.x col2.y
1 1 a a2
2 2 b b2
3 3 c c2
使用 tidyr 和 dplyr 库:
library(tidyr)
library(dplyr)
# convert df2 to df2_long format
df2_long <-
df2 %>%
gather( Year, Value, -Country, convert = TRUE )
# join df1 with df2
df1 %>%
left_join( df2_long, by = c("Country", "Year"))
结果:
Country Year Q1 Value
1 Portugal 2004 Yes 0.15
2 Portugal 2005 Yes 0.25
3 Portugal 2008 No NA
4 UK 2006 Yes 0.18
5 UK 2008 No NA
6 France 2004 No 0.15
我使用的输入数据:
df1 <-
structure(list(Country = c("Portugal", "Portugal", "Portugal",
"UK", "UK", "France"), Year = c(2004L, 2005L, 2008L, 2006L, 2008L,
2004L), Q1 = c("Yes", "Yes", "No", "Yes", "No", "No")),
.Names = c("Country",
"Year", "Q1"), class = "data.frame", row.names = c(NA, -6L))
df2 <-
structure(list(Country = c("Portugal", "Spain", "UK", "France"
), `2004` = c(0.15, 0.16, 0.17, 0.15), `2005` = c(0.25, 0.26,
0.23, 0.26), `2006` = c(0.14, 0.14, 0.18, 0.11)), .Names = c("Country",
"2004", "2005", "2006"), class = "data.frame", row.names = c(NA,
-4L))
问题
我如何添加由两个因素选择值的列,其中一个是另一个数据框中的行名称?
示例数据
我有以下数据框:
Country, Year, Q1
Portugal, 2004, Yes
Portugal, 2005, Yes
Portugal, 2008, No
UK, 2006, Yes
UK, 2008, No
France, 2004, No
我想从其他数据框中添加一列;
Country, 2004,2005,2006,...
Portugal,0.15,0.25,0.14,0.21,...
Spain,0.16,0.26,0.14,0.16,...
UK,0.17,0.23,0.18,0.17,...
France,0.15,0.26,0.11,0.20,...
期望的结果
并获得如下所示的数据框;
Country, Year, Q1 , Q2
Portugal, 2004, Yes, 0.15
Portugal, 2005, Yes, 0.25
Portugal, 2006, No , 0.14
UK, 2005, Yes , 0.23
UK, 2006, No , 0.18
France, 2004, No, 0.15
非常感谢!
您可以使用 merge() 函数
merge(x, y, by = intersect(names(x), names(y))
例如:
l <- c(1,2,3); d <- c("a","b","c"); a <- data.frame(col1=l, col2=d)
l <- c(1,2,3); d <- c("a2","b2","c2"); b <- data.frame(col1=l, col2=d)
然后 合并(a,b,by="col1")
您将获得:
col1 col2.x col2.y
1 1 a a2
2 2 b b2
3 3 c c2
使用 tidyr 和 dplyr 库:
library(tidyr)
library(dplyr)
# convert df2 to df2_long format
df2_long <-
df2 %>%
gather( Year, Value, -Country, convert = TRUE )
# join df1 with df2
df1 %>%
left_join( df2_long, by = c("Country", "Year"))
结果:
Country Year Q1 Value
1 Portugal 2004 Yes 0.15
2 Portugal 2005 Yes 0.25
3 Portugal 2008 No NA
4 UK 2006 Yes 0.18
5 UK 2008 No NA
6 France 2004 No 0.15
我使用的输入数据:
df1 <-
structure(list(Country = c("Portugal", "Portugal", "Portugal",
"UK", "UK", "France"), Year = c(2004L, 2005L, 2008L, 2006L, 2008L,
2004L), Q1 = c("Yes", "Yes", "No", "Yes", "No", "No")),
.Names = c("Country",
"Year", "Q1"), class = "data.frame", row.names = c(NA, -6L))
df2 <-
structure(list(Country = c("Portugal", "Spain", "UK", "France"
), `2004` = c(0.15, 0.16, 0.17, 0.15), `2005` = c(0.25, 0.26,
0.23, 0.26), `2006` = c(0.14, 0.14, 0.18, 0.11)), .Names = c("Country",
"2004", "2005", "2006"), class = "data.frame", row.names = c(NA,
-4L))