Merging two data frames with different column values
Sorry, I'm new to R and would really appreciate help with this. I'm trying to merge the following two data frames (labourproductivity and Depressiondframe) by time:
     Time LabourProductivity
1 2004 Q1               96.6
2      Q2               96.9
3      Q3               96.9
4      Q4               97.1
5 2005 Q1               97.6
6      Q2               99.0
and
     Time DepressionCount
1    2004             875
2 2004.25             820
3  2004.5             785
4 2004.75             857
5    2005             844
6 2005.25             841
Because the two have different time values, I don't know how to merge them. Ideally the result would look like:
  Time DepressionCount LabourProductivity
1 2004             875               96.6
2 2004             820               96.9
3 2004             785               96.9
4 2004             857               97.1
5 2005             844               97.6
6 2005             841               99.0
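If "df1" and "df2" are the first and second datasets, create a grouping index ("indx") based on the "Time" column of "df1"; the index increases at every row that starts with a year. Then use ave and as.yearqtr to convert the "Time" column to a format similar to that of "df2":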
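library(zoo)
# Start a new group at every row whose Time begins with a year
indx <- cumsum(grepl('^\\d+', df1$Time))
df1$Time <- with(df1, as.numeric(ave(Time, indx, FUN = function(x) {
    # Prepend the group's year to the bare "Qn" entries
    x[-1] <- paste(sub(' .*', '', x[1]), x[-1])
    as.yearqtr(x) })))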
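For readers who prefer not to depend on zoo, the same conversion can be sketched in base R. This is only an illustration, assuming every Time value is either "YYYY Qn" or a bare "Qn" that inherits its year from the nearest row above, and that the first row carries a year. The helper name quarter_to_decimal is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): turn labels such as
# "2004 Q1", "Q2", "Q3" into decimal years 2004, 2004.25, 2004.5.
quarter_to_decimal <- function(time) {
  has_year <- grepl("^\\d{4}", time)
  # Assumption: the first row must carry a year, otherwise nothing can be carried forward
  stopifnot(has_year[1])
  # Years found on the rows that have one, repeated forward over the bare "Qn" rows
  years_found <- as.numeric(substr(time[has_year], 1, 4))
  year <- years_found[cumsum(has_year)]
  # The quarter number is the digit after "Q"
  quarter <- as.numeric(sub(".*Q(\\d)$", "\\1", time))
  year + (quarter - 1) / 4
}

time_labels <- c("2004 Q1", "Q2", "Q3", "Q4", "2005 Q1", "Q2")
decimal_time <- quarter_to_decimal(time_labels)
print(decimal_time)
```

Assigning the result to df1$Time would give the same numeric key that the zoo-based version produces for this sample data.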
Merge the datasets and, if needed, transform the "Time" column:
transform(merge(df1, df2), Time=trunc(Time))
#  Time LabourProductivity DepressionCount
#1 2004               96.6             875
#2 2004               96.9             820
#3 2004               96.9             785
#4 2004               97.1             857
#5 2005               97.6             844
#6 2005               99.0             841
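Truncating Time to the year discards the quarter, so rows within a year can no longer be told apart. If that matters, one possible variation is to keep the quarter in its own column. The sketch below assumes both data frames already share a decimal-year Time column, as produced by the conversion step above; the Year and Quarter column names are illustrative and not part of the original answer.

```r
# Assumed inputs: both data frames already use decimal years in Time
df1 <- data.frame(Time = c(2004, 2004.25, 2004.5, 2004.75, 2005, 2005.25),
                  LabourProductivity = c(96.6, 96.9, 96.9, 97.1, 97.6, 99))
df2 <- data.frame(Time = c(2004, 2004.25, 2004.5, 2004.75, 2005, 2005.25),
                  DepressionCount = c(875L, 820L, 785L, 857L, 844L, 841L))

# Name the key explicitly so the join does not rely on guessing shared columns
merged <- merge(df1, df2, by = "Time")

# Keep year and quarter as separate columns instead of overwriting Time
merged$Year <- trunc(merged$Time)
merged$Quarter <- (merged$Time - merged$Year) * 4 + 1

# A shorter result than the inputs would mean some keys failed to match
stopifnot(nrow(merged) == nrow(df1))
print(merged)
```

The row-count check is a cheap guard: merge() performs an inner join by default, so unmatched keys are dropped silently rather than reported.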
Or use data.table:
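library(data.table)
# as.yearqtr() comes from zoo; run this on the original, character-valued df1
setDT(df1)[, TimeN := as.numeric(as.yearqtr(c(Time[1L],
         paste(sub(' .*', '', Time[1L]), Time[-1L])))),
         list(Grp = cumsum(grepl('^\\d+', Time)))][,
         Time := TimeN][, TimeN := NULL][]
# Keyed join on Time, then truncate Time to the year
setkey(df1, Time)[df2][, Time := trunc(Time)][]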
#   Time LabourProductivity DepressionCount
#1: 2004               96.6             875
#2: 2004               96.9             820
#3: 2004               96.9             785
#4: 2004               97.1             857
#5: 2005               97.6             844
#6: 2005               99.0             841
Data
df1 <- structure(list(Time = c("2004 Q1", "Q2", "Q3", "Q4", "2005 Q1",
"Q2"), LabourProductivity = c(96.6, 96.9, 96.9, 97.1, 97.6, 99
)), .Names = c("Time", "LabourProductivity"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6"))
df2 <- structure(list(Time = c(2004, 2004.25, 2004.5, 2004.75, 2005,
2005.25), DepressionCount = c(875L, 820L, 785L, 857L, 844L, 841L
)), .Names = c("Time", "DepressionCount"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6"))