合并两个不同长度的数据集

Merging of two datasets of different lenghts

我正在尝试合并我拥有的两个数据集。

df1:

day month year lon lat month-year
3 5 2009 5.7 53.9 May 2009
8 9 2004 6.9 52.6 Sep 2004
15 9 2004 3.8 50.4 Sep 2004
5 5 2009 2.7 51.2 May 2009
28 7 2005 14.8 62.4 Jul 2005
18 9 2004 5.1 52.5 Sep 2004

df2:

nao-value sign month-year
- 2.1 Negative Sep 2004
1.3 Positive Jul 2005
- 1.1 Negative May 2009

我想合并它以在发生数据中添加每个月和每年的 NAO 值,这意味着我希望在发生数据中为该月的所有注册重复每个特定月份的 NAO 值。

问题是我无法让 NAO 值按照出现数据排列在它应该排列的位置,它要么只是重复放置,要么与它应该排列的日期不对齐,给出为 month-year.x 和 month- year.y ,或者返回为 NA 值。

我尝试了几种不同的方法:

df3 <- merge(df1, df2, by="month-year")

df3 <- merge(cbind(df1, X=rownames(df1)), cbind(df2, variable=rownames(df2)))

df3 <- merge(df1,df2, by ="month-year", all.x = TRUE,all.y=TRUE, sort = FALSE)

df3 <- merge(df1, df2, by=intersect(df1$month-year(df1), df2$month-year(df2)))

但不是我想要的结果。

编辑以包含 dput

dput(head(df1, 10)) :

structure(list(Day = c(29, 2, 14, 31, 16, 7, 25, 12, 21, 22), 
Month = c(7, 7, 7, 8, 8, 7, 8, 6, 6, 9), Year = c(2010, 2015, 
2010, 2018, 2016, 2018, 2019, 2004, 2015, 2019), Lon = c(-6.155014, 
-5.820868, -5.509842, -5.495277, -5.469389, -5.469389, -5.469389, 
-5.466995, -5.461942, -5.457127), Lat = c(59.09478, 59.125228, 
57.959196, 57.96022, 57.986825, 57.986825, 57.986825, 57.874527, 
57.95972, 58.07697), Date = c("Jul 2010", "Jul 2015", "Jul 2010", 
"Aug 2018", "Aug 2016", "Jul 2018", "Aug 2019", "Jun 2004", 
"Jun 2015", "Sep 2019")), row.names = c(NA, -10L), class = 
c("tbl_df", 
"tbl", "data.frame"))

dput(head(df2, 10)) :
structure(list(NAO = c(1.04, 1.41, 1.46, 2, -1.53, -0.02, 0.53, 
0.97, 1.06, 0.23), Sign = c("Positive", "Positive", "Positive", 
"Positive", "Negative", "Negative", "Positive", "Positive", 
"Positive", 
"Positive"), Date = c("jan 1990", "feb 1990", "mar 1990", "apr 1990", 
"mai 1990", "jun 1990", "jul 1990", "aug 1990", "sep 1990", "okt 
1990"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

merge 函数区分大小写。您正在合并的两个数据框中有不同的情况。使两个数据帧中的情况相同,然后执行 merge。尝试-

result <- merge(transform(df1, Date = tolower(Date)), df2, by = 'Date')

使用tidyverse

library(dplyr)
df1 %>% 
    mutate(Date = tolower(Date)) %>%
    inner_join(df2, by = 'Date')