Merging two datasets of different lengths
I am trying to merge two datasets that I have.
df1:
day | month | year | lon | lat | month-year |
---|---|---|---|---|---|
3 | 5 | 2009 | 5.7 | 53.9 | May 2009 |
8 | 9 | 2004 | 6.9 | 52.6 | Sep 2004 |
15 | 9 | 2004 | 3.8 | 50.4 | Sep 2004 |
5 | 5 | 2009 | 2.7 | 51.2 | May 2009 |
28 | 7 | 2005 | 14.8 | 62.4 | Jul 2005 |
18 | 9 | 2004 | 5.1 | 52.5 | Sep 2004 |
df2:
nao-value | sign | month-year |
---|---|---|
- 2.1 | Negative | Sep 2004 |
1.3 | Positive | Jul 2005 |
- 1.1 | Negative | May 2009 |
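I want to merge them so that the NAO value for each month and year is added to the occurrence data. In other words, the NAO value for a given month should be repeated for every record in the occurrence data from that month.
The problem is that I cannot get the NAO values to line up with the occurrence data where they belong. They are either just repeated, or not aligned with the date they should match (with the key showing up as month-year.x and month-year.y), or returned as NA.
I have tried several different approaches: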
df3 <- merge(df1, df2, by="month-year")
df3 <- merge(cbind(df1, X=rownames(df1)), cbind(df2, variable=rownames(df2)))
df3 <- merge(df1,df2, by ="month-year", all.x = TRUE,all.y=TRUE, sort = FALSE)
df3 <- merge(df1, df2, by=intersect(df1$month-year(df1), df2$month-year(df2)))
But none of them gives the result I want.
Edit to include dput:
dput(head(df1, 10)) :
structure(list(Day = c(29, 2, 14, 31, 16, 7, 25, 12, 21, 22),
Month = c(7, 7, 7, 8, 8, 7, 8, 6, 6, 9), Year = c(2010, 2015,
2010, 2018, 2016, 2018, 2019, 2004, 2015, 2019), Lon = c(-6.155014,
-5.820868, -5.509842, -5.495277, -5.469389, -5.469389, -5.469389,
-5.466995, -5.461942, -5.457127), Lat = c(59.09478, 59.125228,
57.959196, 57.96022, 57.986825, 57.986825, 57.986825, 57.874527,
57.95972, 58.07697), Date = c("Jul 2010", "Jul 2015", "Jul 2010",
"Aug 2018", "Aug 2016", "Jul 2018", "Aug 2019", "Jun 2004",
"Jun 2015", "Sep 2019")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
dput(head(df2, 10)) :
structure(list(NAO = c(1.04, 1.41, 1.46, 2, -1.53, -0.02, 0.53,
0.97, 1.06, 0.23), Sign = c("Positive", "Positive", "Positive",
"Positive", "Negative", "Negative", "Positive", "Positive",
"Positive", "Positive"), Date = c("jan 1990", "feb 1990", "mar 1990",
"apr 1990", "mai 1990", "jun 1990", "jul 1990", "aug 1990", "sep 1990",
"okt 1990")), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
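The merge function is case-sensitive. The Date column uses different letter case in the two data frames you are merging ("Jul 2010" in df1, "jul 1990" in df2), so the keys never match. Make the case the same in both data frames and then perform the merge. Try: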
result <- merge(transform(df1, Date = tolower(Date)), df2, by = 'Date')
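To see the effect on data shaped like the tables in the question, here is a minimal sketch on toy data. The column names `month_year` and `nao`, and the lowercase dates in `df2`, are assumptions made for illustration. `all.x = TRUE` is an addition to the call above: it keeps occurrence records that have no matching month (with NA) instead of dropping them.

```r
# Toy occurrence data, modelled on df1 in the question
df1 <- data.frame(
  day        = c(3, 8, 15, 5, 28, 18),
  month_year = c("May 2009", "Sep 2004", "Sep 2004",
                 "May 2009", "Jul 2005", "Sep 2004"),
  stringsAsFactors = FALSE
)

# Toy NAO data; lowercase dates are assumed here to reproduce the case mismatch
df2 <- data.frame(
  nao        = c(-2.1, 1.3, -1.1),
  sign       = c("Negative", "Positive", "Negative"),
  month_year = c("sep 2004", "jul 2005", "may 2009"),
  stringsAsFactors = FALSE
)

# With mismatched case no keys are equal, so the inner merge has no rows
unmatched_rows <- nrow(merge(df1, df2, by = "month_year"))

# Lowercase the key in both frames, then merge.
# all.x = TRUE keeps every row of df1 even if a month is missing from df2.
df1$month_year <- tolower(df1$month_year)
df2$month_year <- tolower(df2$month_year)
result <- merge(df1, df2, by = "month_year", all.x = TRUE)
result
```

Each NAO value is repeated for every occurrence record in the same month, which is the behaviour asked for in the question.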
Using the tidyverse:
library(dplyr)
df1 %>%
mutate(Date = tolower(Date)) %>%
inner_join(df2, by = 'Date')
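One more thing to check: the `dput` output of `df2` contains `"mai 1990"` and `"okt 1990"`, which would not match `"may"` or `"oct"` from `df1` even after lowercasing. The sketch below is one possible way to handle that, assuming these are the only abbreviations that differ; `normalise_date` and `month_map` are hypothetical names, and the two-row data frames are toy data (the NAO values are taken from the `dput` sample).

```r
# Hypothetical helper: lowercase a "mon yyyy" key and map non-English
# month abbreviations to English ones. Only "mai" and "okt" appear in the
# dput sample; extend month_map if the full data contains others.
normalise_date <- function(x) {
  month_map <- c(mai = "may", okt = "oct")
  x <- tolower(trimws(x))
  for (from in names(month_map)) {
    # Replace the abbreviation only at the start of the string
    x <- sub(paste0("^", from, " "), paste0(month_map[[from]], " "), x)
  }
  x
}

# Toy data for illustration
df1 <- data.frame(Day = c(3, 12),
                  Date = c("May 1990", "Oct 1990"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(NAO = c(-1.53, 0.23),
                  Date = c("mai 1990", "okt 1990"),
                  stringsAsFactors = FALSE)

# Normalise the key in both frames before merging
df1$Date <- normalise_date(df1$Date)
df2$Date <- normalise_date(df2$Date)
merged <- merge(df1, df2, by = "Date", all.x = TRUE)
merged
```

Running `unique(df2$Date)` on the full data is a quick way to see which abbreviations actually need an entry in the map.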