How to merge two datasets in R?

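I am currently struggling to merge two datasets in R. The first is a cross-national longitudinal dataset of democracy scores and inequality levels for countries over several hundred years (15,034 observations, dat_as). The second is a cross-national longitudinal dataset on whether a given country had a legislature in a given year (27,192 observations, dat_vdem). I want to attach the legislature data to the inequality data. The goal is for the final df to have the same number of observations (15,034). Where there is a match, the data should be merged. Where there is no match, the row should just get an NA. Nothing I have tried in R has worked. For example, with this code I get a df with 2,558,975 observations.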

# load data
dat_as <- read.csv("as.csv")
dat_vdem <- read.csv("vdem.csv")

# merge 
test_df <- merge(dat_as, dat_vdem, by = c("code"))

However, with this code, I get a df with 13,355 observations:

test_df <- merge(dat_as, dat_vdem, by = c("country", "year"))

What am I doing wrong? Any help would be greatly appreciated. Reproducible data is below.

Here is dat_as:

structure(list(X = 1:6, country = c("United States", "United States", 
"United States", "United States", "United States", "United States"
), year = 1800:1805, scode = c("USA", "USA", "USA", "USA", "USA", 
"USA"), code = c("USA", "USA", "USA", "USA", "USA", "USA"), democracy = c(1L, 
1L, 1L, 1L, 1L, 1L), lagdemocracy = c(NA, 1L, 1L, 1L, 1L, 1L), 
    lbmginiint = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), lbmgdppint = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), ldemlbmginiint = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_), ldemlbmgdppint = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), yearsq = c(3240000, 
    3243601, 3247204, 3250809, 3254416, 3258025), legislature = c(NA, 
    NA, NA, NA, NA, NA)), row.names = c(NA, 6L), class = "data.frame")

Here is dat_vdem:

structure(list(X = 1:6, year = 1800:1805, country = c("United States", "United States", "United States", "United States", "United States", "United States"), code = c("USA", 
"USA", "USA", "USA", "USA", "USA"), v2lgbicam = c(0L, 0L, 0L, 
0L, 0L, 0L), v2lgqstexp = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), v2lgotovst = c(-2.1, -2.1, -2.1, -2.1, -2.1, 
-2.1), v2lginvstp = c(-2.05, -2.05, -2.05, -2.05, -2.05, -2.05
), legislature = c(0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 
6L), class = "data.frame")

What you are describing is a left join. I find the easiest way to do this is with dplyr:

dplyr::left_join(dat_as, dat_vdem)

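By default, it tries to guess the key variables to match on, using every column name the two data frames share. With the sample data you provided, it matches on "X", "country", "year", "code", and "legislature". You can specify the keys yourself with the by argument if needed.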
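
As a rough sketch of specifying the keys explicitly: the two data frames below are small made-up stand-ins for dat_as and dat_vdem (the values are hypothetical, not the real data), and the choice of country and year as keys is an assumption based on the question. It also assumes each country-year appears only once in dat_vdem, since otherwise a left join would add rows. This would also explain the two results in the question: merging on code alone pairs every year of a country with every other year of that country, and merge() without all.x = TRUE is an inner join, so rows without a match are dropped.

```r
library(dplyr)

# Hypothetical stand-ins for the question's data.
dat_as <- data.frame(
  country     = c("United States", "United States", "Canada"),
  year        = c(1800L, 1801L, 1800L),
  democracy   = c(1L, 1L, 0L),
  legislature = NA  # empty placeholder column, as in the question's dat_as
)
dat_vdem <- data.frame(
  country     = c("United States", "United States", "Mexico"),
  year        = c(1800L, 1801L, 1800L),
  legislature = c(0L, 0L, 1L)
)

# The row count is only preserved if each country-year is unique in dat_vdem.
stopifnot(anyDuplicated(dat_vdem[c("country", "year")]) == 0)

# Drop the empty placeholder so it is not used as a key, then keep
# every row of dat_as; unmatched rows get NA in the dat_vdem columns.
joined <- dat_as %>%
  select(-legislature) %>%
  left_join(dat_vdem, by = c("country", "year"))

# Base R equivalent: all.x = TRUE turns merge() into a left join.
merged <- merge(dat_as[setdiff(names(dat_as), "legislature")], dat_vdem,
                by = c("country", "year"), all.x = TRUE)

nrow(joined) == nrow(dat_as)
```

Note that merge() sorts the result by the key columns, while left_join() keeps the row order of dat_as.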