Vlookup-match like function in R

I am new to R, and I am currently applying what I know of R to an analysis I have to perform.

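I have two data frames: data frame A contains transaction details, and data frame B contains the monthly closing exchange rates for various currencies.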

Data frame A - transaction details

    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE
1           0001              INR         305000 Mar 2014 2014-03-01
2           0002              USD          15000 Oct 2014 2014-10-31
3           0003              JPY          85000 Feb 2015 2015-02-09
4           0004              CNY        1800000 Mar 2015 2015-03-27

structure(list(TRANSACTION_ID = c("0001", "0002", "0003", "0004"), 
COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"), COLLECTION_AMT = c(305000, 
15000, 85000, 1800000), MMYYYY = structure(c(2014.16666666667, 
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"),
LODG_DATE = structure(c(16130, 16374, 16475, 16521), class = "Date")), 
row.names = c(NA, -4L), class = "data.frame")

Data frame B - exchange rates

    MMYYYY       Date    CNY    INR     JPY       USD
1 Mar 2014 2014-03-31 4.9444 47.726 82.0845 0.7951654
2 Oct 2014 2014-10-31 4.7552 47.749 87.2604 0.7778469
3 Feb 2015 2015-02-27 4.5990 45.222 87.7690 0.7338372
4 Mar 2015 2015-03-31 4.5179 45.383 87.5395 0.7287036

structure(list(MMYYYY = structure(c(2014.16666666667, 
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"), 
Date = structure(c(16160, 16374, 16493, 16525), class = "Date"), CNY = 
c(4.9444, 4.7552, 4.599, 4.5179), INR = c(47.726, 47.749, 45.222, 45.383), 
JPY = c(82.0845, 87.2604, 87.769, 87.5395), USD = c(0.795165394, 0.77784692, 
0.733837235, 0.728703636)), .Names = c("MMYYYY", "Date", "CNY", "INR", "JPY", 
"USD"), class = "data.frame", row.names = c(NA, -4L))

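What I want to do is create a new column in data frame A, possibly named Exchange Rate. I want to obtain the exchange rate by looking it up in data frame B, matching COLLECTION_CRNCY and MMYYYY from data frame A against data frame B. That is: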

  TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
1           0001              INR         305000 Mar 2014 2014-03-01    47.7260000
2           0002              USD          15000 Oct 2014 2014-10-31     0.7778469
3           0003              JPY          85000 Feb 2015 2015-02-09    87.7690000
4           0004              CNY        1800000 Mar 2015 2015-03-27     4.5179000

I can do this easily in Excel using vlookup and match, but I would like to know how to achieve the same result in R, because my transaction details file is very large.

Here is a possible data.table approach. Basically, what you need to do is convert df2 to long format and then do a simple (binary) left join to df1:

library(data.table)
temp <- melt(setDT(df2[-2]), "MMYYYY", variable.name = "COLLECTION_CRNCY")
setkey(setDT(df1), MMYYYY, COLLECTION_CRNCY)[temp, exchange.rate := i.value]
df1
#    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
# 1:           0001              INR         305000 2014.167 2014-03-01    47.7260000
# 2:           0002              USD          15000 2014.750 2014-10-31     0.7778469
# 3:           0003              JPY          85000 2015.083 2015-02-09    87.7690000
# 4:           0004              CNY        1800000 2015.167 2015-03-27     4.5179000

Alternatively, you can do something similar with the "Hadleyverse", but dplyr is not able to merge zoo class columns (for now), so you will need to unclass them first:

library(dplyr)
library(tidyr)
df2[-2] %>% 
  gather(COLLECTION_CRNCY, exchange.rate, -MMYYYY) %>%
  mutate(MMYYYY = as.numeric(MMYYYY)) %>%
  left_join(df1 %>% mutate(MMYYYY = as.numeric(MMYYYY)), .,
                           by = c("MMYYYY", "COLLECTION_CRNCY"))
#   TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
# 1           0001              INR         305000 2014.167 2014-03-01    47.7260000
# 2           0002              USD          15000 2014.750 2014-10-31     0.7778469
# 3           0003              JPY          85000 2015.083 2015-02-09    87.7690000
# 4           0004              CNY        1800000 2015.167 2015-03-27     4.5179000

You can solve this in base R using apply and merge.

Breaking the problem down:

  1. Merge the two data sets together
  2. Extract the relevant column

1

To merge the data, simply use:

merge(dfa, dfb, by="MMYYYY")

2

To extract the relevant field, we can use the apply function, working row by row.

apply(df, 1, function(x) ...)

where df is the data.frame and 1 indicates that the function is applied to each row.


Putting it all together, we can extract the exchange rate in one line like this:

dfa$exchange.rate <- apply(df, 1, function(x) x[x[['COLLECTION_CRNCY']]])

All that x[x[['COLLECTION_CRNCY']]] does is look up the value in the COLLECTION_CRNCY column of the row and then use that value to index the matching currency column.
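
As a minimal sketch of what happens inside that apply call, the following uses a small hypothetical frame named merged (not part of the original data) that stands in for the merged result. It assumes only two currency columns, and it shows that each row reaches the function as a named character vector, which is why the result has to be converted back to numeric afterwards.

```r
# Hypothetical stand-in for the merged data (names and rows are illustrative).
merged <- data.frame(
  COLLECTION_CRNCY = c("INR", "USD"),
  INR = c(47.726, 47.749),
  USD = c(0.7951654, 0.7778469),
  stringsAsFactors = FALSE
)

# apply() coerces the data frame to a character matrix, so each row `x`
# is a named character vector; x[["COLLECTION_CRNCY"]] yields the currency
# code, which is then used as the name of the element to pick.
picked <- apply(merged, 1, function(x) x[[x[["COLLECTION_CRNCY"]]]])

# The picked values are character strings, so convert them back to numbers.
rates <- as.numeric(picked)
```

Because the numeric columns pass through a character matrix, the values are formatted to a limited number of significant digits on the way, so very precise rates may lose trailing digits with this approach.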


Final code:

dfa$exchange.rate <- apply(merge(dfa, dfb, by="MMYYYY"), 1, function(x) x[x[['COLLECTION_CRNCY']]])
dfa$exchange.rate <- as.numeric(dfa$exchange.rate) # since it isn't numeric format.
#    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
#  1           0001              INR         305000 2014.167 2014-03-01    47.7260000
#  2           0002              USD          15000 2014.750 2014-10-31     0.7778469
#  3           0003              JPY          85000 2015.083 2015-02-09    87.7690000
#  4           0004              CNY        1800000 2015.167 2015-03-27     4.5179000
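
One caveat with assigning the result straight back to dfa: merge sorts its output by the key by default, so the rows of the merged frame only line up with dfa when dfa is already ordered by MMYYYY, as it is in the example. A hedged sketch of a variant that keeps the rate inside the merged frame, so alignment does not depend on row order, is shown below. The small frames and the text month labels are assumptions made for illustration.

```r
# Hypothetical data, deliberately NOT in month order.
dfa <- data.frame(
  TRANSACTION_ID = c("0002", "0001"),
  COLLECTION_CRNCY = c("USD", "INR"),
  MMYYYY = c("2014-10", "2014-03"),
  stringsAsFactors = FALSE
)
dfb <- data.frame(
  MMYYYY = c("2014-03", "2014-10"),
  INR = c(47.726, 47.749),
  USD = c(0.7951654, 0.7778469),
  stringsAsFactors = FALSE
)

# merge() sorts by the key, so work on the merged frame itself
# instead of writing the result back into dfa.
merged <- merge(dfa, dfb, by = "MMYYYY")
merged$exchange.rate <- as.numeric(
  apply(merged, 1, function(x) x[[x[["COLLECTION_CRNCY"]]]])
)

# Keep only the transaction columns plus the looked-up rate.
result <- merged[c("TRANSACTION_ID", "COLLECTION_CRNCY", "MMYYYY", "exchange.rate")]
```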

You can use reshape() to convert from wide format to long format. It may be the most annoying function in R, but you can usually get where you want to go if you play around with its options for long enough. Once you have B in long format, a simple call to merge() gets the required output.

B.id <- c('MMYYYY','Date');
B.time <- setdiff(names(B),B.id);
B.long <- reshape(B,dir='l',idvar=B.id,varying=B.time,times=B.time,timevar='COLLECTION_CRNCY',v.names='exchange.rate',new.row.names=1:(length(B.time)*nrow(B)));
B.long;
##      MMYYYY       Date COLLECTION_CRNCY exchange.rate
## 1  2014.167 2014-03-31              CNY     4.9444000
## 2  2014.750 2014-10-31              CNY     4.7552000
## 3  2015.083 2015-02-27              CNY     4.5990000
## 4  2015.167 2015-03-31              CNY     4.5179000
## 5  2014.167 2014-03-31              INR    47.7260000
## 6  2014.750 2014-10-31              INR    47.7490000
## 7  2015.083 2015-02-27              INR    45.2220000
## 8  2015.167 2015-03-31              INR    45.3830000
## 9  2014.167 2014-03-31              JPY    82.0845000
## 10 2014.750 2014-10-31              JPY    87.2604000
## 11 2015.083 2015-02-27              JPY    87.7690000
## 12 2015.167 2015-03-31              JPY    87.5395000
## 13 2014.167 2014-03-31              USD     0.7951654
## 14 2014.750 2014-10-31              USD     0.7778469
## 15 2015.083 2015-02-27              USD     0.7338372
## 16 2015.167 2015-03-31              USD     0.7287036
merge(A,B.long[c('MMYYYY','COLLECTION_CRNCY','exchange.rate')],all.x=T);
##   COLLECTION_CRNCY   MMYYYY TRANSACTION_ID COLLECTION_AMT  LODG_DATE exchange.rate
## 1              CNY 2015.167           0004        1800000 2015-03-27     4.5179000
## 2              INR 2014.167           0001         305000 2014-03-01    47.7260000
## 3              JPY 2015.083           0003          85000 2015-02-09    87.7690000
## 4              USD 2014.750           0002          15000 2014-10-31     0.7778469

Another way, for reference:

res <- numeric(nrow(dfA))
for(i in seq_len(nrow(dfA))) {
    res[i] <- dfB[match(dfA$MMYYYY[i], dfB$MMYYYY), 
                  match(dfA$COLLECTION_CRNCY[i], names(dfB))]}
dfA$Exchange<- res
#   TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY
# 1           0001              INR         305000 2014.167
# 2           0002              USD          15000 2014.750
# 3           0003              JPY          85000 2015.083
# 4           0004              CNY        1800000 2015.167
#    LODG_DATE   Exchange
# 1 2014-03-01 47.7260000
# 2 2014-10-31  0.7778469
# 3 2015-02-09 87.7690000
# 4 2015-03-27  4.5179000
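
Since the transaction file is described as very large, the same row/column match can also be done without an explicit loop by indexing a matrix with a two-column index matrix. This is only a sketch under some assumptions: the names rate_cols, rates, row_idx and col_idx are introduced here for illustration, and MMYYYY is stored as plain text rather than yearmon to keep the example self-contained.

```r
# Simplified copies of the two data frames (MMYYYY kept as character here).
dfA <- data.frame(
  TRANSACTION_ID = c("0001", "0002", "0003", "0004"),
  COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"),
  MMYYYY = c("Mar 2014", "Oct 2014", "Feb 2015", "Mar 2015"),
  stringsAsFactors = FALSE
)
dfB <- data.frame(
  MMYYYY = c("Mar 2014", "Oct 2014", "Feb 2015", "Mar 2015"),
  CNY = c(4.9444, 4.7552, 4.5990, 4.5179),
  INR = c(47.726, 47.749, 45.222, 45.383),
  JPY = c(82.0845, 87.2604, 87.7690, 87.5395),
  USD = c(0.7951654, 0.7778469, 0.7338372, 0.7287036),
  stringsAsFactors = FALSE
)

# Numeric matrix holding only the currency columns.
rate_cols <- setdiff(names(dfB), "MMYYYY")
rates <- as.matrix(dfB[rate_cols])

# One row index and one column index per transaction (the "match" part).
row_idx <- match(dfA$MMYYYY, dfB$MMYYYY)
col_idx <- match(dfA$COLLECTION_CRNCY, rate_cols)

# Indexing with a two-column matrix picks one cell per transaction;
# a transaction with no matching month or currency gets NA.
dfA$Exchange <- rates[cbind(row_idx, col_idx)]
```

Unlike the apply approach, the rates stay numeric throughout, so no conversion back from character is needed.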