Retrieving a value based on matched dates
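I have two data frames. The first contains events with their corresponding start and end times. The second contains minute-by-minute prices for different IDs. See below: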
Event                       starttime            endtime
Change in Nonfarm Payrolls  2020-03-06 08:15:00  2020-03-06 09:00:00
Change in Nonfarm Payrolls  2020-02-07 08:15:00  2020-02-07 09:00:00
Change in Nonfarm Payrolls  2020-01-10 08:15:00  2020-01-10 09:00:00
Change in Nonfarm Payrolls  2020-01-10 08:15:00  2020-01-10 09:00:00

Price  date_time            ID
24813  2020-03-06 08:14:00  DJ
24763  2020-03-06 08:15:00  DJ
24750  2020-03-06 08:16:00  DJ
24725  2020-03-06 08:17:00  DJ
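I want to take the price and ID from the second data set at each start time and end time and add them to the first data set. I tried using ifelse like this, but it does not work: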
df1$startprice <- ifelse(df1$starttime == df2$date_time, df2$Price, "no")
Can someone help me?
To reproduce the data (prices are for the first event only, including its start and end time):
df1 <- structure(list(Event = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("Change in Nonfarm Payrolls"), class = "factor"),
starttime = structure(c(1583478900, 1581059700, 1578640500, 1578640500, 1581059700), class = c("POSIXct", "POSIXt"), tzone = ""),
endtime = structure(c(1583481600, 1581062400, 1578643200, 1578643200, 1581062400), class = c("POSIXct","POSIXt"), tzone = "")), row.names = c(NA, 5L), class = "data.frame")
df2 <- structure(list(Price = c(24813, 24763, 24750, 24725,
24746, 24735, 24755, 24735, 24735, 24744, 24762, 24763, 24773,
24773, 24778, 24832, 24856, 24845, 24842, 24902, 24934, 24854,
24888, 24914, 24922, 24875, 24896, 24853, 24834, 24845, 24886,
24872, 24844, 24846, 24860, 24812, 24791, 24767, 24765, 24756,
24745, 24791, 24800, 24789, 24787, 24887, 24876, 24911), date_time = structure(c(1583478840,
1583478900, 1583478960, 1583479020, 1583479080, 1583479140, 1583479200,
1583479260, 1583479320, 1583479380, 1583479440, 1583479500, 1583479560,
1583479620, 1583479680, 1583479740, 1583479800, 1583479860, 1583479920,
1583479980, 1583480040, 1583480100, 1583480160, 1583480220, 1583480280,
1583480340, 1583480400, 1583480460, 1583480520, 1583480580, 1583480640,
1583480700, 1583480760, 1583480820, 1583480880, 1583480940, 1583481000,
1583481060, 1583481120, 1583481180, 1583481240, 1583481300, 1583481360,
1583481420, 1583481480, 1583481540, 1583481600, 1583481660), class = c("POSIXct",
"POSIXt"), tzone = ""), ID = c("DJ", "DJ", "DJ",
"DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ",
"DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ",
"DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ",
"DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ", "DJ",
"DJ")), row.names = 62835:62882, class = "data.frame")
Thanks in advance!
Kind regards,
Jürgen
I assume you are trying to look up Price by matching the second dataset's date_time against the first dataset's starttime.
In that case, you can use left_join from dplyr:
library(dplyr)
df1 %>% left_join(df2, by = c('starttime' = 'date_time'))
Output:
                       Event           starttime             endtime Price   ID
1 Change in Nonfarm Payrolls 2020-03-06 15:15:00 2020-03-06 16:00:00 24763   DJ
2 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00    NA <NA>
3 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00    NA <NA>
4 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00    NA <NA>
5 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00    NA <NA>
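As a side note on why the ifelse attempt from the question fails: == compares the two vectors position by position (recycling the shorter one), so it never searches df2 for the row with the same timestamp, and the "no" fallback would coerce the result to character. A minimal base R sketch of an actual lookup with match() is below. The two small data frames are cut-down stand-ins for the question's data, and the explicit tz = "UTC" is an assumption made only to keep the example reproducible.

```r
# Cut-down stand-ins for df1 and df2 (tz = "UTC" is an assumption).
df1 <- data.frame(
  starttime = as.POSIXct(c("2020-03-06 08:15:00", "2020-02-07 08:15:00"), tz = "UTC")
)
df2 <- data.frame(
  Price     = c(24813, 24763, 24750),
  date_time = as.POSIXct(c("2020-03-06 08:14:00", "2020-03-06 08:15:00",
                           "2020-03-06 08:16:00"), tz = "UTC")
)

# For each starttime, find the row of df2 with the same timestamp (NA if none).
idx <- match(df1$starttime, df2$date_time)

# Index Price by those positions; unmatched events get NA and the column stays numeric.
df1$startprice <- df2$Price[idx]
```

Events without a matching timestamp end up with NA, which is the same behaviour as the left_join above.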
Update:
You want the Price at starttime as well as the Price at endtime.
You can pipe another left_join onto the previous code, this time joining on df1's endtime instead of starttime:
combinedPrice <- df1 %>% left_join(df2, by = c('starttime' = 'date_time')) %>% left_join(df2, by = c('endtime' = 'date_time'))
Output of combinedPrice:
                       Event           starttime             endtime Price.x ID.x Price.y ID.y
1 Change in Nonfarm Payrolls 2020-03-06 15:15:00 2020-03-06 16:00:00   24763   DJ   24876   DJ
2 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00      NA <NA>      NA <NA>
3 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00      NA <NA>      NA <NA>
4 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00      NA <NA>      NA <NA>
5 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00      NA <NA>      NA <NA>
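The start and end prices are named Price.x and Price.y respectively. In addition, the two joins leave us with two ID columns. We can rename the price columns and drop one of the ID columns as follows: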
combinedPrice %>% rename('PriceStart' = Price.x, 'PriceEnd' = Price.y, 'ID' = ID.y) %>% select(-ID.x)
Output:
                       Event           starttime             endtime PriceStart PriceEnd   ID
1 Change in Nonfarm Payrolls 2020-03-06 15:15:00 2020-03-06 16:00:00      24763    24876   DJ
2 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00         NA       NA <NA>
3 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00         NA       NA <NA>
4 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00         NA       NA <NA>
5 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00         NA       NA <NA>
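Since the question mentions prices for different IDs, a possible variation is to join the second time on ID as well, so that a start price is only paired with the end price of the same ID. The sketch below also uses the suffix argument of left_join to name the two Price columns directly, which avoids the separate rename/select step. The toy data, the second ID "SP" and its prices, and tz = "UTC" are all made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: two events, prices for two IDs on the first date only.
ts <- function(x) as.POSIXct(x, tz = "UTC")
df1 <- data.frame(
  Event     = c("A", "B"),
  starttime = ts(c("2020-03-06 08:15:00", "2020-02-07 08:15:00")),
  endtime   = ts(c("2020-03-06 09:00:00", "2020-02-07 09:00:00"))
)
df2 <- data.frame(
  Price     = c(24763, 24876, 2970, 2985),
  date_time = ts(c("2020-03-06 08:15:00", "2020-03-06 09:00:00",
                   "2020-03-06 08:15:00", "2020-03-06 09:00:00")),
  ID        = c("DJ", "DJ", "SP", "SP")
)

result <- df1 %>%
  # First join: one row per event and ID, carrying the price at starttime.
  left_join(df2, by = c("starttime" = "date_time")) %>%
  # Second join: match on endtime AND ID; the clashing Price columns
  # become PriceStart (left side) and PriceEnd (right side).
  left_join(df2, by = c("endtime" = "date_time", "ID" = "ID"),
            suffix = c("Start", "End"))
```

With several IDs the result has one row per event and ID; events without any price data keep a single row with NA in PriceStart, PriceEnd and ID.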
You can merge twice, first on starttime and then on endtime.
merge(df1, transform(df2, start_time_price = Price)[-1],
by.x = 'starttime', by.y = 'date_time') |>
merge(transform(df2, end_time_price = Price)[-1],
by.x = c('ID', 'endtime'), by.y = c('ID', 'date_time'))
If you want to keep all rows of df1 in the final output, use all.x = TRUE in merge. The pipe operator (|>) was introduced in R 4.1; if you are using an older version of R:
merge(merge(df1, transform(df2, start_time_price = Price)[-1],
by.x = 'starttime', by.y = 'date_time'),
transform(df2, end_time_price = Price)[-1],
by.x = c('ID', 'endtime'), by.y = c('ID', 'date_time'))
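To illustrate the all.x = TRUE remark, here is a small sketch that passes it to both merge calls so that events without price data survive both steps. The toy data frames and tz = "UTC" are assumptions for the example; the helper names start_prices, end_prices and step1 are not from the answer above.

```r
# Toy data (assumed): two events, prices available only for the first one.
df1 <- data.frame(
  Event     = c("A", "B"),
  starttime = as.POSIXct(c("2020-03-06 08:15:00", "2020-02-07 08:15:00"), tz = "UTC"),
  endtime   = as.POSIXct(c("2020-03-06 09:00:00", "2020-02-07 09:00:00"), tz = "UTC")
)
df2 <- data.frame(
  Price     = c(24763, 24876),
  date_time = as.POSIXct(c("2020-03-06 08:15:00", "2020-03-06 09:00:00"), tz = "UTC"),
  ID        = "DJ"
)

# Copy Price under a descriptive name and drop the original first column.
start_prices <- transform(df2, start_time_price = Price)[-1]
end_prices   <- transform(df2, end_time_price = Price)[-1]

# all.x = TRUE in both steps keeps every row of df1 (a left join).
step1  <- merge(df1, start_prices,
                by.x = "starttime", by.y = "date_time", all.x = TRUE)
result <- merge(step1, end_prices,
                by.x = c("ID", "endtime"), by.y = c("ID", "date_time"),
                all.x = TRUE)
```

Without all.x = TRUE, merge performs an inner join and event B would be dropped from the result.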