How can I put zero or NA in a data frame that has two date columns, using R?

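The basic idea is to add rows containing 0 or NA for the variables that have no value on a given day.

The function should align the two date variables, and it should also handle the prices, inserting NA for the variable that has no value.

This is my dataset. I would like to add 0 or NA for the periods in which, for example, the futures variable and the date variable are missing a value.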

|   | data       | date       | futures |
|---|------------|------------|---------|
| 1 | 2021-12-23 | 2021-12-23 | 1388.17 |
| 2 | 2021-12-22 | 2021-12-22 | 1432.36 |
| 3 | 2021-12-21 | 2021-12-21 | 1508.98 |
| 4 | 2021-12-20 | 2021-12-20 | 1493.13 |
| 5 | 2021-12-19 | 2021-12-17 | 1379.97 |
| 6 | 2021-12-18 | 2021-12-16 | 1597.91 |

The function should work roughly like this:

|   | data       | date       | futures |
|---|------------|------------|---------|
| 1 | 2021-12-23 | 2021-12-23 | 1388.17 |
| 2 | 2021-12-22 | 2021-12-22 | 1432.36 |
| 3 | 2021-12-21 | 2021-12-21 | 1508.98 |
| 4 | 2021-12-20 | 2021-12-20 | 1493.13 |
| 5 | 2021-12-19 | NA         | NA      |
| 6 | 2021-12-18 | NA         | NA      |

I thought about using a for loop, but I couldn't get it to work.

Thank you very much for your help.

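I created some arbitrary data and randomly removed rows. This answer was edited to reflect the changes you made to the question. You didn't specify this, but I assumed that if the first date field is NA, you still want to keep the price.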

library(tidyverse)

daten <- seq(as.Date("2020/10/23"), as.Date("2021/10/12"), "days")
price <- round(runif(350, 1000, 1500), digits = 2)

dff <- data.frame(dateOne = sort(sample(daten, size = 325, replace = F), 
                                 decreasing = T),
                  dateTwo = sort(sample(daten, size = 325, replace = F), 
                                 decreasing = T),
                  futures = sample(price, size = 325, replace = T))

This is based on the assumption that the dates are ordered.

ordering <- function(d, dT, df1){ # two fields with dates and the data frame
  
  # get indices of date columns
  tellMe <- which(colnames(df1) %in% c(d, dT))
  
  # create ranking (to return original sorting)
  df1$rank <- 1:nrow(df1)
  
  # separate and sort date columns (ascending)
                # the first date; the rank
  dc  <- df1[, c(tellMe[[1]], ncol(df1))] %>% arrange(.data[[d]])

                # everything *except the first date field & rank
  dTc <- df1[, -c(tellMe[[1]], ncol(df1))] %>% arrange(.data[[dT]])

  # identify the index of the date in dTc
  tellMe2 <- which(colnames(dTc) == dT)

  # find differences
         # missing in the first date field
             # the index of the date is in tellMe2
  dfd2 <- dTc[!dTc[, tellMe2] %in% dc[, 1], tellMe2] 

         # missing in the second date field
  dfd  <- dc[!dc[, 1] %in% dTc[, tellMe2], 1] # since date is in column 1
  
  # find indices of where the NA's need to be placed
  # (local assignment, so nothing leaks into the global environment)
  dcInt <- lapply(dfd2,
                  findInterval,
                  unlist(dc[, 1])) %>% 
    unlist()
  dTcInt <- lapply(dfd,
                   findInterval,
                   unlist(dTc[, tellMe2])) %>% 
    unlist()

  # build up with differences as NA
  # preceding index provided, offset by index number - 1
  for(i in seq_along(dcInt)){ # seq_along() skips the loop if nothing is missing
    pos <- dcInt[[i]] + i - 1 # number of rows kept before the new NA row
    dc <- rbind(dc[seq_len(pos), ], # everything before
                rep(NA, times = ncol(dc)),
                dc[seq_len(nrow(dc)) > pos, ], # everything after (may be empty)
                make.row.names = F)
  }
  
  # preceding index provided, offset by index number - 1
  for(j in seq_along(dTcInt)){ # seq_along() skips the loop if nothing is missing
    pos <- dTcInt[[j]] + j - 1 # number of rows kept before the new NA row
    dTc <- rbind(dTc[seq_len(pos), ], # everything before
                 rep(NA, times = ncol(dTc)), 
                 dTc[seq_len(nrow(dTc)) > pos, ], # everything after (may be empty)
                 make.row.names = F)
  } 
  
  # reassemble the data, in the original order
  df2 <- cbind(dc, dTc) %>%
    select(colnames(df1), rank)
  
  # check row order
  # if none of the original ranks 1:10 appear in the first 10 rows,
  # the input was sorted in decreasing order and must be reversed
  if(!any(df2[1:10, ]$rank %in% 1:10)){
    # add a new ranking variable
    df2$rank2 <- 1:nrow(df2)
    # reverse the new ranking variable and delete both ranking variables
    df2 <- arrange(df2, -rank2) %>% select(-rank, -rank2)
  } else {
    # delete the ranking variable; they are already in the right order
    df2 <- select(df2, -rank)
  }
  return(df2)
}
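To make the index step inside the function easier to follow, here is a minimal, self-contained sketch of how `findInterval` locates the insertion points. The vectors `present` and `missing` are hypothetical example values, not part of the answer's data, and the sketch assumes `present` is sorted in increasing order.

```r
# Sorted dates that exist in one column (hypothetical example values)
present <- as.Date(c("2021-12-16", "2021-12-17", "2021-12-20", "2021-12-21"))
# Dates that exist only in the other column
missing <- as.Date(c("2021-12-18", "2021-12-19"))

# For each missing date: how many present dates are <= that date,
# i.e. the position after which an NA has to be inserted
pos <- findInterval(as.numeric(missing), as.numeric(present))

# Insert the NAs one at a time; each earlier insertion shifts the
# later positions by one, hence the "+ i - 1" offset
aligned <- present
for (i in seq_along(pos)) {
  aligned <- append(aligned, as.Date(NA), after = pos[i] + i - 1)
}
aligned
```

Both missing dates fall after the second present date, so the two NA entries end up in positions 3 and 4 of `aligned`.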

Now you can use this function on your data.

tryIt <- ordering("dateOne", "dateTwo", dff)
head(tryIt)

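This returns a data frame. Whether the data is sorted by increasing or decreasing date when it is passed to the function, the result comes back in the order in which it was sent.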

tail(tryIt, n = 15)
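For comparison, the same alignment can probably be sketched more compactly with a join. This is not the function above but a different technique: it builds a temporary key from each date column and uses `dplyr::full_join`. The helper name `align_dates` and the column `join_key` are hypothetical, and the sketch assumes the dates within each column are unique and that the result should be sorted in decreasing order. As in the function above, the remaining columns (here `futures`) travel with the second date column.

```r
library(dplyr)

# Hypothetical helper: align two date columns on the union of their dates
align_dates <- function(df, d, dT) {
  # the first date column plus a temporary join key
  left <- data.frame(join_key = df[[d]])
  left[[d]] <- df[[d]]

  # the second date column and every other column, plus the same key
  right <- df[, setdiff(colnames(df), d), drop = FALSE]
  right$join_key <- right[[dT]]

  # a full join keeps every date from either side; unmatched cells become NA
  full_join(left, right, by = "join_key") %>%
    arrange(desc(join_key)) %>%
    select(all_of(colnames(df)))   # restore column order, drop the key
}

# The six rows from the question
dff_small <- data.frame(
  data    = as.Date(c("2021-12-23", "2021-12-22", "2021-12-21",
                      "2021-12-20", "2021-12-19", "2021-12-18")),
  date    = as.Date(c("2021-12-23", "2021-12-22", "2021-12-21",
                      "2021-12-20", "2021-12-17", "2021-12-16")),
  futures = c(1388.17, 1432.36, 1508.98, 1493.13, 1379.97, 1597.91)
)

res <- align_dates(dff_small, "data", "date")
res
```

On this input the result has eight rows rather than six: the rows for 2021-12-19 and 2021-12-18 have NA in `date` and `futures`, while the rows for 2021-12-17 and 2021-12-16 keep their price and have NA in `data`.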