How to make a new column copy values from another column and row, based on one column in R?
Here is some sample data:
```r
library(tibble)
library(lubridate)  # for as_date()

data <- tibble(line_number = 1:5,
               test = c("testA", "testB", "testC", "testD", "testE"),
               start_date = as_date(c("2021-01-01", "2021-02-01", "2021-02-15", "2021-03-20", "2021-04-12")),
               finish_date = as_date(c("2021-01-01", "2021-03-01", "2021-02-18", "2021-05-20", "2021-04-12")),
               coded_date = c(NA, "1S", "2F", "2S", "4F"))
data
# line_number test start_date finish_date coded_date
# <int> <chr> <date> <date> <chr>
# 1 1 testA 2021-01-01 2021-01-01 NA
# 2 2 testB 2021-02-01 2021-03-01 1S
# 3 3 testC 2021-02-15 2021-02-18 2F
# 4 4 testD 2021-03-20 2021-05-20 2S
# 5 5 testE 2021-04-12 2021-04-12 4F
```
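I want to create two new columns, `new_start_date` and `new_finish_date`, whose contents are determined by the `coded_date` column.
In `coded_date`, the number is a row number, "S" means the start date and "F" means the finish date.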
For example, for row 2 (`1S`), I want `new_start_date` to copy the start date from row 1 and leave `new_finish_date` as NA.
For row 3 (`2F`), I want `new_finish_date` to copy the finish date from row 2 and leave `new_start_date` as NA.
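The encoding described above can be split into its two parts with base R regular expressions. This is a minimal sketch; the extra code `"12S"` is hypothetical and only added to show why reading the digits with `sub()` is safer than taking the first character, which would misread any row number above 9.

```r
# Example codes; "12S" is a hypothetical extra value with a two-digit row number
coded_date <- c(NA, "1S", "2F", "2S", "4F", "12S")

# Digits before the trailing letter: the referenced row number (NA stays NA)
ref_row <- as.integer(sub("[SF]$", "", coded_date))

# The trailing letter: "S" = copy start_date, "F" = copy finish_date
ref_type <- sub("^[0-9]+", "", coded_date)

# For comparison: substring(x, 1, 1) reads only the first digit ("1" for "12S")
first_char <- substring(coded_date, 1, 1)
```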
Here is the output I want:
```r
# line_number test start_date finish_date coded_date new_start_date new_finish_date
# <int> <chr> <date> <date> <chr> <date> <date>
# 1 1 testA 2021-01-01 2021-01-01 NA NA NA
# 2 2 testB 2021-02-01 2021-03-01 1S 2021-01-01 NA
# 3 3 testC 2021-02-15 2021-02-18 2F NA 2021-03-01
# 4 4 testD 2021-03-20 2021-05-20 2S 2021-02-01 NA
# 5 5 testE 2021-04-12 2021-04-12 4F NA 2021-05-20
```
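One possible way to get this output in a single `mutate()` is sketched below, assuming dplyr is available. It decodes `coded_date`, looks the referenced row up by `line_number` with `match()` (an assumption: that "row number" means the value in `line_number`, which here also equals the row position), and uses `dplyr::if_else()` so the new columns stay `Date`. The helper columns `ref_pos` and `ref_type` are hypothetical names and are dropped at the end.

```r
library(tibble)
library(dplyr)
library(lubridate)

data <- tibble(
  line_number = 1:5,
  test        = c("testA", "testB", "testC", "testD", "testE"),
  start_date  = as_date(c("2021-01-01", "2021-02-01", "2021-02-15", "2021-03-20", "2021-04-12")),
  finish_date = as_date(c("2021-01-01", "2021-03-01", "2021-02-18", "2021-05-20", "2021-04-12")),
  coded_date  = c(NA, "1S", "2F", "2S", "4F")
)

result <- data %>%
  mutate(
    # position of the referenced row, matched on line_number (NA if no code)
    ref_pos  = match(as.integer(sub("[SF]$", "", coded_date)), line_number),
    # "S" or "F" (NA if no code); NA %in% "S" is FALSE, so NA rows stay NA
    ref_type = sub("^[0-9]+", "", coded_date),
    new_start_date  = if_else(ref_type %in% "S", start_date[ref_pos], as.Date(NA)),
    new_finish_date = if_else(ref_type %in% "F", finish_date[ref_pos], as.Date(NA))
  ) %>%
  select(-ref_pos, -ref_type)

result
```

Because `start_date[ref_pos]` is evaluated on the whole column, this works only on an ungrouped data frame.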
I'm still a beginner with R, so any help or input is much appreciated :)
My first instinct was to use two for loops:
```r
# start with empty Date columns so the copied values keep the Date class
data$new_start_date <- as.Date(NA)
data$new_finish_date <- as.Date(NA)

# deal with start date first
# get all values in 'coded_date' that contain an 'S'
svals <- grep(pattern = "S", x = data$coded_date, value = TRUE)
# we'll go row by row
# for each case (row) that contains an 'S'
for(sval in svals){
  # get the rowid from the value of 'coded_date'
  # this is the row where we'll get the new date
  # (drop the letter and convert to integer, so multi-digit rows also work)
  rowid <- as.integer(sub(pattern = "S", replacement = "", x = sval))
  # assign a 'new_start_date' to the row where we found sval
  # the row containing this new value is defined by rowid
  # use '%in%' rather than '==' on the left side because NAs are present
  data$new_start_date[data$coded_date %in% sval] <- data$start_date[rowid]
}
## repeat for finish date
# S and F loops could be nested together!
fvals <- grep(pattern = "F", x = data$coded_date, value = TRUE)
for(fval in fvals){
  rowid <- as.integer(sub(pattern = "F", replacement = "", x = fval))
  data$new_finish_date[data$coded_date %in% fval] <- data$finish_date[rowid]
}
```
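As the comment in the loops suggests, the S and F cases can be handled in one loop over the coded rows. This is a sketch under two assumptions: it uses a base `data.frame` and `as.Date()` so it runs without extra packages (the same code works on the tibble), and it treats the number in `coded_date` as a row position.

```r
data <- data.frame(
  line_number = 1:5,
  start_date  = as.Date(c("2021-01-01", "2021-02-01", "2021-02-15", "2021-03-20", "2021-04-12")),
  finish_date = as.Date(c("2021-01-01", "2021-03-01", "2021-02-18", "2021-05-20", "2021-04-12")),
  coded_date  = c(NA, "1S", "2F", "2S", "4F"),
  stringsAsFactors = FALSE
)

# Empty Date columns (a length-1 value is recycled to every row)
data$new_start_date  <- as.Date(NA)
data$new_finish_date <- as.Date(NA)

# One pass over every row that has a code
for (i in which(!is.na(data$coded_date))) {
  code <- data$coded_date[i]
  ref  <- as.integer(sub("[SF]$", "", code))  # referenced row number
  if (endsWith(code, "S")) {
    data$new_start_date[i] <- data$start_date[ref]
  } else if (endsWith(code, "F")) {
    data$new_finish_date[i] <- data$finish_date[ref]
  }
}
```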
Edit: here is a vectorized version that seems to work. I bet there is still room for improvement, so if anyone is keen, I'd appreciate any feedback!
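```r
library(dplyr)  # for if_else()

# base ifelse() drops the Date class and returns plain numbers,
# so use dplyr::if_else(), which keeps it
# (grepl() returns FALSE for NA input, so the condition is never NA)
data$new_start_date2 <- if_else(
  condition = grepl(pattern = "S", x = data$coded_date),
  true = data$start_date[as.integer(sub(pattern = "[SF]", replacement = "", data$coded_date))],
  false = as.Date(NA))
data$new_finish_date2 <- if_else(
  condition = grepl(pattern = "F", x = data$coded_date),
  true = data$finish_date[as.integer(sub(pattern = "[SF]", replacement = "", data$coded_date))],
  false = as.Date(NA))
```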
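A base-R alternative that avoids `ifelse()` altogether relies on the fact that indexing a vector with `NA` returns `NA`. The sketch below blanks out the references whose letter does not match and then indexes the date vectors directly; it works on plain vectors copied from the sample data and, like the loops, assumes the number in `coded_date` is a row position.

```r
start_date  <- as.Date(c("2021-01-01", "2021-02-01", "2021-02-15", "2021-03-20", "2021-04-12"))
finish_date <- as.Date(c("2021-01-01", "2021-03-01", "2021-02-18", "2021-05-20", "2021-04-12"))
coded_date  <- c(NA, "1S", "2F", "2S", "4F")

# Referenced row number for every code (NA where there is no code)
ref_row <- as.integer(sub("[SF]$", "", coded_date))

# Keep a reference only where the letter matches; everything else becomes NA
start_ref  <- replace(ref_row, !grepl("S$", coded_date), NA)
finish_ref <- replace(ref_row, !grepl("F$", coded_date), NA)

# Indexing a Date vector with NA yields NA and keeps the Date class
new_start_date  <- start_date[start_ref]
new_finish_date <- finish_date[finish_ref]
```

`replace()` keeps `start_ref` an integer vector even when no code matches, which avoids accidentally indexing with a logical `NA` vector.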