How to create multiple calculated columns based on the value of one column which refers to other columns in R?
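I am trying to create a time series that shows the value of a particular column at a particular time. I currently only have access to a table that records all changes, the current value of each column, the date, and the name of the column that was changed. I would like to create a new column that tracks the previous value of that column before the change. More than 63 different columns are referenced in "Column_name" in the change log.
This is what I currently have: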
_____________________________________________________________
Name | date     | A   | B   | C   | NEW | Column_name |
bob  | 12302019 | 2   | 23  | 153 | 2   | a           |
bob  | 12102019 | 2   | 23  | 153 | 362 | a           |
bob  | 10242019 | 2   | 23  | 153 | 7   | a           |
john | 10062017 | 684 | 452 | 1   | 254 | c           |
john | 11052018 | 684 | 452 | 1   | 1   | c           |
_____________________________________________________________
This is what I would like help creating:
_________________________________________________________________________________________________
Name | date     | A   | B   | C   | NEW | Column_name | a_at_Date | b_at_Date | c_at_Date |
bob  | 12302019 | 2   | 23  | 153 | 2   | a           | 2         | 23        | 153       |
bob  | 12102019 | 2   | 23  | 153 | 362 | a           | 362       | 23        | 153       |
bob  | 10242019 | 2   | 23  | 153 | 7   | a           | 7         | 23        | 153       |
john | 10062017 | 684 | 452 | 1   | 254 | c           | 684       | 452       | 254       |
john | 11052018 | 684 | 452 | 1   | 1   | c           | 684       | 452       | 1         |
_________________________________________________________________________________________________
I have tested the solution on the following test data frame, where there is only one column name, "A", and it has several factor levels:
'data.frame': 755 obs. of 5 variables:
$ name : int 606765182 83595892 538663788 779873188 957405600 522796409 41212559 145402647 304688204 83595892 ...
$ date : POSIXct, format: "2019-11-01" "2019-11-01" "2019-10-21" ...
$ A : Factor
$ B : Factor
$ C : Factor
$ Column_name: Factor w/ 1
$ NEW : Factor w/ 8
Base R
Here is a base R solution. It uses sapply/ifelse to create a matrix with the new values, then cbinds it with the input data frame df1.
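cols_to_change <- c("A", "B", "C")
# For each tracked column, take NEW on the rows where Column_name names
# that column (case-insensitively); otherwise keep the column's own value.
tmp <- sapply(cols_to_change, function(x){
  x2 <- tolower(x)
  y <- tolower(df1[["Column_name"]])
  ifelse(x2 == y, df1[["NEW"]], df1[[x]])
})
colnames(tmp) <- paste0(colnames(tmp), "_new")
df2 <- cbind(df1, tmp)
rm(tmp)    # final cleanup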
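The test data frame described in the question stores its columns as factors, and ifelse() applied to factors returns the underlying integer codes rather than the labels. The following is a minimal sketch of one way around that, assuming it is acceptable to convert the factor columns to character first. The data frame df_f and its values are hypothetical and only stand in for the factor-typed log.

```r
# Hypothetical miniature of a factor-typed change log (values are made up)
df_f <- data.frame(
  A           = factor(c("x", "x", "y")),
  B           = factor(c("p", "q", "q")),
  NEW         = factor(c("z", "r", "y")),
  Column_name = factor(c("a", "b", "a"))
)
cols_to_change <- c("A", "B")

# Work on a copy in which every factor column is converted to character,
# so that ifelse() returns labels instead of integer codes
df_chr <- df_f
is_fct <- vapply(df_chr, is.factor, logical(1))
df_chr[is_fct] <- lapply(df_chr[is_fct], as.character)

# Same sapply/ifelse logic as the base R solution
tmp <- sapply(cols_to_change, function(x) {
  ifelse(tolower(x) == tolower(df_chr[["Column_name"]]),
         df_chr[["NEW"]], df_chr[[x]])
})
colnames(tmp) <- paste0(colnames(tmp), "_new")
df_out <- cbind(df_f, tmp)
```

Because cols_to_change is an ordinary character vector, the same code should extend to the 63 or more logged columns by listing their names there.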
dplyr solution.
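library(dplyr)

# newcol() recovers the column name from the expression passed as x, then
# picks NEW where Column_name names that column, else the column's own value.
newcol <- function(x, DF){
  x <- deparse(substitute(x))
  x2 <- tolower(x)
  y <- tolower(DF[["Column_name"]])
  ifelse(x2 == y, DF[["NEW"]], DF[[x]])
}
# Note: funs() has been deprecated since dplyr 0.8.0, so this call may
# warn or fail on recent dplyr versions.
df1 %>%
  mutate_at(vars(cols_to_change),
            .funs = funs(new=newcol(., df1)))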
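On current dplyr releases the same idea can be written without funs(). The sketch below is an alternative rather than the answer's original method: it assumes dplyr 1.0.0 or later and uses across() together with cur_column(), which gives the name of the column being processed. The sample data are copied from the question.

```r
library(dplyr)

# Sample data copied from the question
df1 <- data.frame(
  Name = c("bob", "bob", "bob", "john", "john"),
  date = c(12302019L, 12102019L, 10242019L, 10062017L, 11052018L),
  A = c(2, 2, 2, 684, 684),
  B = c(23, 23, 23, 452, 452),
  C = c(153, 153, 153, 1, 1),
  NEW = c(2, 362, 7, 254, 1),
  Column_name = c("a", "a", "a", "c", "c")
)
cols_to_change <- c("A", "B", "C")

# For each listed column, use NEW on rows whose Column_name matches the
# current column (case-insensitively); otherwise keep the column's value.
df2 <- df1 %>%
  mutate(across(
    all_of(cols_to_change),
    ~ ifelse(tolower(cur_column()) == tolower(Column_name), NEW, .x),
    .names = "{.col}_new"
  ))
```

The .names argument appends "_new" to each source column name, so the result has A_new, B_new and C_new alongside the original columns.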
Data.
df1 <-
structure(list(Name = c("bob", "bob", "bob", "john", "john"),
date = c(12302019L, 12102019L, 10242019L, 10062017L, 11052018L),
A = c(2, 2, 2, 684, 684), B = c(23, 23, 23, 452, 452),
C = c(153, 153, 153, 1, 1), NEW = c(2, 362, 7, 254, 1),
Column_name = c("a", "a", "a", "c", "c")),
row.names = c(NA, -5L), class = "data.frame")
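Since the goal is a time series, the integer date column probably needs to become a proper Date before the rows can be ordered in time. This is a small sketch under the assumption that the integers are in MMDDYYYY form, which the sample values suggest but the question does not state. The names date_int, date_parsed and chronological are illustrative only.

```r
# Dates copied from df1; the MMDDYYYY layout is an assumption
date_int <- c(12302019L, 12102019L, 10242019L, 10062017L, 11052018L)

# Zero-pad to 8 digits so months below 10 (stored as 7 digits) still parse
date_parsed <- as.Date(sprintf("%08d", date_int), format = "%m%d%Y")

# Row order that sorts the change log chronologically
chronological <- order(date_parsed)
```

With the dates parsed, indexing the data frame by chronological (for example df1[chronological, ]) would list each person's changes in the order they happened.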