How to create a function to conditionally perform arithmetic operations on multiple columns
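Given the sample data sampleDT below, I would appreciate any help creating a function that efficiently does the following.
For each variable whose name starts with dollar:
perform 3-(5/j) in the rows where sampleDT$employer==1;
perform 2*j in the rows where sampleDT$employer==0;
put the result of the operation in a new variable located next to the column it is based on;
keep the values of dollar.wage_1 unchanged;
put the output of the operation in a new variable euro.wage_x, whose name is simply the name of the source variable dollar.wage_x with dollar replaced by euro. x is the number of the dollar.wage variable.
Create new variables named division.wage_x that contain, for each pair of dollar.wage_x and euro.wage_x, the result of dividing dollar.wage_x by euro.wage_x.
Here j stands for the values that the variables dollar.wage_1:dollar.wage_10 take.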
Sample data
sampleDT<-structure(list(id = 1:10, N = c(10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L), A = c(62L, 96L, 17L, 41L, 212L, 143L, 143L,
143L, 73L, 73L), B = c(3L, 1L, 0L, 2L, 170L, 21L, 0L, 33L, 62L,
17L), C = c(0.05, 0.01, 0, 0.05, 0.8, 0.15, 0, 0.23, 0.85, 0.23
), employer = c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L), F = c(0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), G = c(1.94, 1.19, 1.16,
1.16, 1.13, 1.13, 1.13, 1.13, 1.12, 1.12), H = c(0.14, 0.24,
0.28, 0.28, 0.21, 0.12, 0.17, 0.07, 0.14, 0.12), dollar.wage_1 = c(1.94,
1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_2 = c(1.93,
1.18, 3.15, 3.15, 1.12, 1.12, 2.12, 1.12, 1.11, 1.11), dollar.wage_3 = c(1.95,
1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.13, 1.13), dollar.wage_4 = c(1.94,
1.18, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_5 = c(1.94,
1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_6 = c(1.94,
1.18, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_7 = c(1.94,
1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_8 = c(1.94,
1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_9 = c(1.94,
1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_10 = c(1.94,
1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12)), row.names = c(NA,
-10L), class = "data.frame")
Output of head
id N A B C employer F G H dollar.wage_1 dollar.wage_2 dollar.wage_3 dollar.wage_4 dollar.wage_5 dollar.wage_6 dollar.wage_7 dollar.wage_8 dollar.wage_9 dollar.wage_10
1 10 62 3 0.05 1 0 1.94 0.14 1.94 1.93 1.95 1.94 1.94 1.94 1.94 1.94 1.94 1.94
2 10 96 1 0.01 1 0 1.19 0.24 1.19 1.18 1.19 1.18 1.19 1.18 1.19 1.19 1.19 1.19
3 10 17 0 0.00 0 0 1.16 0.28 3.16 3.15 3.16 3.16 3.16 3.16 3.16 3.16 3.16 3.16
I am looking for an efficient way to do this because my actual dataset has more than 1000 dollar.wage_x variables, i.e. x > 1000.
Thanks in advance for any help.
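In base R:
# Note: this writes the result back into the dollar.* columns (in place)
sampleDT[, grepl("dollar", colnames(sampleDT))] <-
  lapply(sampleDT[ , grepl("dollar", colnames(sampleDT))],
         function(x) {
           res <- 3 - 5 / x  # rule for employer == 1
           res[sampleDT$employer==0] <- 2 * x[sampleDT$employer==0]  # rule for employer == 0
           res
         } )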
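The base R snippet above writes its result back into the dollar.* columns, whereas the question asks to keep them and to place each euro.* column next to its source. A minimal sketch of such a variant follows. The helper name add_euro_cols and the cut-down data frame toyDF are assumptions made for illustration, and the sketch assumes employer contains only 0 and 1.

```r
# Cut-down copy of sampleDT (first three rows, two wage columns) so that
# the sketch runs on its own.
toyDF <- data.frame(
  id = 1:3,
  employer = c(1L, 1L, 0L),
  dollar.wage_1 = c(1.94, 1.19, 3.16),
  dollar.wage_2 = c(1.93, 1.18, 3.15)
)

# add_euro_cols() is a hypothetical helper, not part of any package.
add_euro_cols <- function(df) {
  o_cols <- grep("^dollar", names(df), value = TRUE)  # source columns
  n_cols <- sub("^dollar", "euro", o_cols)            # new column names
  emp <- which(df$employer == 1)
  # Assigning to names that do not exist yet creates new columns,
  # so the dollar.* columns stay untouched.
  df[n_cols] <- lapply(df[o_cols], function(j) {
    res <- 2 * j                # rule for employer == 0
    res[emp] <- 3 - 5 / j[emp]  # rule for employer == 1
    res
  })
  # Interleave the names: dollar.wage_1, euro.wage_1, dollar.wage_2, ...
  other <- setdiff(names(df), c(o_cols, n_cols))
  df[c(other, as.vector(rbind(o_cols, n_cols)))]
}

res <- add_euro_cols(toyDF)
names(res)
```

The rbind() call stacks the two name vectors as rows of a matrix, and as.vector() reads that matrix column by column, which yields the alternating order. Columns that are not wage columns are moved to the front, which matches the layout of sampleDT.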
Using data.table:
library(data.table)
setDT(sampleDT)
o_cols <- grep("^dollar", names(sampleDT), value = TRUE)
n_cols <- sub("^dollar", "euro", o_cols)
sampleDT[, (n_cols) := lapply(.SD, function(j) ifelse(employer == 1, 3 - 5 / j, 2 * j)), .SDcols = o_cols]
> sampleDT
id N A B C employer F G H dollar.wage_1 dollar.wage_2 dollar.wage_3 dollar.wage_4 dollar.wage_5 dollar.wage_6 dollar.wage_7
1: 1 10 62 3 0.05 1 0 1.94 0.14 1.94 1.93 1.95 1.94 1.94 1.94 1.94
2: 2 10 96 1 0.01 1 0 1.19 0.24 1.19 1.18 1.19 1.18 1.19 1.18 1.19
3: 3 10 17 0 0.00 0 0 1.16 0.28 3.16 3.15 3.16 3.16 3.16 3.16 3.16
4: 4 10 41 2 0.05 1 0 1.16 0.28 3.16 3.15 3.16 3.16 3.16 3.16 3.16
5: 5 10 212 170 0.80 0 0 1.13 0.21 1.13 1.12 1.14 1.13 1.14 1.13 1.14
6: 6 10 143 21 0.15 1 1 1.13 0.12 1.13 1.12 1.13 1.13 1.13 1.13 1.13
7: 7 10 143 0 0.00 1 1 1.13 0.17 2.13 2.12 2.13 2.13 2.13 2.13 2.13
8: 8 10 143 33 0.23 0 1 1.13 0.07 1.13 1.12 1.13 1.13 1.13 1.13 1.13
9: 9 10 73 62 0.85 0 1 1.12 0.14 1.12 1.11 1.13 1.12 1.12 1.12 1.12
10: 10 10 73 17 0.23 0 1 1.12 0.12 1.12 1.11 1.13 1.12 1.12 1.12 1.12
dollar.wage_8 dollar.wage_9 dollar.wage_10 euro.wage_1 euro.wage_2 euro.wage_3 euro.wage_4 euro.wage_5 euro.wage_6 euro.wage_7 euro.wage_8 euro.wage_9
1: 1.94 1.94 1.94 0.4226804 0.4093264 0.4358974 0.4226804 0.4226804 0.4226804 0.4226804 0.4226804 0.4226804
2: 1.19 1.19 1.19 -1.2016807 -1.2372881 -1.2016807 -1.2372881 -1.2016807 -1.2372881 -1.2016807 -1.2016807 -1.2016807
3: 3.16 3.16 3.16 6.3200000 6.3000000 6.3200000 6.3200000 6.3200000 6.3200000 6.3200000 6.3200000 6.3200000
4: 3.16 3.16 3.16 1.4177215 1.4126984 1.4177215 1.4177215 1.4177215 1.4177215 1.4177215 1.4177215 1.4177215
5: 1.13 1.13 1.13 2.2600000 2.2400000 2.2800000 2.2600000 2.2800000 2.2600000 2.2800000 2.2600000 2.2600000
6: 1.13 1.13 1.13 -1.4247788 -1.4642857 -1.4247788 -1.4247788 -1.4247788 -1.4247788 -1.4247788 -1.4247788 -1.4247788
7: 2.13 2.13 2.13 0.6525822 0.6415094 0.6525822 0.6525822 0.6525822 0.6525822 0.6525822 0.6525822 0.6525822
8: 1.13 1.13 1.13 2.2600000 2.2400000 2.2600000 2.2600000 2.2600000 2.2600000 2.2600000 2.2600000 2.2600000
9: 1.12 1.12 1.12 2.2400000 2.2200000 2.2600000 2.2400000 2.2400000 2.2400000 2.2400000 2.2400000 2.2400000
10: 1.12 1.12 1.12 2.2400000 2.2200000 2.2600000 2.2400000 2.2400000 2.2400000 2.2400000 2.2400000 2.2400000
euro.wage_10
1: 0.4226804
2: -1.2016807
3: 6.3200000
4: 1.4177215
5: 2.2600000
6: -1.4247788
7: 0.6525822
8: 2.2600000
9: 2.2400000
10: 2.2400000
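The data.table code above creates the euro.* columns but not the division.wage_x columns the question also asks for, and it appends the new columns at the end. One way this might be extended is sketched below, on a cut-down copy of the data (toyDT and d_cols are names introduced here for illustration). It uses set() in a loop, which data.table documents as a low-overhead way to assign many columns by reference.

```r
library(data.table)

# Cut-down copy of sampleDT (first three rows, two wage columns).
toyDT <- data.table(
  id = 1:3,
  employer = c(1L, 1L, 0L),
  dollar.wage_1 = c(1.94, 1.19, 3.16),
  dollar.wage_2 = c(1.93, 1.18, 3.15)
)

o_cols <- grep("^dollar", names(toyDT), value = TRUE)
n_cols <- sub("^dollar", "euro", o_cols)
d_cols <- sub("^dollar", "division", o_cols)

# Same step as in the answer: build the euro.* columns by reference.
toyDT[, (n_cols) := lapply(.SD, function(j) ifelse(employer == 1, 3 - 5 / j, 2 * j)),
      .SDcols = o_cols]

# division.wage_x = dollar.wage_x / euro.wage_x, one pair at a time.
for (k in seq_along(o_cols)) {
  set(toyDT, j = d_cols[k], value = toyDT[[o_cols[k]]] / toyDT[[n_cols[k]]])
}

# Put each dollar/euro/division triple side by side.
setcolorder(toyDT, c(setdiff(names(toyDT), c(o_cols, n_cols, d_cols)),
                     as.vector(rbind(o_cols, n_cols, d_cols))))
names(toyDT)
```

In rows with employer == 0 the euro value is 2*j, so the division column is 0.5 there by construction.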
Here is one tidyverse possibility:
library(dplyr)
sampleDT %>%
  mutate_at(vars(contains("dollar")), list(euro.wage = ~ ifelse(employer == 1, 3-(5/.), 2*.))) %>%
  # the new columns come out as dollar.wage_x_euro.wage; rename them to euro.wage_x
  rename_at(vars(contains("euro.wage")),
            ~ paste(sub(".*_", "", .), gsub("[^0-9]", "", .), sep = "_"))
id N A B C employer F G H dollar.wage_1 dollar.wage_2
1 1 10 62 3 0.05 1 0 1.94 0.14 1.94 1.93
2 2 10 96 1 0.01 1 0 1.19 0.24 1.19 1.18
3 3 10 17 0 0.00 0 0 1.16 0.28 3.16 3.15
4 4 10 41 2 0.05 1 0 1.16 0.28 3.16 3.15
5 5 10 212 170 0.80 0 0 1.13 0.21 1.13 1.12
6 6 10 143 21 0.15 1 1 1.13 0.12 1.13 1.12
7 7 10 143 0 0.00 1 1 1.13 0.17 2.13 2.12
8 8 10 143 33 0.23 0 1 1.13 0.07 1.13 1.12
9 9 10 73 62 0.85 0 1 1.12 0.14 1.12 1.11
10 10 10 73 17 0.23 0 1 1.12 0.12 1.12 1.11
dollar.wage_3 dollar.wage_4 dollar.wage_5 dollar.wage_6 dollar.wage_7
1 1.95 1.94 1.94 1.94 1.94
2 1.19 1.18 1.19 1.18 1.19
3 3.16 3.16 3.16 3.16 3.16
4 3.16 3.16 3.16 3.16 3.16
5 1.14 1.13 1.14 1.13 1.14
6 1.13 1.13 1.13 1.13 1.13
7 2.13 2.13 2.13 2.13 2.13
8 1.13 1.13 1.13 1.13 1.13
9 1.13 1.12 1.12 1.12 1.12
10 1.13 1.12 1.12 1.12 1.12
dollar.wage_8 dollar.wage_9 dollar.wage_10 euro.wage_1 euro.wage_2 euro.wage_3
1 1.94 1.94 1.94 0.4226804 0.4093264 0.4358974
2 1.19 1.19 1.19 -1.2016807 -1.2372881 -1.2016807
3 3.16 3.16 3.16 6.3200000 6.3000000 6.3200000
4 3.16 3.16 3.16 1.4177215 1.4126984 1.4177215
5 1.13 1.13 1.13 2.2600000 2.2400000 2.2800000
6 1.13 1.13 1.13 -1.4247788 -1.4642857 -1.4247788
7 2.13 2.13 2.13 0.6525822 0.6415094 0.6525822
8 1.13 1.13 1.13 2.2600000 2.2400000 2.2600000
9 1.12 1.12 1.12 2.2400000 2.2200000 2.2600000
10 1.12 1.12 1.12 2.2400000 2.2200000 2.2600000
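With more recent dplyr versions (1.0.0 or later, which is an assumption about the installed version), the same result can probably be reached in one step with across(), naming the new columns directly instead of renaming them afterwards. A minimal sketch on a cut-down copy of the data (toyDF is a name introduced here for illustration):

```r
library(dplyr)

# Cut-down copy of sampleDT (first three rows, two wage columns).
toyDF <- data.frame(
  id = 1:3,
  employer = c(1L, 1L, 0L),
  dollar.wage_1 = c(1.94, 1.19, 3.16),
  dollar.wage_2 = c(1.93, 1.18, 3.15)
)

res <- toyDF %>%
  mutate(across(
    starts_with("dollar"),
    ~ ifelse(employer == 1, 3 - 5 / .x, 2 * .x),
    # .names is a glue specification; .col holds the source column name
    .names = "{sub('^dollar', 'euro', .col)}"
  ))
names(res)
```

Like the pipeline above, this appends the euro.* columns after the existing ones rather than next to their source columns.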