How to use dplyr::mutate to multiply pairs of columns specified by parts of the variable name
I have the following example:
df <- data.frame(
id = c(1,2,3),
fix_01.2012 = c(2,5,7),
fix_02.2012 = c(5,1,7),
fix_03.2012 = c(6,1,5),
fox_01.2012 = c(0.4, 0.5, 0.7),
fox_02.2012 = c(0.6, 0.5, 0.8),
fox_03.2012 = c(0.7, 0.5, 0.9)
)
  id fix_01.2012 fix_02.2012 fix_03.2012 fox_01.2012 fox_02.2012 fox_03.2012
1  1           2           5           6         0.4         0.6         0.7
2  2           5           1           1         0.5         0.5         0.5
3  3           7           7           5         0.7         0.8         0.9
The table below is what I want to get.
For each date (e.g. "01.2012") I want to create a new column:
res_date = fix_date * fox_date
Since I have many date pairs, I assume this has to be done by looping over the names.
  id fix_01.2012 fix_02.2012 fix_03.2012 fox_01.2012 fox_02.2012 fox_03.2012 res_01.2012 res_02.2012 res_03.2012
1  1           2           5           6         0.4         0.6         0.7         0.8         3.0         4.2
2  2           5           1           1         0.5         0.5         0.5         2.5         0.5         0.5
3  3           7           7           5         0.7         0.8         0.9         4.9         5.6         4.5
Can anyone help? Thanks a lot in advance!
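Here is an idea: use split.default to split the data frame by similar column names (according to your criterion). Then we iterate over that list and multiply the columns. Here we use Reduce (instead of i[1]*i[2]) for the multiplication, so that it also works when a group has more than two columns.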
do.call(cbind,
lapply(split.default(df[-1], gsub('.*_', '', names(df[-1]))), function(i) Reduce(`*`, i)))
# 01.2012 02.2012 03.2012
#[1,] 0.8 3.0 4.2
#[2,] 2.5 0.5 0.5
#[3,] 4.9 5.6 4.5
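As a quick check of why Reduce is used instead of i[1]*i[2], the sketch below adds a hypothetical third prefix, fux_ (not part of the question's data), so that one date group contains three columns. Reduce folds `*` over every column in the group, so the result is fix * fox * fux.

```r
# Hypothetical extra prefix "fux_" so one date group holds three columns
df3 <- data.frame(
  id = c(1, 2, 3),
  fix_01.2012 = c(2, 5, 7),
  fox_01.2012 = c(0.4, 0.5, 0.7),
  fux_01.2012 = c(10, 10, 10)
)

# Group the non-id columns by the suffix after the last "_"
groups <- split.default(df3[-1], gsub('.*_', '', names(df3[-1])))

# Each group is a data frame (a list of columns); Reduce applies `*`
# pairwise across all of them, however many there are
prod3 <- do.call(cbind, lapply(groups, function(i) Reduce(`*`, i)))
prod3
```

With i[1]*i[2] the fux_ column would simply be ignored.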
Use cbind.data.frame() to bind the products back to the original data frame.
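A minimal sketch of that last step, assuming the new columns should be named res_<date> as in the desired output. The res_ prefix is added by hand with paste0(), because split.default names the groups by the bare date suffix:

```r
df <- data.frame(
  id = c(1, 2, 3),
  fix_01.2012 = c(2, 5, 7),
  fix_02.2012 = c(5, 1, 7),
  fix_03.2012 = c(6, 1, 5),
  fox_01.2012 = c(0.4, 0.5, 0.7),
  fox_02.2012 = c(0.6, 0.5, 0.8),
  fox_03.2012 = c(0.7, 0.5, 0.9)
)

# Multiply the columns within each date group (same code as the answer)
prods <- do.call(cbind,
  lapply(split.default(df[-1], gsub('.*_', '', names(df[-1]))),
         function(i) Reduce(`*`, i)))

# Turn "01.2012" etc. into "res_01.2012" etc.
colnames(prods) <- paste0("res_", colnames(prods))

# Append the product columns to the original data frame
res <- cbind.data.frame(df, prods)
res
```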
If you want a tidyverse approach, you need some tidy evaluation to get what you want.
library(tidyverse)
df <- data.frame(
id = c(1,2,3),
fix_01.2012 = c(2,5,7),
fix_02.2012 = c(5,1,7),
fix_03.2012 = c(6,1,5),
fox_01.2012 = c(0.4, 0.5, 0.7),
fox_02.2012 = c(0.6, 0.5, 0.8),
fox_03.2012 = c(0.7, 0.5, 0.9)
)
# colnames with "fix"
fix <- names(df)[grepl("fix",names(df))]
# colnames with "fox"
fox <- names(df)[grepl("fox",names(df))]
# Iterate over the two vectors of names and column bind the results (map2_dfc).
# Since these are strings, we need to have them evaluated as symbols
# Creating the column name just requires the string to be evaluated.
map2_dfc(fix, fox, ~transmute(df, !!paste0("res", str_extract(.x, "_(0\\d)")) := !!sym(.x) * !!sym(.y)))
#> res_01 res_02 res_03
#> 1 0.8 3.0 4.2
#> 2 2.5 0.5 0.5
#> 3 4.9 5.6 4.5
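Since the question asks about mutate specifically, here is a variant (a sketch, not part of the answer above) that builds one product expression per date with rlang's expr() and sym() and splices them all into a single mutate() call with !!!. It assumes every fix_<date> column has a matching fox_<date> column, and it keeps the full date suffix, so the new names follow the desired res_01.2012 form and the original columns are kept:

```r
library(dplyr)
library(rlang)

df <- data.frame(
  id = c(1, 2, 3),
  fix_01.2012 = c(2, 5, 7),
  fix_02.2012 = c(5, 1, 7),
  fix_03.2012 = c(6, 1, 5),
  fox_01.2012 = c(0.4, 0.5, 0.7),
  fox_02.2012 = c(0.6, 0.5, 0.8),
  fox_03.2012 = c(0.7, 0.5, 0.9)
)

# Date suffixes taken from the fix_ columns (assumes a fox_ partner exists for each)
dates <- sub("^fix_", "", grep("^fix_", names(df), value = TRUE))

# One unevaluated expression per date, e.g. fix_01.2012 * fox_01.2012,
# stored in a list named res_<date>
res_exprs <- setNames(
  lapply(dates, function(d) expr(!!sym(paste0("fix_", d)) * !!sym(paste0("fox_", d)))),
  paste0("res_", dates)
)

# !!! splices the named list into mutate(): each element becomes a new column
out <- df %>% mutate(!!!res_exprs)
out
```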
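Much more verbose than the other answers, but in my opinion easier to read/edit/adapt: a gather/spread-heavy approach (this is how I would tackle the problem if I worked through it step by step):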
library(tidyr)
library(dplyr)
df %>%
gather(-id, key=colname, value=value) %>%
separate(colname, c('fixfox', 'date'), sep='_') %>%
spread(key=fixfox, value=value) %>%
mutate(res=fix*fox) %>%
gather(-id, -date, key=colname, value=value) %>%
unite(new_colname, colname, date, sep='_') %>%
spread(key=new_colname, value=value)
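gather() and spread() are superseded in current tidyr. Here is a sketch of the same reshaping with pivot_longer() and pivot_wider(), assuming tidyr 1.0 or later. The final pivot_wider() rebuilds the fix_, fox_ and res_ columns in one step, replacing the gather/unite/spread sequence at the end:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = c(1, 2, 3),
  fix_01.2012 = c(2, 5, 7),
  fix_02.2012 = c(5, 1, 7),
  fix_03.2012 = c(6, 1, 5),
  fox_01.2012 = c(0.4, 0.5, 0.7),
  fox_02.2012 = c(0.6, 0.5, 0.8),
  fox_03.2012 = c(0.7, 0.5, 0.9)
)

out <- df %>%
  # Split names like "fix_01.2012" into the prefix (fix/fox) and the date
  pivot_longer(-id, names_to = c("fixfox", "date"), names_sep = "_") %>%
  # One row per id/date, with separate fix and fox columns
  pivot_wider(names_from = fixfox, values_from = value) %>%
  mutate(res = fix * fox) %>%
  # Back to wide: columns named fix_<date>, fox_<date>, res_<date>
  pivot_wider(names_from = date, values_from = c(fix, fox, res), names_sep = "_")
out
```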