Add suffix to some columns in 1st dataframe based on values in 2 columns in 2nd dataframe
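Task 1: I want to add a suffix to some columns in the first data frame based on the values in 2 columns of the second data frame.
My pseudocode is:
if (dict.df$source == 'cre' or dict.df$source == 'cre1') then append dict.df$timing to select column names in case.df dataframe.
Task 2: Next, I want to add the same suffix to the cells of the second data frame's column named columnNames, again based on the values in two columns of dict.df.
My pseudocode is:
if (dict.df$source == 'cre' or dict.df$source == 'cre1') then append dict.df$timing to select cell contents in dict.df$columnNames.
I currently have these 2 data frames that need renaming (so hard-coding the column names in code is impractical):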
agegen <- c('15m','34f','56m','49f','28m','37f')
race <- c('w','h','a','w','a','o')
eth <- c('-','h','-','-','h','-')
disp1 <- c('witness was violent', 'officer arrested suspect', 'never responded address', 'officer arrested suspect', 'suspect ran away', 'suspect ran away')
disp2 <- c('new client', 'revisit to address', 'parent custody', 'county jail', 'drumset missing', 'new lockup')
disp3 <- c('violent witness', 'bonded out', 'future court date', 'new client', 'weapon charge', 'girlfriend suspect')
disp4 <- c('violent witness', 'knife in kitchen', 'suspect at precinct', 'new client', '3 people involved', 'girlfriend suspect')
case.df <- data.frame(agegen,race,eth,disp1,disp2,disp3,disp4)
case.df
#   agegen race eth                    disp1              disp2              disp3               disp4
# 1    15m    w   -      witness was violent         new client    violent witness     violent witness
# 2    34f    h   h officer arrested suspect revisit to address         bonded out    knife in kitchen
# 3    56m    a   -  never responded address     parent custody  future court date suspect at precinct
# 4    49f    w   - officer arrested suspect        county jail         new client          new client
# 5    28m    a   h         suspect ran away    drumset missing      weapon charge   3 people involved
# 6    37f    o   -         suspect ran away         new lockup girlfriend suspect  girlfriend suspect
columnNames <- c('agegen','race','eth','disp1','disp2','disp3','disp4')
timing <- c('0t','6m','0t','3t','3t','0t','0t')
source <- c('cre','cre','aft','cre1','aft','cre','aft')
dict.df <- data.frame(columnNames,timing,source)
dict.df
# columnNames timing source
# 1 agegen 0t cre
# 2 race 6m cre
# 3 eth 0t aft
# 4 disp1 3t cre1
# 5 disp2 3t aft
# 6 disp3 0t cre
# 7 disp4 0t aft
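I need these 2 data frames as the result (a column name should not be renamed unless 'cre' or 'cre1' appears in the corresponding row of the dict.df$source column):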
#   agegen_0t race_6m eth                 disp1_3t              disp2           disp3_0t               disp4
# 1       15m       w   -      witness was violent         new client    violent witness     violent witness
# 2       34f       h   h officer arrested suspect revisit to address         bonded out    knife in kitchen
# 3       56m       a   -  never responded address     parent custody  future court date suspect at precinct
# 4       49f       w   - officer arrested suspect        county jail         new client          new client
# 5       28m       a   h         suspect ran away    drumset missing      weapon charge   3 people involved
# 6       37f       o   -         suspect ran away         new lockup girlfriend suspect  girlfriend suspect
# columnNames timing source
# 1 agegen_0t 0t cre
# 2 race_6m 6m cre
# 3 eth 0t aft
# 4 disp1_3t 3t cre1
# 5 disp2 3t aft
# 6 disp3_0t 0t cre
# 7 disp4 0t aft
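Update
I found some "0" values (not "0t") in the timing column. For those, I want to assign the suffix "_start" to the column names in dict.df, and also add "_start" as a suffix to the relevant column names in case.df. Basically, something like if dict.df$timing == "0" then dict.df$columnNames <- "_start"
and if dict.df$timing == "0" then all relevant columns in case.df get suffix <- "_start"
Any ideas on how to adapt langtang's code to achieve this in both data frames?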
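You can do this:
- Create nn to hold the new names
library(dplyr)
# one row per column to rename: old name and new (suffixed) name
nn <- dict.df %>%
  filter(source %in% c("cre", "cre1")) %>%
  mutate(newn = paste0(columnNames, "_", timing)) %>%
  select(columnNames, newn)
- Update dict.df
dict.df <- dict.df %>%
  mutate(columnNames = if_else(source %in% c("cre", "cre1"), paste0(columnNames, "_", timing), columnNames))
- Update case.df
# rename only the columns listed in nn, in the same order as nn$newn
case.df <- case.df %>% rename_with(~nn$newn, .cols = all_of(nn$columnNames))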
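For the update in the question (a timing of "0" should give the suffix "_start"), one possible adjustment is sketched below. It rests on an assumption the question does not settle: the "_start" rule is applied only to rows whose source is 'cre' or 'cre1', the same rows that get a suffix in the first place. The toy data is a cut-down version of the question's data frames with one "0" added, and the helper names suffix, dict.new and lookup are made up for this illustration.

```r
library(dplyr)

# Cut-down inputs modelled on the question; the "0" in timing is the new case.
case.df <- data.frame(agegen = c("15m", "34f"), race = c("w", "h"),
                      eth = c("-", "h"), disp1 = c("a", "b"),
                      stringsAsFactors = FALSE)
dict.df <- data.frame(
  columnNames = c("agegen", "race", "eth", "disp1"),
  timing      = c("0", "6m", "0t", "3t"),
  source      = c("cre", "cre", "aft", "cre1"),
  stringsAsFactors = FALSE
)

# Work out the new name for every row once, so both data frames share one mapping.
# Assumption: "0" becomes "start" only where source is 'cre' or 'cre1'.
dict.new <- dict.df %>%
  mutate(
    suffix = if_else(timing == "0", "start", timing),
    newn   = if_else(source %in% c("cre", "cre1"),
                     paste0(columnNames, "_", suffix),
                     columnNames)
  )

# Named vector of the form c(new_name = "old_name"), limited to names that change.
lookup <- setNames(dict.new$columnNames, dict.new$newn)
lookup <- lookup[names(lookup) != lookup]

# Rename the columns of case.df, then write the new names back into dict.df.
case.df <- case.df %>% rename(all_of(lookup))
dict.df <- dict.new %>%
  mutate(columnNames = newn) %>%
  select(-suffix, -newn)
```

Building the names in one place and reusing them avoids the two data frames drifting apart. If "_start" should instead apply to every row with timing "0" regardless of source, the condition inside the second if_else would need to change accordingly.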