Combine character variable over rows and columns by group in R
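I am a beginner in R and am trying to solve a problem that I assume is easy for experienced users.
The problem is as follows: customers (A, B, C) come in repeatedly and use different programs (Prg). I would like to identify "typical sequences" of programs. To do so, I identify the first program each customer consumes, then the second, and then the third. In the next step, I would like to combine this information into a program sequence per customer. For a customer who first consumes Prg1, then Prg2, then Prg3, the final result should be "Prg1-Prg2-Prg3".
The code below builds a data frame similar to mine. Prg is the program in the respective year, First is the first year the customer came in, Sec is the second year, and Third is the third year.
The columns generated by the code extract the program used in the first contract (Code_1_Prg), the second contract (Code_2_Prg), and the third contract (Code_3_Prg).
Unfortunately, I have not managed to combine these 3 columns into the desired goal. I tried to group by ID and save the first element of the sequence in a new column called "chain1". Here I get the error message "Error in df %>% group_by(ID) %>% df$chain1 = df[df$Code_1_Prg != "NA", : could not find function "%>%<-"", even though I am using the magrittr and dplyr packages.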
detach(package:plyr)
library(dplyr)
library(magrittr)
df %>%
group_by(ID) %>%
df$chain1 = df[df$Code_1_Prg!="NA", "Code_1_Prg"]
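Below I share some code that generates the data frame, along with my starting point for extracting the character variable in Code_1_Prg by group.
I would be very grateful if you could help me with this. Thank you very much!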
df <- data.frame("ID"=c("A","A","A","A","B", "B", "B","B","B","C","C", "C", "C","C","C","C"),
"Year_Contract" =c("2010", "2015", "2017","2017","2010","2010", "2015","2015","2020","2015","2015","2017","2017","2017","2018","2018"),
"Prg"=c("AIB","AIB","LLA","LLA","BBU","BBU", "KLU","KLU","DDI","CKN","CKN","BBU","BBU","BBU","KLU","KLU"),
"First"=c("2010","2010","2010","2010","2010","2010", "2010","2010","2010","2015","2015","2015","2015","2015","2015","2015"),
"Sec"=c("2015","2015","2015","2015","2015","2015", "2015","2015","2015","2017","2017","2017","2017","2017","2017","2017"),
"Third"=c("2017","2017","2017","2017","2020","2020", "2020","2020","2020","2018","2018","2018","2018","2018","2018","2018")
)
df$Code_1_Prg <- ifelse(df$Year_Contract == df$First, df$Code_1_Prg <- df$Prg, NA)
df$Code_2_Prg <- ifelse(df$Year_Contract == df$Sec, df$Code_2_Prg <- df$Prg, NA)
df$Code_3_Prg <- ifelse(df$Year_Contract == df$Third, df$Code_3_Prg <- df$Prg, NA)
detach(package:plyr)
library(dplyr)
library(magrittr)
df %>%
group_by(ID) %>%
df$chain1 = df[df$Code_1_Prg!="NA", "Code_1_Prg"]
#This is the final column, I am trying to create
df2 <- data.frame("ID"=c("A","B", "C"),
"Goal" =c("AIB-LLA", "BBU-KLU-DDI", "CKN-BBU-KLU")
)
df <- merge(df, df2, by="ID")
Are you looking for something like this?
library(dplyr)
df %>%
  group_by(ID) %>%
  arrange(Year_Contract, .by_group = TRUE) %>%
  distinct() %>%
  summarise(sequence = toString(Prg))
  ID    sequence
  <chr> <chr>
1 A     AIB, AIB, LLA
2 B     BBU, KLU, DDI
3 C     CKN, BBU, KLU
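If the exact Goal format is wanted (programs joined by "-", with a program that repeats across years listed only once, e.g. "AIB-LLA" for customer A), a variation along the following lines might work. This is only a sketch. It assumes that de-duplicating Prg within each ID, in year order, is the intended rule. That matches the three example customers but may not hold in general, for instance for a customer who returns to an earlier program. The name `result` is made up for illustration. The sketch also sidesteps the original error: `df$chain1 = ...` at the end of a pipe is parsed as an assignment to the whole pipe expression, so R looks for a replacement function named `%>%<-`. Inside a pipe, new columns are created with mutate() or summarise() instead.

```r
library(dplyr)

# Stand-in for the question's data: same IDs, contract years, and programs
df <- data.frame(
  ID            = c("A","A","A","A","B","B","B","B","B",
                    "C","C","C","C","C","C","C"),
  Year_Contract = c("2010","2015","2017","2017","2010","2010","2015","2015","2020",
                    "2015","2015","2017","2017","2017","2018","2018"),
  Prg           = c("AIB","AIB","LLA","LLA","BBU","BBU","KLU","KLU","DDI",
                    "CKN","CKN","BBU","BBU","BBU","KLU","KLU")
)

# Assumption: each program should appear once per customer, in order of first use
result <- df %>%
  group_by(ID) %>%
  arrange(Year_Contract, .by_group = TRUE) %>%          # sort years within each ID
  summarise(Goal = paste(unique(Prg), collapse = "-"))  # drop repeats, join with "-"

result
```

To attach the sequence to every row of the original data, something like `left_join(df, result, by = "ID")` should do; the years are compared as strings here, which is fine as long as they all have four digits.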