Re-organize database by merging rows from a variable

I have a database that looks like this:

userId          SessionId        Screen         Platform       Version
01              1                first          IOS            1.0.1
01              1                main           IOS            1.0.1
01              2                first          IOS            1.0.1
01              3                first          IOS            1.0.1
01              3                main           IOS            1.0.1
01              3                detail         IOS            1.0.1
02              1                first          Android        1.0.2
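
For reproducibility, the table above can be built as an R data frame. This is only a sketch: the name d and the column types are assumptions (userId is kept as character so the leading zeros survive, SessionId as integer).

```r
# Example data copied from the table above.
# Assumption: userId is character (to keep "01"), SessionId is integer.
d <- data.frame(
  userId    = c("01", "01", "01", "01", "01", "01", "02"),
  SessionId = c(1L, 1L, 2L, 3L, 3L, 3L, 1L),
  Screen    = c("first", "main", "first", "first", "main", "detail", "first"),
  Platform  = c("IOS", "IOS", "IOS", "IOS", "IOS", "IOS", "Android"),
  Version   = c("1.0.1", "1.0.1", "1.0.1", "1.0.1", "1.0.1", "1.0.1", "1.0.2"),
  stringsAsFactors = FALSE
)
```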

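Basically, what I want to do is determine whether a "path" (the sequence of different screens) leads to better retention. I would like to re-organize the data so that the screens of each SessionId are combined into one column. The ideal database would look like this: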

userId       SessionId       Path                 Retention
01           1               first;main           3
01           2               first                3
01           3               first;main;detail    3
02           1               first                1

Here the variable Retention equals the maximum SessionId for each user.

Here is a data.table solution:

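library(data.table)
dt <- as.data.table(d)  # d is the data frame shown in the question
# Retention = highest SessionId reached by each user
dt[, Retention := max(SessionId), by = .(userId)]
# Collapse each session's screens into one ";"-separated string
dt[, .(Screen = paste(Screen, collapse = ";"), Retention = unique(Retention)), by = .(userId, SessionId)]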

   userId SessionId            Screen Retention
1:     01         1        first;main         3
2:     01         2             first         3
3:     01         3 first;main;detail         3
4:     02         1             first         1

A possible solution in base R:

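# Collapse the screens of each session into one comma-separated string
d2 <- aggregate(Screen ~ userId + SessionId, d, toString)
# Retention = highest SessionId per user; assign the result so d2 keeps the new column
d2 <- transform(d2, retention = ave(SessionId, userId, FUN = max))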

which gives:

> d2
  userId SessionId              Screen retention
1     01         1         first, main         3
2     02         1               first         1
3     01         2               first         3
4     01         3 first, main, detail         3

An alternative using dplyr:

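library(dplyr)
d %>%
  group_by(userId, SessionId) %>%            # one group per session
  summarise(Screen = toString(Screen)) %>%   # join the screens into one string
  group_by(userId) %>%
  mutate(retention = n())                    # sessions per user; equals max(SessionId) when ids run 1..n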

which gives:

  userId SessionId              Screen retention
   <chr>     <int>               <chr>     <int>
1     01         1         first, main         3
2     01         2               first         3
3     01         3 first, main, detail         3
4     02         1               first         1
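
Note that counting sessions with n() matches the requested Retention (the maximum SessionId) only when each user's sessions are numbered consecutively from 1. A minimal sketch of the difference in base R follows; user "03" and its session ids are hypothetical, made up for illustration and not taken from the question.

```r
# Hypothetical data: user "03" has only sessions 2 and 5 (illustrative values).
x <- data.frame(userId = c("03", "03"), SessionId = c(2L, 5L))

# Number of sessions per user, repeated on every row of that user
n_sessions <- ave(x$SessionId, x$userId, FUN = length)

# Largest SessionId per user, repeated on every row of that user
max_session <- ave(x$SessionId, x$userId, FUN = max)

print(n_sessions)   # 2 2
print(max_session)  # 5 5
```

If session ids can have gaps, using max(SessionId) in place of the count keeps the result in line with the definition given in the question.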