Re-organize database by merging rows from a variable
I have a database that looks like this:
userId SessionId Screen Platform Version
01 1 first IOS 1.0.1
01 1 main IOS 1.0.1
01 2 first IOS 1.0.1
01 3 first IOS 1.0.1
01 3 main IOS 1.0.1
01 3 detail IOS 1.0.1
02 1 first Android 1.0.2
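Basically, what I want to find out is whether the "path" (the different screens visited) leads to better retention. To do that, I want to collapse each sessionId into a single row, with its screens in one column. The ideal database looks like this: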
userId SessionId Path Retention
01 1 first;main 3
01 2 first 3
01 3 first;main;detail 3
02 1 first 1
Here the variable Retention equals the maximum SessionId of each userId.
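The answers below refer to the data as d without defining it. A reproducible version of the sample table might look like this; it assumes userId should stay a character column (so "01" keeps its leading zero) and SessionId is an integer.

```r
# Sample data from the question; userId is kept as character so "01" keeps its leading zero
d <- data.frame(
  userId    = c("01", "01", "01", "01", "01", "01", "02"),
  SessionId = c(1L, 1L, 2L, 3L, 3L, 3L, 1L),
  Screen    = c("first", "main", "first", "first", "main", "detail", "first"),
  Platform  = c("IOS", "IOS", "IOS", "IOS", "IOS", "IOS", "Android"),
  Version   = c("1.0.1", "1.0.1", "1.0.1", "1.0.1", "1.0.1", "1.0.1", "1.0.2"),
  stringsAsFactors = FALSE
)
str(d)
```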
A data.table solution:
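library(data.table)
# d is the data frame from the question
dt <- as.data.table(d)
# Retention = highest SessionId of each user
dt[, Retention := max(SessionId), by = .(userId)]
# one row per session, with its screens joined by ";"
dt[, .(Screen = paste(Screen, collapse = ";"), Retention = unique(Retention)), by = .(userId, SessionId)]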
userId SessionId Screen Retention
1: 01 1 first;main 3
2: 01 2 first 3
3: 01 3 first;main;detail 3
4: 02 1 first 1
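A variant of the data.table approach that directly produces the Path and Retention columns named in the question could look like the sketch below. The inline data is a copy of the question's table restricted to the columns needed; grouping with by keeps the groups in order of first appearance, so no extra sorting is done here.

```r
library(data.table)

# Sample data from the question (userId kept as character)
dt <- data.table(
  userId    = c("01", "01", "01", "01", "01", "01", "02"),
  SessionId = c(1L, 1L, 2L, 3L, 3L, 3L, 1L),
  Screen    = c("first", "main", "first", "first", "main", "detail", "first")
)

# One row per session: screens joined with ";" in their original order
res <- dt[, .(Path = paste(Screen, collapse = ";")), by = .(userId, SessionId)]
# Retention = highest SessionId of each user, added by reference
res[, Retention := max(SessionId), by = userId]
res
```

Computing Retention on the already collapsed table avoids the unique(Retention) step, because each (userId, SessionId) group is a single row at that point.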
A possible base R solution:
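# collapse each session's screens into one string (toString separates them with ", ")
d2 <- aggregate(Screen ~ userId + SessionId, d, toString)
# Retention = highest SessionId per user; assign back so d2 contains the new column
d2 <- transform(d2, retention = ave(SessionId, userId, FUN = max))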
which gives:
> d2
userId SessionId Screen retention
1 01 1 first, main 3
2 02 1 first 1
3 01 2 first 3
4 01 3 first, main, detail 3
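The base R output above uses toString's ", " separator and comes out ordered with userId varying fastest, because that is how aggregate() arranges its groups. A sketch that matches the question's layout more closely (";" separator, a Path column, rows sorted by user and session), again assuming the sample data with userId as character:

```r
# Sample data from the question (userId kept as character)
d <- data.frame(
  userId    = c("01", "01", "01", "01", "01", "01", "02"),
  SessionId = c(1L, 1L, 2L, 3L, 3L, 3L, 1L),
  Screen    = c("first", "main", "first", "first", "main", "detail", "first"),
  stringsAsFactors = FALSE
)

# Collapse each session's screens with ";" instead of toString's ", "
d2 <- aggregate(Screen ~ userId + SessionId, data = d,
                FUN = function(x) paste(x, collapse = ";"))
names(d2)[names(d2) == "Screen"] <- "Path"
# Retention = highest SessionId per user
d2$Retention <- ave(d2$SessionId, d2$userId, FUN = max)
# Sort by user, then session, to match the desired layout
d2 <- d2[order(d2$userId, d2$SessionId), ]
rownames(d2) <- NULL
d2
```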
An alternative using dplyr:
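library(dplyr)
d %>%
  group_by(userId, SessionId) %>%
  # one row per session, screens joined by ", "
  summarise(Screen = toString(Screen), .groups = "drop") %>%
  group_by(userId) %>%
  # Retention = highest SessionId of each user
  mutate(retention = max(SessionId)) %>%
  ungroup()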
which gives:
userId SessionId Screen retention
<chr> <int> <chr> <int>
1 01 1 first, main 3
2 01 2 first 3
3 01 3 first, main, detail 3
4 02 1 first 1
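Counting sessions per user (for example with n() after summarising, or length in ave()) and taking max(SessionId) agree on the sample data only because every user's sessions are numbered 1, 2, 3, ... without gaps. A small base R check on hypothetical data (user "03", whose session 2 is missing, is invented for illustration) shows where the two definitions diverge:

```r
# Hypothetical data: user "03" has sessions 1 and 3 but no session 2 recorded
s <- data.frame(userId    = c("01", "01", "01", "03", "03"),
                SessionId = c(1L, 2L, 3L, 1L, 3L),
                stringsAsFactors = FALSE)
# Number of distinct sessions per user
n_sessions <- tapply(s$SessionId, s$userId, function(x) length(unique(x)))
# Highest session number per user (the question's definition of Retention)
max_session <- tapply(s$SessionId, s$userId, max)
rbind(n_sessions, max_session)
```

If sessions can be missing from the log, max(SessionId) is the one that follows the question's definition.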