Reshape by id and activity intensity prediction (sums)

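I have a data frame organized by timestamp and ID. For each ID and each minute, I have 8 columns of data, each holding one of four types of activity intensity prediction. A prediction can be Sedentary, Light, Moderate, or Vigorous. The data is arranged in the following format: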

id  time    x1          x2     x3
1   10:30   Moderate    Light  Light
1   10:31   Moderate    Light  Moderate
...
2   12:24   Light       Light  Light
2   12:25   Light       Light  Light

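For each ID, I want the count of each activity intensity within each prediction variable (x1, x2, x3, etc.). Using the example above, I would like to reshape my data so that it looks like this: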

id  Intensity   x1     x2     x3
1   Light       0      2      1
1   Moderate    2      0      1
...
2   Light       2      2      2
2   Moderate    0      0      0

My file has about 80 IDs and 8 activity intensity prediction columns (x1-x8), in case that matters.

Here is a solution using the tidyverse suite of packages:

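library(tidyverse)

# The four possible intensity levels, in the desired display order
values <- c("Sedentary", "Light", "Moderate", "Vigorous")
df %>%
  # Convert to factors so that levels with zero counts can be kept
  mutate_at(vars(starts_with("x")), ~ factor(., levels = values)) %>%
  gather(key, value, -id, -time, factor_key = TRUE) %>%
  group_by(id, key, value) %>%
  summarize(n = n()) %>%
  # One column per predictor; drop = FALSE keeps unobserved combinations, filled with 0
  spread(key, n, fill = 0L, drop = FALSE)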

An alternative that drops time first and tallies the counts:

library(tidyverse)

df %>%
  select(-time) %>%
  gather(key, intensity, -id) %>%
  group_by(id, intensity, key) %>%
  tally() %>%
  spread(key, n) %>%
  replace(is.na(.), 0)

The output is:

     id intensity    x1    x2    x3
1     1 Light         0     2     1
2     1 Moderate      3     0     2
3     1 Sedentary     1     0     1
4     1 Vigorous      0     2     0
5     2 Light         2     0     2
6     2 Moderate      1     1     0
7     2 Sedentary     0     2     0
8     2 Vigorous      0     0     1

Sample data:

df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), time = c("10:30", 
"10:31", "10:32", "10:33", "12:24", "12:25", "12:26"), x1 = c("Moderate", 
"Moderate", "Sedentary", "Moderate", "Light", "Moderate", "Light"
), x2 = c("Light", "Light", "Vigorous", "Vigorous", "Moderate", 
"Sedentary", "Sedentary"), x3 = c("Light", "Moderate", "Moderate", 
"Sedentary", "Light", "Light", "Vigorous")), class = "data.frame", row.names = c(NA, 
-7L))
#  id  time        x1        x2        x3
#1  1 10:30  Moderate     Light     Light
#2  1 10:31  Moderate     Light  Moderate
#3  1 10:32 Sedentary  Vigorous  Moderate
#4  1 10:33  Moderate  Vigorous Sedentary
#5  2 12:24     Light  Moderate     Light
#6  2 12:25  Moderate Sedentary     Light
#7  2 12:26     Light Sedentary  Vigorous
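
For readers on a current tidyr, where gather() and spread() are superseded, the same reshaping can be sketched with pivot_longer() and pivot_wider(). This is only a sketch, assuming tidyr 1.1.0 or later (for the scalar values_fill and for names_sort); the sample data is repeated so the block runs on its own, and the names key and result are illustrative rather than taken from the answers.

```r
library(dplyr)
library(tidyr)

# Sample data repeated from above so this block is self-contained
df <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  time = c("10:30", "10:31", "10:32", "10:33", "12:24", "12:25", "12:26"),
  x1   = c("Moderate", "Moderate", "Sedentary", "Moderate", "Light", "Moderate", "Light"),
  x2   = c("Light", "Light", "Vigorous", "Vigorous", "Moderate", "Sedentary", "Sedentary"),
  x3   = c("Light", "Moderate", "Moderate", "Sedentary", "Light", "Light", "Vigorous"),
  stringsAsFactors = FALSE
)

result <- df %>%
  select(-time) %>%
  # Long form: one row per (id, predictor column, predicted intensity)
  pivot_longer(-id, names_to = "key", values_to = "Intensity") %>%
  # Count how often each intensity occurs per id and predictor
  count(id, Intensity, key) %>%
  # Back to wide form; names_sort keeps x1, x2, x3 in order,
  # values_fill puts 0 where a combination never occurs
  pivot_wider(names_from = key, values_from = n,
              values_fill = 0L, names_sort = TRUE)

print(result)
```

Without names_sort = TRUE the predictor columns would appear in order of first occurrence in the counted data, which is not necessarily x1, x2, x3.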

Assuming the variable time is not needed, you could do this:

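library(tidyverse)
library(data.table)

df %>%
  select(-time) %>%
  # data.table's melt() and dcast() expect a data.table, not a plain data.frame
  as.data.table() %>%
  melt(id.vars = "id") %>%
  # Count occurrences explicitly instead of relying on the default aggregation
  dcast(id + value ~ variable, fun.aggregate = length) %>%
  rename(Intensity = value)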
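
If pulling in extra packages is not an option, the counts can also be obtained with base R alone. The following is a rough sketch, not one of the answers above: it assumes the predictor columns all start with "x" and are character vectors, and the names intensity_levels, pred_cols, long, and counts are made up for illustration.

```r
# Sample data repeated from above so this block is self-contained
df <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  time = c("10:30", "10:31", "10:32", "10:33", "12:24", "12:25", "12:26"),
  x1   = c("Moderate", "Moderate", "Sedentary", "Moderate", "Light", "Moderate", "Light"),
  x2   = c("Light", "Light", "Vigorous", "Vigorous", "Moderate", "Sedentary", "Sedentary"),
  x3   = c("Light", "Moderate", "Moderate", "Sedentary", "Light", "Light", "Vigorous"),
  stringsAsFactors = FALSE
)

intensity_levels <- c("Sedentary", "Light", "Moderate", "Vigorous")
pred_cols <- grep("^x", names(df), value = TRUE)  # assumption: predictors are x1, x2, ...

# Stack the predictor columns: one row per (id, predictor, intensity)
long <- data.frame(
  id        = rep(df$id, times = length(pred_cols)),
  key       = rep(pred_cols, each = nrow(df)),
  Intensity = factor(unlist(df[pred_cols], use.names = FALSE),
                     levels = intensity_levels)
)

# Three-way contingency table; unobserved combinations are counted as 0
counts <- xtabs(~ id + Intensity + key, data = long)

# Flatten for display: rows are id x Intensity, columns are the predictors
print(ftable(counts, row.vars = c("id", "Intensity")))
```

Because Intensity is a factor with all four levels, every id gets a row for each intensity, including those that never occur.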