Manipulating dataset to account for repeated measures
Given:
df <- data.frame(
CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers"
,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","PelletCoffeeCo","PelletCoffeeCo"),
Email= c("john@coffee.com", "john@coffee.com","john@coffee.com","john@coffee.com", "john@coffee.com",
"john@coffee.com", "john@coffee.com", "john@coffee.com", "john@coffee.com", "john@coffee.com",
"george@liquid.com","george@liquid.com","george@liquid.com","stacy@pelletcoffee.com",
"stacy@pelletcoffee.com"),
Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","1","2"),
var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6,2,3))
I need to figure out how to get to this:
df2 <- data.frame(CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers"
,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","Liquders","Liquders","Liquders",
"Liquders","Liquders","Liquders","Liquders", "PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo",
"PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo",
"PelletCoffeeCo","PelletCoffeeCo"),
Email= c("john@coffee.com", "john@coffee.com","john@coffee.com","john@coffee.com", "john@coffee.com",
"john@coffee.com", "john@coffee.com", "john@coffee.com", "john@coffee.com", "john@coffee.com",
"george@liquid.com","george@liquid.com","george@liquid.com","george@liquid.com","george@liquid.com",
"george@liquid.com","george@liquid.com","george@liquid.com","george@liquid.com","george@liquid.com","stacy@pelletcoffee.com",
"stacy@pelletcoffee.com","stacy@pelletcoffee.com","stacy@pelletcoffee.com","stacy@pelletcoffee.com",
"stacy@pelletcoffee.com","stacy@pelletcoffee.com","stacy@pelletcoffee.com","stacy@pelletcoffee.com",
"stacy@pelletcoffee.com"),
Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","4","5","6","7","8","9","10",
"1","2","3","4","5","6","7","8","9","10"),
var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6, NA,NA,NA,NA,NA,NA,NA, 2,3,NA,NA,NA,NA,NA,NA,NA,NA))
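Explanation:
I have data from surveying people once a day over 10 days. In a perfect world, every participant would have given 10 responses, denoted day1:day10. However, because of nonresponse, some participants gave 3 responses, others 6, others 10, and so on. I'm setting the data up to run growth models, so I need the Day column to always read day 1 through day 10, whether or not there is data for those responses. I tried to show this by adding NAs to the rows of participants who don't have data for all 10 days.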
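Before padding, it can help to confirm how many days each participant actually answered. A minimal base-R sketch, assuming a participant is identified by the CompanyID + Email pair (the names `counts` and `incomplete` are illustrative only):

```r
# Survey data from the question; Day is stored as character, as in the original df
df <- data.frame(
  CompanyID = c(rep("Drinkers", 10), rep("Liquders", 3), rep("PelletCoffeeCo", 2)),
  Email = c(rep("john@coffee.com", 10), rep("george@liquid.com", 3),
            rep("stacy@pelletcoffee.com", 2)),
  Day = c(as.character(1:10), "1", "2", "3", "1", "2"),
  var1 = c(4, 5, 5, 5, 2, 3, 2, 7, 6, 5, 7, 6, 6, 2, 3)
)

# Count the recorded days for each participant (one CompanyID + Email pair)
counts <- aggregate(Day ~ CompanyID + Email, data = df, FUN = length)
names(counts)[names(counts) == "Day"] <- "n_days"

# Participants with fewer than 10 recorded days are the ones that need NA rows
incomplete <- counts[counts$n_days < 10, ]
print(incomplete)
```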
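First, create a data frame of the unique participants (each CompanyID together with its Email, so that Email is also filled in on the added rows).
Next, create a data frame of the days you need.
Cross join the two.
Then merge the result with your original dataset to fill in the table.
comp <- unique(df[c("CompanyID", "Email")])   # one row per participant
Day <- data.frame(Day = as.character(1:10))   # every day each participant should have
compDay <- merge(comp, Day)                   # no shared columns, so merge() returns every combination
dfday <- merge(df, compDay, by = c("CompanyID", "Email", "Day"), all = TRUE)   # days with no response get var1 = NA
dfday <- dfday[order(dfday$CompanyID, as.integer(dfday$Day)), ]   # Day is text, so sort it numerically (not 1, 10, 2, ...)
rownames(dfday) <- NULL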
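The same cross-join-then-outer-join idea can be wrapped in a small reusable function. This is a sketch only: `pad_days()` is a hypothetical helper name, and it assumes participants are identified by `id_cols` (CompanyID + Email here) and that the study runs over the days in `days`.

```r
# Survey data from the question; Day is stored as character
df <- data.frame(
  CompanyID = c(rep("Drinkers", 10), rep("Liquders", 3), rep("PelletCoffeeCo", 2)),
  Email = c(rep("john@coffee.com", 10), rep("george@liquid.com", 3),
            rep("stacy@pelletcoffee.com", 2)),
  Day = c(as.character(1:10), "1", "2", "3", "1", "2"),
  var1 = c(4, 5, 5, 5, 2, 3, 2, 7, 6, 5, 7, 6, 6, 2, 3)
)

# Hypothetical helper: pad every participant out to the full set of days,
# adding rows with var1 = NA where there was no response.
pad_days <- function(data, days = 1:10, id_cols = c("CompanyID", "Email")) {
  data$Day <- as.integer(data$Day)               # numeric, so days sort 1, 2, ..., 10
  ids <- unique(data[id_cols])                   # one row per participant
  grid <- merge(ids, data.frame(Day = as.integer(days)))        # no shared columns: cross join
  out <- merge(data, grid, by = c(id_cols, "Day"), all = TRUE)  # full outer join
  keys <- c(unname(as.list(out[id_cols])), list(out$Day))
  out <- out[do.call(order, keys), ]             # sort by participant, then by day
  rownames(out) <- NULL
  out
}

padded <- pad_days(df)
head(padded, 12)
```

Merging the unique CompanyID + Email pairs, rather than crossing the two columns separately with `expand.grid()`, keeps each email attached to its own company, so no company/email combinations are invented.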
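Try this:
library(dplyr)
library(tidyr)
df %>%
  mutate(Day = as.integer(Day)) %>%   # Day is stored as text in df; make it numeric first
  complete(nesting(CompanyID, Email), Day = seq(min(Day), max(Day), 1L)) %>%
  data.frame()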
Output:
        CompanyID                  Email Day var1
1        Drinkers        john@coffee.com   1    4
2        Drinkers        john@coffee.com   2    5
3        Drinkers        john@coffee.com   3    5
4        Drinkers        john@coffee.com   4    5
5        Drinkers        john@coffee.com   5    2
6        Drinkers        john@coffee.com   6    3
7        Drinkers        john@coffee.com   7    2
8        Drinkers        john@coffee.com   8    7
9        Drinkers        john@coffee.com   9    6
10       Drinkers        john@coffee.com  10    5
11       Liquders      george@liquid.com   1    7
12       Liquders      george@liquid.com   2    6
13       Liquders      george@liquid.com   3    6
14       Liquders      george@liquid.com   4   NA
15       Liquders      george@liquid.com   5   NA
16       Liquders      george@liquid.com   6   NA
17       Liquders      george@liquid.com   7   NA
18       Liquders      george@liquid.com   8   NA
19       Liquders      george@liquid.com   9   NA
20       Liquders      george@liquid.com  10   NA
21 PelletCoffeeCo stacy@pelletcoffee.com   1    2
22 PelletCoffeeCo stacy@pelletcoffee.com   2    3
23 PelletCoffeeCo stacy@pelletcoffee.com   3   NA
24 PelletCoffeeCo stacy@pelletcoffee.com   4   NA
25 PelletCoffeeCo stacy@pelletcoffee.com   5   NA
26 PelletCoffeeCo stacy@pelletcoffee.com   6   NA
27 PelletCoffeeCo stacy@pelletcoffee.com   7   NA
28 PelletCoffeeCo stacy@pelletcoffee.com   8   NA
29 PelletCoffeeCo stacy@pelletcoffee.com   9   NA
30 PelletCoffeeCo stacy@pelletcoffee.com  10   NA
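Edit:
The code above fills in each group's Day column with the complete set of day values running from the minimum to the maximum of the existing values in that column (1 and 10 here). The minimum and maximum are taken over the whole column, not per group. You can redefine the groups whose Day values get filled in as needed; here I chose Company + Email, via nesting(CompanyID, Email). Because Day is stored as character in df, it has to be converted to a number first: otherwise min() and max() compare the values as text (giving "1" and "9"), and the generated numeric days would not match the character Day column when complete() joins them back. The data.frame() line just converts the output from a tibble to a data.frame; feel free to replace or remove it if you don't need a data.frame.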
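A small base-R illustration of why the type of Day matters for `seq(min(Day), max(Day), 1L)`, and of a second caveat: a range derived from the data only covers days someone actually answered. The `observed` vector below is a hypothetical subset, not data from the question. If the study length is known in advance, stating it explicitly inside `complete()` (for example `Day = 1:10`) avoids both issues.

```r
# Day as it is stored in the question's df: text, not numbers
day_text <- as.character(1:10)

# Text comparison works character by character, so "9" sorts after "10"
text_range <- c(min(day_text), max(day_text))
print(text_range)    # [1] "1" "9"

# After conversion to integer the range is the intended one
day_num <- as.integer(day_text)
num_range <- c(min(day_num), max(day_num))
print(num_range)     # [1]  1 10

# Hypothetical subset in which nobody got past day 3:
# the derived range silently stops at 3 instead of 10
observed <- c(1L, 2L, 3L, 1L, 2L)
derived_days <- seq(min(observed), max(observed), 1L)
print(derived_days)  # [1] 1 2 3
```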