How do I filter the first event per day per group, select a value from another column based on a condition, and run calculations on the values in between?
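In R, I have a data frame with the columns id (the study participant), phase, time, glucose, steps and kilocalories. id and phase are factors, time is POSIXct and includes date + time, and glucose (sampled every ~15 minutes), steps (sampled every minute) and kilocalories (sampled irregularly, each entry representing a meal) are numeric.
Glucose and kilocalories are sampled far less often than steps, so those columns contain many NAs.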
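I would like to filter this data frame in the following ways:
1) Retrieve the row of each participant's (id) first meal of the day, together with their glucose reading 2 hours (+-15 minutes) before that meal.
2) Retrieve the row of every meal (i.e. every kilocalories entry) per participant (id), together with the glucose reading 2 hours (+-15 minutes) after the meal.
3) From task 2, take the subset of data between the meal and the glucose reading, and compute the sum of steps over that period.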
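I specify 2 hours (+-15 minutes) because it is very unlikely that the data frame has a glucose reading exactly 2 hours after a meal, so I want to widen the window.
I have already tried "How to subset based on time and condition", but to no avail; I got stuck on the first task. That thread also does not cover the more complex subsetting I want to perform.
Edit - here is some sample data that meets the task criteria: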
sampleData <- structure(list(id = c(13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13), phase = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), time = structure(c(1450881900,
1450881960, 1450882020, 1450882080, 1450882140, 1450882200, 1450882260,
1450882320, 1450882380, 1450882440, 1450882500, 1450882560, 1450882620,
1450882680, 1450882740, 1450882800, 1450882860, 1450882920, 1450882980,
1450883040, 1450883100, 1450883160, 1450883220, 1450883280, 1450883340,
1450883400, 1450883460, 1450883520, 1450883580, 1450883640, 1450883700,
1450883760, 1450883820, 1450883880, 1450883940, 1450884000, 1450884060,
1450884120, 1450884180, 1450884240, 1450884300, 1450884360, 1450884420,
1450884480, 1450884540, 1450884600, 1450884660, 1450884720, 1450884780,
1450884840, 1450884900, 1450884960, 1450885020, 1450885080, 1450885140,
1450885200, 1450885260, 1450885320, 1450885380, 1450885440, 1450885500,
1450885560, 1450885620, 1450885680, 1450885740, 1450885800, 1450885860,
1450885920, 1450885980, 1450886040, 1450886100, 1450886160, 1450886220,
1450886280, 1450886340, 1450886400, 1450886460, 1450886520, 1450886580,
1450886640, 1450886700, 1450886760, 1450886820, 1450886880, 1450886940,
1450887000, 1450887060, 1450887120, 1450887180, 1450887240, 1450887300,
1450887360, 1450887420, 1450887480, 1450887540, 1450887600, 1450887660,
1450887720, 1450887780, 1450887840, 1450887900, 1450887960, 1450888020,
1450888080, 1450888140, 1450888200, 1450888260, 1450888320, 1450888380,
1450888440, 1450888500, 1450888560, 1450888620, 1450888680, 1450888740,
1450888800, 1450888860, 1450888920, 1450888980, 1450889040, 1450889100,
1450889160, 1450889220, 1450889280, 1450889340, 1450889400, 1450889460,
1450889520, 1450889580, 1450889640, 1450889700, 1450889760, 1450889820,
1450889880, 1450889940, 1450890000, 1450890060, 1450890120, 1450890180,
1450890240, 1450890300, 1450890360, 1450890420, 1450890480, 1450890540,
1450890600, 1450890660, 1450890720, 1450890780, 1450890840, 1450890900,
1450890960, 1450891020, 1450891080, 1450891140, 1450891200, 1450891260,
1450891320, 1450891380, 1450891440, 1450891500, 1450891560, 1450891620,
1450891680, 1450891740, 1450891800, 1450891860, 1450891920, 1450891980,
1450892040, 1450892100, 1450892160, 1450892220, 1450892280, 1450892340,
1450892400, 1450892460, 1450892520, 1450892580, 1450892640, 1450892700,
1450892760, 1450892820, 1450892880, 1450892940, 1450893000, 1450893060,
1450893120, 1450893180, 1450893240, 1450893300, 1450893360, 1450893420,
1450893480, 1450893540, 1450893600, 1450893660, 1450893720, 1450893780,
1450893840, 1450893900, 1450893960, 1450894020, 1450894080, 1450894140,
1450894140, 1450894200, 1450894260, 1450894320, 1450894380, 1450894440,
1450894500, 1450894560, 1450894620, 1450894680, 1450894740, 1450894800,
1450894860, 1450894920, 1450894980, 1450895040, 1450895100, 1450895160,
1450895220, 1450895280, 1450895340, 1450895400, 1450895460, 1450895520,
1450895580, 1450895640, 1450895700, 1450895760, 1450895820, 1450895880,
1450895940, 1450896000, 1450896060, 1450896120, 1450896180, 1450896240,
1450896300, 1450896360, 1450896420, 1450896480, 1450896540, 1450896600,
1450896660, 1450896720, 1450896780, 1450896840, 1450896900, 1450896960,
1450897020, 1450897080, 1450897140, 1450897200, 1450897260, 1450897320,
1450897380, 1450897440, 1450897500, 1450897560, 1450897620, 1450897680,
1450897740, 1450897800, 1450897860, 1450897920, 1450897980, 1450898040,
1450898100, 1450898160, 1450898220, 1450898280, 1450898340, 1450898400,
1450898460, 1450898520, 1450898580, 1450898640, 1450898700, 1450898760,
1450898820, 1450898880, 1450898940, 1450899000, 1450899060, 1450899120,
1450899180, 1450899240, 1450899300, 1450899360, 1450899420, 1450899480,
1450899540, 1450899600, 1450899660, 1450899720, 1450899780, 1450899840,
1450899900), class = c("POSIXct", "POSIXt")), gl = c(NA, NA,
NA, NA, NA, NA, NA, NA, 84, NA, NA, NA, NA, 83, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 81, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 82, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 84, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 83, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 79, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
76, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 78,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 93, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 116, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 128, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 141, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 142, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 146,
143, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
136, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
129, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
134, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
139, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
134, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
128, NA, NA, NA, NA, NA, NA), steps = c(24, 39, 28, 19, 29, 6,
12, 3, 13, 1, 6, 2, 1, 13, 10, 1, 1, 1, 1, 0, 0, 1, 1, 3, 1,
0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 2, 1, 0, 3, 33, 27, 17, 27,
30, 19, 23, 34, 38, 25, 30, 42, 31, 31, 16, 52, 91, 39, 23, 7,
6, 27, 64, 20, 53, 22, 14, 14, 5, 4, 13, 7, 13, 7, 8, 10, 14,
26, 25, 19, 23, 35, 23, 15, 13, 12, 11, 27, 21, 25, 27, 4, 8,
18, 15, 22, 30, 16, 15, 15, 5, 3, 4, 6, 0, 12, 10, 4, 3, 5, 2,
5, 10, 13, 7, 2, 6, 2, 1, 15, 23, 25, 18, 27, 5, 11, 22, 31,
17, 27, 19, 2, 0, 12, 3, 0, 5, 5, 0, 0, 1, 0, 2, 2, 2, 5, 4,
4, 1, 7, 2, 5, 4, 8, 2, 4, 0, 4, 6, 8, 11, 10, 22, 2, 1, 0, 4,
4, 2, 2, 9, 19, 8, 11, 7, 7, 4, 0, 1, 0, 2, 3, 13, 9, 0, 3, 4,
5, 5, 7, 5, 5, 8, 8, 26, 23, 26, 27, 24, 24, 13, 25, 17, 24,
24, 11, 16, 15, 25, 21, 18, 11, 16, 19, 2, 0, 7, 6, 6, 3, 1,
13, 13, 0, 1, 10, 12, 10, 9, 7, 1, 1, 12, 4, 0, 0, 0, 5, 2, 5,
2, 1, 2, 0, 1, 2, 5, 11, 0, 0, 2, 1, 0, 2, 0, 7, 1, 0, 0, 0,
0, 1, 0, 3, 1, 0, 1, 0, 0, 3, 10, 13, 1, 8, 4, 1, 0, 0, 1, 0,
23, 22, 11, 16, 16, 5, 5, 5, 3, 14, 2, 0, 0, 0, 1, 2, 0, 1, 2,
3, 1), kiloCalories = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 603, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 143, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA)), row.names = c(NA, -302L), class = c("tbl_df",
"tbl", "data.frame"))
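I'm sure there are many considerations about how you would like to organize your data, depending on how you intend to analyze it further. That said, here are some ideas that might help.
This solution uses tidyverse and fuzzyjoin, since you tagged the question with dplyr, but you may want to consider a data.table or sqldf solution as an alternative, depending on data size, required speed and other factors.
First, I would create a table of meals based on the non-missing kiloCalories values. We add a meal column that numbers the meals within each date. We can also compute the time windows for the pre-meal (preprandial) and post-meal (postprandial) glucose levels.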
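library(tidyverse)
library(lubridate)  # provides date(), which extracts the calendar date from a date-time
library(fuzzyjoin)

mealsData <- sampleData %>%
  filter(!is.na(kiloCalories)) %>%     # keep only the rows that record a meal
  group_by(id, date = date(time)) %>%
  mutate(meal = 1:n(),                 # number the meals within each id and date
         # window bounds: 2 hours before/after the meal, -/+ 15 minutes (in seconds)
         preprandial_1 = time - (60 * 60 * 2) - (15 * 60),
         preprandial_2 = time - (60 * 60 * 2) + (15 * 60),
         postprandial_1 = time + (60 * 60 * 2) - (15 * 60),
         postprandial_2 = time + (60 * 60 * 2) + (15 * 60)) %>%
  select(-gl, -steps, -kiloCalories)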
The resulting mealsData looks like this:
id phase time date meal preprandial_1 preprandial_2 postprandial_1 postprandial_2
<dbl> <dbl> <dttm> <date> <int> <dttm> <dttm> <dttm> <dttm>
1 13 1 2015-12-23 12:00:00 2015-12-23 1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00
2 13 1 2015-12-23 13:30:00 2015-12-23 2 2015-12-23 11:15:00 2015-12-23 11:45:00 2015-12-23 15:15:00 2015-12-23 15:45:00
I find a table like this handy to have as a reference.
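One thing worth checking in the meals table is the date column: which calendar day a POSIXct timestamp falls on depends on the time zone used for the conversion, and that in turn decides which meal counts as the first of the day. A minimal base-R sketch, assuming a hypothetical timestamp and the zone America/New_York purely for illustration (neither is taken from the data above):

```r
# Hypothetical late-evening meal, stored in UTC
meal_time <- as.POSIXct("2015-12-24 02:00:00", tz = "UTC")

# Calendar date when the day boundary is taken in UTC
day_utc <- as.Date(meal_time, tz = "UTC")

# Calendar date when the day boundary is taken in the participant's
# (assumed) local zone; 02:00 UTC is 21:00 on the previous day there
day_local <- as.Date(meal_time, tz = "America/New_York")

print(day_utc)
print(day_local)
```

If the two dates differ for meals close to midnight, grouping by the local date is probably closer to what "first meal of the day" is meant to capture.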
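Next, you can join this table with your sampleData. For task 1, you need the glucose levels before the first meal. You can therefore use a fuzzy_join and require that the reading time falls between the computed preprandial window bounds.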
fuzzy_inner_join(
  mealsData %>% filter(meal == 1),
  sampleData %>% filter(!is.na(gl)),
  by = c("id", "phase", "preprandial_1" = "time", "preprandial_2" = "time"),
  match_fun = c(`==`, `==`, `<=`, `>=`)
)
The result is:
id.x phase.x time.x date meal preprandial_1 preprandial_2 postprandial_1 postprandial_2 id.y phase.y time.y gl steps kiloCalories
<dbl> <dbl> <dttm> <date> <int> <dttm> <dttm> <dttm> <dttm> <dbl> <dbl> <dttm> <dbl> <dbl> <dbl>
1 13 1 2015-12-23 12:00:00 2015-12-23 1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00 13 1 2015-12-23 09:53:00 84 13 NA
2 13 1 2015-12-23 12:00:00 2015-12-23 1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00 13 1 2015-12-23 09:58:00 83 13 NA
3 13 1 2015-12-23 12:00:00 2015-12-23 1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00 13 1 2015-12-23 10:08:00 81 3 NA
Three glucose readings in the sample data appear to fall within the window.
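If you want a single preprandial value per meal rather than every reading in the window, one option is to keep the reading closest to the 2-hour mark. A minimal base-R sketch using the three matched readings from the output above; the names target, offset_min and closest are made up for illustration, and the UTC time zone is an assumption:

```r
# The first meal and the three readings matched in its preprandial window
meal_time <- as.POSIXct("2015-12-23 12:00:00", tz = "UTC")
readings <- data.frame(
  time = as.POSIXct(c("2015-12-23 09:53:00",
                      "2015-12-23 09:58:00",
                      "2015-12-23 10:08:00"), tz = "UTC"),
  gl = c(84, 83, 81)
)

# Centre of the window: exactly 2 hours (7200 seconds) before the meal
target <- meal_time - 2 * 60 * 60

# Absolute distance of each reading from the target, in minutes
readings$offset_min <- abs(as.numeric(difftime(readings$time, target, units = "mins")))

# Keep the reading with the smallest distance
closest <- readings[which.min(readings$offset_min), ]
print(closest)
```

Averaging the readings in the window would be an equally reasonable choice; which one fits depends on the analysis.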
Next, you can do something similar for the post-meal data, for all meals:
fuzzy_inner_join(
  mealsData,
  sampleData %>% filter(!is.na(gl)),
  by = c("id", "phase", "postprandial_1" = "time", "postprandial_2" = "time"),
  match_fun = c(`==`, `==`, `<=`, `>=`)
)
The result is:
id.x phase.x time.x date meal preprandial_1 preprandial_2 postprandial_1 postprandial_2 id.y phase.y time.y gl steps kiloCalories
<dbl> <dbl> <dttm> <date> <int> <dttm> <dttm> <dttm> <dttm> <dbl> <dbl> <dttm> <dbl> <dbl> <dbl>
1 13 1 2015-12-23 12:00:00 2015-12-23 1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00 13 1 2015-12-23 13:54:00 134 0 NA
2 13 1 2015-12-23 12:00:00 2015-12-23 1 2015-12-23 09:45:00 2015-12-23 10:15:00 2015-12-23 13:45:00 2015-12-23 14:15:00 13 1 2015-12-23 14:09:00 139 1 NA
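Here, two post-meal glucose readings were found.
Finally, you can join the data frames and then group by id (id.x is used because the join created a duplicate column), meal and date. The steps can then be summed: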
fuzzy_inner_join(
  mealsData,
  sampleData,
  by = c("id", "phase", "time" = "time", "postprandial_2" = "time"),
  match_fun = c(`==`, `==`, `<=`, `>=`)
) %>%
  group_by(id.x, meal, date) %>%
  summarise(step_sum = sum(steps))
The result is:
id.x meal date step_sum
<dbl> <int> <date> <dbl>
1 13 1 2015-12-23 876
2 13 2 2015-12-23 294
Edit 1: You can also try data.table for a faster solution. setDT turns a data.frame into a data.table:
library(data.table)
setDT(mealsData)
setDT(sampleData)
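You can then do a non-equi join between sampleData and mealsData. The statement lists the columns you want in the result and joins on the time conditions. nomatch = 0 drops rows without a match (for example, the second meal has no post-meal glucose reading).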
sampleData[!is.na(gl)][
  mealsData,
  .(id, phase, gl, x.time),
  on = .(id, phase, time >= postprandial_1, time <= postprandial_2),
  nomatch = 0]
To get the step sum, you can try:
sampleData[mealsData,
           .(id, phase, meal, date, steps),
           on = .(id, phase, time >= time, time <= postprandial_2),
           nomatch = 0][
  ,
  .(step_sum = sum(steps)),
  by = .(id, meal, date)]
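The result should be the same as above.
Edit 2: You can combine the second and third results (mean glucose and step sum). Make sure both contain id, phase, meal and date for the merge. The first table, dt1, now holds the mean glucose along with the corresponding meal. Store dt1 and dt2 as intermediate data.tables: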
dt1 <- sampleData[!is.na(gl)][
  mealsData,
  .(id, phase, gl, x.time, meal, date),
  on = .(id, phase, time >= postprandial_1, time <= postprandial_2),
  nomatch = 0][
  ,
  .(gl_ave = mean(gl)),
  by = .(id, phase, meal, date)]

dt2 <- sampleData[mealsData,
                  .(id, phase, meal, date, steps),
                  on = .(id, phase, time >= time, time <= postprandial_2),
                  nomatch = 0][
  ,
  .(step_sum = sum(steps)),
  by = .(id, phase, meal, date)]
Then merge:
merge(dt1, dt2, by = c("id", "phase", "meal", "date"))
Since your data frame sampleData is sorted and contains one observation per minute, you can take advantage of that:
library(dplyr)
library(zoo)
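1) Retrieve the row of each participant's (id) first meal of the day, together with their glucose reading 2 hours (+-15 minutes) before that meal:
# Carry the last glucose reading forward into a new column, leaving gl itself untouched
df1 <- sampleData %>%
  mutate(glFilled = na.locf(gl, na.rm = FALSE),
         previousGl = lag(glFilled, 120),  # glucose 120 rows (minutes) earlier
         glTime = lag(time, 120)) %>%
  filter(!is.na(kiloCalories))             # keep only the meal rows
2) Retrieve the row of every meal (i.e. every kilocalories entry) per participant (id), together with the glucose reading 2 hours (+-15 minutes) after the meal:
# Carry the next glucose reading backward, again into a new column
df2 <- sampleData %>%
  mutate(glFilled = na.locf(gl, fromLast = TRUE, na.rm = FALSE),
         nextGl = lead(glFilled, 120),     # glucose 120 rows (minutes) later
         glTime = lead(time, 120)) %>%
  filter(!is.na(kiloCalories))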
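3) From task 2, take the subset of data between the meal and the glucose reading, and compute the sum of steps over that period:
# For each meal row in df2, sum the steps recorded between the meal time and glTime
lapply(seq_len(nrow(df2)), function(i) {
  sampleData %>%
    filter(time >= df2$time[i],
           time <= df2$glTime[i]) %>%
    summarize(steps = sum(steps))
})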
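Because lag() and lead() count rows rather than minutes, this approach only lines up with real time if every row is exactly one minute after the previous one. The sample data above contains one repeated timestamp (1450894140), so it may be worth checking first. A quick base-R check, sketched on hypothetical toy timestamps (the names toy_time, gaps and toy_clean are illustrative only):

```r
# Toy timestamps: one row per minute, except that the third minute appears twice
toy_time <- as.POSIXct(c(0, 60, 120, 120, 180, 240),
                       origin = "1970-01-01", tz = "UTC")

# Gap in seconds between each row and the one before it
gaps <- diff(as.numeric(toy_time))

# TRUE only if every row is exactly 60 seconds after the previous one
is_regular <- all(gaps == 60)

# Row positions that break the one-minute spacing
irregular_rows <- which(gaps != 60) + 1

# Dropping repeated timestamps is one possible repair; whether it is
# appropriate depends on how the duplicate arose
toy_clean <- toy_time[!duplicated(toy_time)]

print(is_regular)
print(irregular_rows)
```

With several participants in one data frame, the same check would need to be run per id, since the row offsets must not cross from one participant into the next.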