Finding the first iteration of a string in a data.table in R
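I'm new to R, so I'm trying to figure out how to do this better. I have a data table with two columns (Day and SleepStatus). How can I find the first occurrence of Sleeping and of Awake based on the Day column, and add another column that indicates when the person starts sleeping (first Sleeping row) and stops sleeping (first Awake row)? For the remaining rows of the sleep period, the column should show NA.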
Day | SleepStatus |
---|---|
1 | Sleeping |
1 | Sleeping |
1 | Sleeping |
1 | Awake |
2 | Sleeping |
2 | Sleeping |
2 | Sleeping |
2 | Awake |
Desired output
Day | SleepStatus | Final Status |
---|---|---|
1 | Sleeping | Start Sleep |
1 | Sleeping | NA |
1 | Sleeping | Stop Sleep |
1 | Awake | NA |
2 | Sleeping | Start Sleep |
2 | Sleeping | NA |
2 | Sleeping | Stop Sleep |
2 | Awake | NA |
Here is one possible solution:
library(data.table)
dt <- data.table::data.table(
  Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  SleepStatus = c("Sleeping", "Sleeping", "Sleeping",
                  "Awake", "Sleeping", "Sleeping", "Sleeping", "Awake")
)
# Label every row: "Stop Sleep" where the running count of non-"Sleeping"
# rows increases (i.e. on each Awake row), "Start Sleep" otherwise
dt[, `Final Status` := {ifelse(
  cumsum(SleepStatus != "Sleeping") != shift(cumsum(SleepStatus != "Sleeping"), fill = 0, type = "lag"),
  "Stop Sleep", "Start Sleep")}]
# Keep only the first label of each run: a label equal to the previous row's
# label becomes NA (fill = "NA" is a sentinel string, so row 1 never matches)
dt[, `Final Status` := {ifelse(
  `Final Status` == shift(`Final Status`, fill = "NA", type = "lag"),
  NA, `Final Status`)}]
dt
#>    Day SleepStatus Final Status
#> 1:   1    Sleeping  Start Sleep
#> 2:   1    Sleeping         <NA>
#> 3:   1    Sleeping         <NA>
#> 4:   1       Awake   Stop Sleep
#> 5:   2    Sleeping  Start Sleep
#> 6:   2    Sleeping         <NA>
#> 7:   2    Sleeping         <NA>
#> 8:   2       Awake   Stop Sleep
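The solution above scans the whole table from top to bottom and never uses `Day`. If runs should restart at each day boundary (an assumption about what "based on the column Day" means), a minimal sketch is to group by `Day` together with `data.table::rleid()`, which numbers consecutive runs of identical values. The `labels` lookup vector is a name introduced here for illustration, and the sketch assumes `SleepStatus` only ever contains "Sleeping" or "Awake".

```r
library(data.table)

dt <- data.table(
  Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  SleepStatus = c("Sleeping", "Sleeping", "Sleeping", "Awake",
                  "Sleeping", "Sleeping", "Sleeping", "Awake")
)

# Hypothetical lookup: status of a run -> label for the run's first row
labels <- c(Sleeping = "Start Sleep", Awake = "Stop Sleep")

# Each group is one run of identical SleepStatus values within one Day.
# The first row of the group gets the label, the remaining rows get NA.
dt[, `Final Status` := c(labels[[SleepStatus[1L]]], rep(NA_character_, .N - 1L)),
   by = .(Day, rleid(SleepStatus))]

dt
```

On the sample data this gives the same labels as the ungrouped version. The two would differ only when a run of the same status continues across a change of `Day`, in which case this sketch starts a new label on the new day.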
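The code makes more sense if you break it into smaller pieces. I've done that below using tidyverse functions, because I find them easier to follow, but I can change it to data.table syntax if you prefer.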
library(data.table)
dt <- data.table::data.table(
  Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  SleepStatus = c("Sleeping", "Sleeping", "Sleeping",
                  "Awake", "Sleeping", "Sleeping", "Sleeping", "Awake")
)
library(tidyverse)
df <- as.data.frame(dt)
# When the SleepStatus is not "Sleeping", increment the variable by one
df2 <- df %>%
  mutate(Sleeping = cumsum(SleepStatus != "Sleeping"))
df2
#>   Day SleepStatus Sleeping
#> 1   1    Sleeping        0
#> 2   1    Sleeping        0
#> 3   1    Sleeping        0
#> 4   1       Awake        1
#> 5   2    Sleeping        1
#> 6   2    Sleeping        1
#> 7   2    Sleeping        1
#> 8   2       Awake        2
# If the previous value in "Sleeping" is different to the current value,
# add the "stop sleeping" flag (i.e. show when "Sleeping" changes)
df3 <- df2 %>%
  mutate(Sleep_label = ifelse(Sleeping != lag(Sleeping, default = 0), "Stop sleeping", "Start sleeping"))
df3
#>   Day SleepStatus Sleeping    Sleep_label
#> 1   1    Sleeping        0 Start sleeping
#> 2   1    Sleeping        0 Start sleeping
#> 3   1    Sleeping        0 Start sleeping
#> 4   1       Awake        1  Stop sleeping
#> 5   2    Sleeping        1 Start sleeping
#> 6   2    Sleeping        1 Start sleeping
#> 7   2    Sleeping        1 Start sleeping
#> 8   2       Awake        2  Stop sleeping
# Then, if the value in Sleep_label is equal to the previous label,
# change it to NA
df4 <- df3 %>%
  mutate(Final_status = ifelse(Sleep_label == lag(Sleep_label, default = "NA"), NA, Sleep_label))
df4
#>   Day SleepStatus Sleeping    Sleep_label   Final_status
#> 1   1    Sleeping        0 Start sleeping Start sleeping
#> 2   1    Sleeping        0 Start sleeping           <NA>
#> 3   1    Sleeping        0 Start sleeping           <NA>
#> 4   1       Awake        1  Stop sleeping  Stop sleeping
#> 5   2    Sleeping        1 Start sleeping Start sleeping
#> 6   2    Sleeping        1 Start sleeping           <NA>
#> 7   2    Sleeping        1 Start sleeping           <NA>
#> 8   2       Awake        2  Stop sleeping  Stop sleeping
Created on 2022-05-20 by the reprex package (v2.0.1)
Does that make sense? Or have I just made things more confusing?
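For readers who would rather stay in data.table, the same three steps might be written roughly as follows. This is a sketch, not the answerer's own code: it keeps the intermediate column names from the tidyverse version and swaps in `data.table::shift()` for `lag()` and `data.table::fifelse()` for `ifelse()`.

```r
library(data.table)

dt <- data.table(
  Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  SleepStatus = c("Sleeping", "Sleeping", "Sleeping", "Awake",
                  "Sleeping", "Sleeping", "Sleeping", "Awake")
)

# Step 1: running count of rows that are not "Sleeping"
dt[, Sleeping := cumsum(SleepStatus != "Sleeping")]

# Step 2: the counter changes exactly on the non-"Sleeping" rows,
# so those rows get "Stop sleeping" and all others "Start sleeping"
dt[, Sleep_label := fifelse(Sleeping != shift(Sleeping, fill = 0L),
                            "Stop sleeping", "Start sleeping")]

# Step 3: a label that repeats the previous row's label becomes NA.
# fill = "" is a sentinel so the first row is never treated as a repeat.
dt[, Final_status := fifelse(Sleep_label == shift(Sleep_label, fill = ""),
                             NA_character_, Sleep_label)]

dt
```

`shift()` lags by one row by default, so `type = "lag"` is left out here.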
In base R, you could do the following:
x <- dt$SleepStatus
is.na(x) <- -cumsum(c(1,head(rle(x)$lengths,-1)))
dt$final_status <- c(Sleeping = 'Start Sleep', Awake = 'Stop Sleep')[x]
dt
  Day SleepStatus final_status
1   1    Sleeping  Start Sleep
2   1    Sleeping         <NA>
3   1    Sleeping         <NA>
4   1       Awake   Stop Sleep
5   2    Sleeping  Start Sleep
6   2    Sleeping         <NA>
7   2    Sleeping         <NA>
8   2       Awake   Stop Sleep
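The base R one-liner is dense, so here is the same idea unpacked into named steps. It is a sketch on a plain data.frame copy of the sample data; `runs` and `run_starts` are names introduced here for illustration.

```r
df <- data.frame(
  Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  SleepStatus = c("Sleeping", "Sleeping", "Sleeping", "Awake",
                  "Sleeping", "Sleeping", "Sleeping", "Awake")
)

x <- df$SleepStatus

# rle() compresses x into runs of identical consecutive values
runs <- rle(x)

# A run starts at row 1, and each later run starts right after the
# previous runs end, so the start rows are a cumulative sum of lengths
run_starts <- cumsum(c(1, head(runs$lengths, -1)))

# Blank out every row that is not the first row of its run
# (equivalent to is.na(x) <- -run_starts)
x[-run_starts] <- NA

# Map the surviving statuses to labels through a named vector;
# NA entries stay NA
df$final_status <- unname(c(Sleeping = "Start Sleep", Awake = "Stop Sleep")[x])

df
```

For the sample data the run lengths are 3, 1, 3, 1, so the runs start at rows 1, 4, 5 and 8, which are exactly the rows that keep a label.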