How to do several divisions and store the remainders in new variables in R?
I have a data frame in which each data point has the following structure: ID, measure, timemark
ID measure timemark
001 12 15
003 3 13
004 365 0
003 1 13
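ID is a person's unique study ID, measure is the number of days that person used a certain service at that time, and timemark is a number between 0 and 51, representing the 52 weeks of a year.
Now I want to create a data frame with 52 columns, where each column holds the number of days the person spent in the service that week (so the maximum per week should be 7 days). A person can have several entries at the same time point; in that case the total number of days should be the sum of those rows.
So I want to turn it into something like this: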
ID ... week13 week14 week15 week16
001 ... 0 0 7 5
003 ... 4 0 0 0
004 ... 7 7 7 7
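I have been struggling with the logic. I guess it has to do with the quotient and remainder of measure, but I can't work it out. Can anyone help?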
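We can first create one row for each ID and timemark by taking the sum of the measure values. We then create a list that splits measure into steps of 7 plus the remainder. Using unnest_longer we get the data in long format and build the timemark column by appending the week number, and finally we spread the data to wide format.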
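The quotient-and-remainder step can be tried on its own before running the full pipeline. Below is a minimal base R sketch; `split_weeks` is a hypothetical helper name, not part of the answer. Unlike the pipeline's `c(rep(7, floor(measure/7)), measure %% 7)`, it leaves out the trailing 0 when the total is an exact multiple of 7.

```r
# Hypothetical helper (not from the answer): split a total number of days
# into full weeks of 7 followed by the leftover days.
split_weeks <- function(days) {
  full <- days %/% 7   # quotient: number of complete 7-day weeks
  rest <- days %% 7    # remainder: days spilling into the following week
  c(rep(7, full), if (rest > 0) rest)
}

split_weeks(12)  # 7 5
split_weeks(4)   # 4
split_weeks(14)  # 7 7
```

For the sample data, ID 001 has 12 days starting in week 15, so the two values 7 and 5 land in week15 and week16.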
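library(dplyr)
library(tidyr)

df %>%
  # one row per ID/timemark pair, with measure summed over duplicate entries
  group_by(ID, timemark) %>%
  summarise(measure = sum(measure)) %>%
  # split the total into full weeks of 7 days plus the remainder
  # (assumes each ID has a single timemark after summarising, as in the sample data)
  mutate(measure = list(c(rep(7, floor(measure/7)), measure %% 7))) %>%
  unnest_longer(measure) %>%
  # label consecutive weeks, starting at the original timemark
  mutate(timemark = paste0('week', first(timemark) + 0:(n() - 1))) %>%
  # keep at most 52 weeks per ID
  slice(1:pmin(n(), 52)) %>%
  mutate(timemark = factor(timemark, levels = paste0('week', 0:51))) %>%
  spread(timemark, measure)

# Or using pivot_wider in newer tidyr:
# pivot_wider(names_from = timemark, values_from = measure)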
# A tibble: 3 x 53
# Groups: ID [3]
# ID week0 week1 week2 week3 week4 week5 week6 week7 week8 week9 week10 week11 week12 week13 week14 week15 week16
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 7 5
#2 3 NA NA NA NA NA NA NA NA NA NA NA NA NA 4 NA NA NA
#3 4 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
# … with 35 more variables: week17 <dbl>, week18 <dbl>, week19 <dbl>, week20 <dbl>, week21 <dbl>, week22 <dbl>,
# week23 <dbl>, week24 <dbl>, week25 <dbl>, week26 <dbl>, week27 <dbl>, week28 <dbl>, week29 <dbl>, week30 <dbl>,
# week31 <dbl>, week32 <dbl>, week33 <dbl>, week34 <dbl>, week35 <dbl>, week36 <dbl>, week37 <dbl>, week38 <dbl>,
# week39 <dbl>, week40 <dbl>, week41 <dbl>, week42 <dbl>, week43 <dbl>, week44 <dbl>, week45 <dbl>, week46 <dbl>,
# week47 <dbl>, week48 <dbl>, week49 <dbl>, week50 <dbl>, week51 <dbl>
Data
df <- structure(list(ID = c(1L, 3L, 4L, 3L), measure = c(12L, 3L, 365L,
1L), timemark = c(15L, 13L, 0L, 13L)), class = "data.frame", row.names = c(NA, -4L))
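As a cross-check, the same result can be sketched in base R with a pre-filled matrix, which gives 0 rather than NA for weeks without service. This is only an illustration under stated assumptions: the names `totals`, `weeks` and `per_week` are made up here, weeks past week51 are dropped, and overlapping entries for the same ID are simply added without capping at 7.

```r
df <- data.frame(ID = c(1L, 3L, 4L, 3L),
                 measure = c(12L, 3L, 365L, 1L),
                 timemark = c(15L, 13L, 0L, 13L))

# Sum measure over duplicate ID/timemark pairs
totals <- aggregate(measure ~ ID + timemark, data = df, FUN = sum)

# One row per ID, one column per week, initialised to 0
ids <- sort(unique(totals$ID))
weeks <- matrix(0, nrow = length(ids), ncol = 52,
                dimnames = list(as.character(ids), paste0("week", 0:51)))

for (i in seq_len(nrow(totals))) {
  days <- totals$measure[i]
  # full weeks of 7 days, then the remainder (if any)
  per_week <- c(rep(7, days %/% 7), if (days %% 7 > 0) days %% 7)
  idx <- totals$timemark[i] + seq_along(per_week)  # week k sits in column k + 1
  keep <- idx <= 52                                # drop anything past week51
  row <- as.character(totals$ID[i])
  weeks[row, idx[keep]] <- weeks[row, idx[keep]] + per_week[keep]
}

weeks[, c("week13", "week14", "week15", "week16")]
```

The last line prints the four columns from the question's desired output, so the two can be compared directly.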
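I want to leave what I tried for you. First, I used expand() to create a master data frame containing all combinations of ID and timemark. Then I created result in the following way. I defined groups by ID and timemark and summed measure. Next, in the first mutate(), I worked out how many weeks (rows) each result needs to be expanded to. I then expanded the data frame with expandRows() from the splitstackshape package. In the second mutate(), I updated the numbers in timemark so that the week numbers are correct. After that, I did some calculations to allocate the days to each week. lag(measure - 7 * row_number(), default = 7) creates a vector holding how many days of measure are still left. I needed to replace some of the numbers using logical conditions. For each group, when there is only one row, the value in measure is assigned. When res is greater than 7, 7 is assigned to res (any number larger than 7 becomes 7, since each week (row) can hold at most 7 days). Otherwise, the original value in res is kept.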
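The allocation rule described above can also be written as a single vectorised expression, which may make the lag() and case_when() steps easier to follow. This is a minimal base R sketch; `days_per_week` is a hypothetical helper, not part of the answer's code.

```r
# Hypothetical helper: days assigned to each consecutive week for a total
# of `days`, capped at 7 per week.
days_per_week <- function(days) {
  n_weeks <- days %/% 7 + 1                        # same count as as.integer(measure / 7) + 1
  remaining <- days - 7 * (seq_len(n_weeks) - 1)   # days still left at the start of each week
  pmin(remaining, 7)                               # a week can hold at most 7 days
}

days_per_week(12)  # 7 5
days_per_week(4)   # 4
days_per_week(14)  # 7 7 0
```

As in the pipeline, an exact multiple of 7 produces one extra week holding 0 days.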
library(dplyr)
library(tidyr)
library(splitstackshape)

# all ID x week combinations
master <- expand(mydf, timemark = 0:51, ID)

group_by(mydf, ID, timemark) %>%
  summarize(measure = sum(measure)) %>%
  ungroup %>%
  group_by(group = 1:n()) %>%
  # number of weeks (rows) each entry spans
  mutate(nrow = as.integer(measure / 7) + 1) %>%
  expandRows(count = "nrow") %>%
  # consecutive week numbers, then the days allocated to each week
  mutate(timemark = first(timemark):(first(timemark) + n() - 1),
         res = lag(measure - 7 * row_number(), default = 7),
         res = case_when(n() == 1 ~ as.numeric(measure),
                         res > 7 ~ 7,
                         TRUE ~ res)) -> result
The final step is to join result to master. I dropped the unnecessary columns, spread the data frame to wide format, and updated the column names.
left_join(master, result, by = c("ID", "timemark")) %>%
  select(-c(measure, group)) %>%
  spread(key = timemark, value = res, fill = 0) %>%
  rename_at(vars(-ID),
            .funs = list(~paste("week", ., sep = "")))
ID week0 week1 week2 week3 week4 week5 week6 week7 week8 week9 week10 week11 week12 week13 week14 week15 week16
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 5
2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0
3 4 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
# … with 35 more variables: week17 <dbl>, week18 <dbl>, week19 <dbl>, week20 <dbl>, week21 <dbl>, week22 <dbl>,
# week23 <dbl>, week24 <dbl>, week25 <dbl>, week26 <dbl>, week27 <dbl>, week28 <dbl>, week29 <dbl>, week30 <dbl>,
# week31 <dbl>, week32 <dbl>, week33 <dbl>, week34 <dbl>, week35 <dbl>, week36 <dbl>, week37 <dbl>, week38 <dbl>,
# week39 <dbl>, week40 <dbl>, week41 <dbl>, week42 <dbl>, week43 <dbl>, week44 <dbl>, week45 <dbl>, week46 <dbl>,
# week47 <dbl>, week48 <dbl>, week49 <dbl>, week50 <dbl>, week51 <dbl>
Data
mydf <- structure(list(ID = c(1L, 3L, 4L, 3L), measure = c(12L, 3L, 365L,
1L), timemark = c(15L, 13L, 0L, 13L)), class = "data.frame", row.names = c(NA,
-4L))