Creating new columns that go from 0 to the value in the variable of a table
Reproducible tibble: I have a database similar to the one shown below. The difference is that the database I am working with is much larger.
library(tibble)

general_tibble <- tibble(gender = c("female", "female", "male"),
                         age = c(18, 19, 18),
                         age_partner = c(22, 20, 17),
                         max_age = c(60, 60, 65),
                         nrs = c(42, 41, 47))
general_tibble
Result:
  gender   age age_partner max_age   nrs
1 female    18          22      60    42
2 female    19          20      60    41
3 male      18          17      65    47
Question:
How can I create a new table from the previous one that takes the value of nrs and creates a column variable called n that goes from 0 to the value in nrs?
To clarify further: in row 1 of general_tibble the column nrs equals 42, so n would go from 0 to 42; in row 2 nrs equals 41, so n would go from 0 to 41; and likewise for row 3.
I am currently using the code below. It works, but when general_tibble is very large, the code is very slow to execute.
general_list <- list()
for (i in 1:NROW(general_tibble)) {
  general_list[[i]] <- data.frame(general_tibble[i, ],
                                  n = 0:general_tibble[[i, "nrs"]])
}
Then I bind_rows() general_list to get general_binded:
general_binded <- bind_rows(general_list)
general_binded[c(1:5, 38:42),]
Result:
   gender age age_partner max_age nrs  n
1  female  18          22      60  42  0
2  female  18          22      60  42  1
3  female  18          22      60  42  2
4  female  18          22      60  42  3
5  female  18          22      60  42  4
38 female  18          22      60  42 37
39 female  18          22      60  42 38
40 female  18          22      60  42 39
41 female  18          22      60  42 40
42 female  18          22      60  42 41
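PS: In the for loop I use data.frame() instead of tibble() because I want the rows to be recycled. If you have suggestions involving tibbles or data frames, I am happy to take them.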
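To illustrate the recycling that the PS relies on, here is a minimal base R sketch. The one-row data frame `one_row` and its tiny nrs value are made up for illustration; the point is only that data.frame() repeats a one-row argument to match the length of the longer vector.

```r
# A hypothetical one-row data frame standing in for general_tibble[i, ]
one_row <- data.frame(gender = "female", nrs = 3)

# data.frame() recycles the single row to match the 4 values of n
recycled <- data.frame(one_row, n = 0:one_row$nrs)

nrow(recycled)
#> [1] 4
```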
We can use uncount:
library(tidyverse)
general_tibble %>%
mutate(grp = row_number(), nrsN = nrs + 1) %>%
uncount(nrsN) %>%
group_by(grp) %>%
mutate(n = row_number() - 1) %>%
ungroup %>%
select(-grp)
# A tibble: 133 x 6
# gender age age_partner max_age nrs n
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 female 18 22 60 42 0
# 2 female 18 22 60 42 1
# 3 female 18 22 60 42 2
# 4 female 18 22 60 42 3
# 5 female 18 22 60 42 4
# 6 female 18 22 60 42 5
# 7 female 18 22 60 42 6
# 8 female 18 22 60 42 7
# 9 female 18 22 60 42 8
#10 female 18 22 60 42 9
# … with 123 more rows
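As a quick sanity check on the 133 rows reported above: each input row is repeated nrs + 1 times (once for every value from 0 to nrs), so the total should be the sum of nrs + 1. A small base R sketch, with `nrs` simply copied from the example data:

```r
nrs <- c(42, 41, 47)

# Each input row contributes the values 0, 1, ..., nrs, i.e. nrs + 1 rows
rows_per_input <- nrs + 1
total_rows <- sum(rows_per_input)

total_rows
#> [1] 133
```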
Another option is unnest:
general_tibble %>%
  mutate(n = map(nrs + 1, ~ seq(.x) - 1)) %>%
  unnest(n)
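An approach using base R (apart from the tibble package).
First, split by nrs group. Second, expand the rows of each data frame by its nrs value. Third, create an id column running from 0 up to the nrs value. Fourth, bind everything back into a tibble: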
library(tibble)
df <- tibble(
gender = c("female", "female", "male"),
age = c(18, 19, 18),
age_partner = c(22, 20, 17),
max_age = c(60, 60, 65),
nrs = c(42, 41, 47)
)
nrs_split <- split(df, df$nrs)
df_list <- lapply(nrs_split, function(i) i[rep(seq_len(nrow(i)), each=i$nrs + 1), ])
df_renum <- lapply(df_list, function(i) {i$id <- 0:rle(i$nrs)$values; return(i)})
df <- do.call("rbind", df_renum)
df
#> # A tibble: 133 x 6
#> gender age age_partner max_age nrs id
#> * <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 female 19 20 60 41 0
#> 2 female 19 20 60 41 1
#> 3 female 19 20 60 41 2
#> 4 female 19 20 60 41 3
#> 5 female 19 20 60 41 4
#> 6 female 19 20 60 41 5
#> 7 female 19 20 60 41 6
#> 8 female 19 20 60 41 7
#> 9 female 19 20 60 41 8
#> 10 female 19 20 60 41 9
#> # … with 123 more rows
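Note that the output above comes back ordered by nrs (the nrs = 41 group is first), because split() groups on that value. If the original row order matters, a possible alternative is to skip the split and index the rows directly. This is only a sketch: it uses a plain data.frame to stay in base R, assumes nrs holds non-negative whole numbers, and the names `idx` and `expanded` are chosen here for illustration.

```r
df <- data.frame(
  gender = c("female", "female", "male"),
  age = c(18, 19, 18),
  age_partner = c(22, 20, 17),
  max_age = c(60, 60, 65),
  nrs = c(42, 41, 47)
)

# Repeat each row index nrs + 1 times: row 1 x 43, row 2 x 42, row 3 x 48
idx <- rep(seq_len(nrow(df)), times = df$nrs + 1)
expanded <- df[idx, ]

# sequence() concatenates 1:43, 1:42 and 1:48; subtract 1 so each run starts at 0
expanded$n <- sequence(df$nrs + 1) - 1
rownames(expanded) <- NULL

nrow(expanded)
#> [1] 133
```

Because the rows are indexed by position rather than grouped by value, this version also keeps rows separate when two of them share the same nrs.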
The simplest way is to use the tidyr::expand() function to expand general_tibble on the nrs column:
library(tidyverse)
general_tibble %>%
group_by_all()%>%
expand(n = 0:nrs)
#> # A tibble: 133 x 6
#> # Groups: gender, age, age_partner, max_age, nrs [3]
#> gender age age_partner max_age nrs n
#> <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 female 18 22 60 42 0
#> 2 female 18 22 60 42 1
#> 3 female 18 22 60 42 2
#> 4 female 18 22 60 42 3
#> 5 female 18 22 60 42 4
#> 6 female 18 22 60 42 5
#> 7 female 18 22 60 42 6
#> 8 female 18 22 60 42 7
#> 9 female 18 22 60 42 8
#> 10 female 18 22 60 42 9
#> # ... with 123 more rows
Created on 2019-05-21 by the reprex package (v0.2.1)
Another idea using only base R functions:
expanded_vars <- do.call(rbind,lapply(general_tibble$nrs,
function(x) expand.grid(x, 0:x)))
names(expanded_vars) <- c("nrs", "n")
merge(y = expanded_vars, x = general_tibble, by = "nrs", all = TRUE)
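One thing to watch with the merge above: it joins on nrs, so it relies on every row having a distinct nrs value, as in the example data. The sketch below uses a made-up two-row data frame `small` where both rows share nrs = 2 to show the effect, and then one possible workaround that joins on a hypothetical `row_id` column instead.

```r
# Hypothetical data where two rows share the same nrs value
small <- data.frame(id = c("a", "b"), nrs = c(2, 2))

expanded_vars <- do.call(rbind, lapply(small$nrs,
                                       function(x) expand.grid(x, 0:x)))
names(expanded_vars) <- c("nrs", "n")

# Joining on nrs matches every input row with every expanded row: 2 * 6 = 12
merged <- merge(x = small, y = expanded_vars, by = "nrs", all = TRUE)
nrow(merged)
#> [1] 12

# Workaround sketch: build the sequences per row and join on a row id
small$row_id <- seq_len(nrow(small))
expanded_by_row <- do.call(rbind, lapply(small$row_id, function(i) {
  data.frame(row_id = i, n = 0:small$nrs[i])
}))
fixed <- merge(x = small, y = expanded_by_row, by = "row_id")
nrow(fixed)
#> [1] 6
```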
With dplyr and tidyr, you could also do:
general_tibble %>%
group_by(rowid = row_number()) %>%
mutate(n = nrs) %>%
complete(n = seq(0, n, 1)) %>%
fill(everything(), .direction = "up") %>%
ungroup() %>%
select(-rowid)
n gender age age_partner max_age nrs
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 0 female 18 22 60 42
2 1 female 18 22 60 42
3 2 female 18 22 60 42
4 3 female 18 22 60 42
5 4 female 18 22 60 42
6 5 female 18 22 60 42
7 6 female 18 22 60 42
8 7 female 18 22 60 42
9 8 female 18 22 60 42
10 9 female 18 22 60 42
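One advantage of data.table over the tidyverse is that you don't have to switch functions depending on whether what you are doing is a mutate, an expand, or a summarize. You can put whatever you want in the j part of df[i, j, by], and however many rows that resolves to is what you get.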
library(data.table)
setDT(general_tibble)
general_tibble[, .(n = seq(0, nrs))
, by = names(general_tibble)]
# gender age age_partner max_age nrs n
# 1: female 18 22 60 42 0
# 2: female 18 22 60 42 1
# 3: female 18 22 60 42 2
# 4: female 18 22 60 42 3
# 5: female 18 22 60 42 4
# ---
# 129: male 18 17 65 47 43
# 130: male 18 17 65 47 44
# 131: male 18 17 65 47 45
# 132: male 18 17 65 47 46
# 133: male 18 17 65 47 47