Reshaping data using tidyr
I'm working with a data frame (`dat` in the reproducible example further down) that is structured like this:
Gender Age Number
1 Female 55-59 years 5
2 Female 65+ years 10
3 Male 25-29 years 4
4 Male 40-44 years 3
5 Male 50-54 years 1
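I'm trying to reshape the data with tidyr (so far without success) so that each row is repeated as many times as its Number value says, one row per counted unit. The output I'm after should look like this: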
Gender Age
1 Female 55-59 years
2 Female 55-59 years
3 Female 55-59 years
4 Female 55-59 years
5 Female 55-59 years
6 Female 65+ years
7 Female 65+ years
8 Female 65+ years
9 Female 65+ years
10 Female 65+ years
11 Female 65+ years
12 Female 65+ years
13 Female 65+ years
14 Female 65+ years
15 Female 65+ years
16 Male 25-29 years
17 Male 25-29 years
18 Male 25-29 years
19 Male 25-29 years
20 Male 40-44 years
21 Male 40-44 years
22 Male 40-44 years
23 Male 50-54 years
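I've tried various combinations of the gather/spread functions but haven't come anywhere close. I'm fairly sure this is possible in tidyr!
I know I could get the same result with other packages or functions, but I'd much prefer a tidyr solution so that I can include it in a larger dplyr/tidyr pipeline.
Any help is much appreciated.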
dat <- structure(list(Gender = structure(c(3L, 3L, 1L, 2L, 1L), .Label = c(" Male",
" Male", "Female"), class = "factor"), Age = structure(c(5L,
1L, 2L, 3L, 4L), .Label = c("65+ years", "25-29 years", "40-44 years",
"50-54 years", "55-59 years"), class = "factor"), Number = c(5L,
10L, 4L, 3L, 1L)), .Names = c("Gender", "Age", "Number"), class = "data.frame", row.names = c(NA,
-5L))
Not tidyr, but fairly quick and efficient:
dat2 <- dat[rep(1:nrow(dat), dat[["Number"]]), 1:2]
rownames(dat2) <- NULL
## Gender Age
## 1 Female 55-59 years
## 2 Female 55-59 years
## 3 Female 55-59 years
## 4 Female 55-59 years
## 5 Female 55-59 years
## 6 Female 65+ years
## 7 Female 65+ years
## 8 Female 65+ years
## 9 Female 65+ years
## 10 Female 65+ years
## 11 Female 65+ years
## 12 Female 65+ years
## 13 Female 65+ years
## 14 Female 65+ years
## 15 Female 65+ years
## 16 Male 25-29 years
## 17 Male 25-29 years
## 18 Male 25-29 years
## 19 Male 25-29 years
## 20 Male 40-44 years
## 21 Male 40-44 years
## 22 Male 40-44 years
## 23 Male 50-54 years
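The indexing above works because rep() accepts a vector of repeat counts, one per element, so each row index is repeated according to its own count. A minimal sketch on a made-up data frame (`toy`, `grp` and `n` are hypothetical names, not part of the question's data):

```r
# Toy data, hypothetical and unrelated to `dat`
toy <- data.frame(grp = c("a", "b", "c"), n = c(2L, 0L, 3L))

# rep() with a vector of times repeats index i exactly n[i] times
idx <- rep(seq_len(nrow(toy)), times = toy$n)
idx
# 1 1 3 3 3

# Subset by the repeated indices; drop = FALSE keeps a data frame
expanded <- toy[idx, "grp", drop = FALSE]
rownames(expanded) <- NULL
expanded
```

A count of 0 simply drops that row, which is why "b" does not appear in the result.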
This isn't tidyr either, but I find it quite natural:
library(dplyr)
dat %>% slice(rep(row_number(), Number)) %>% select(-Number)
Gender Age
1 Female 55-59 years
2 Female 55-59 years
3 Female 55-59 years
4 Female 55-59 years
5 Female 55-59 years
6 Female 65+ years
7 Female 65+ years
8 Female 65+ years
9 Female 65+ years
10 Female 65+ years
11 Female 65+ years
12 Female 65+ years
13 Female 65+ years
14 Female 65+ years
15 Female 65+ years
16 Male 25-29 years
17 Male 25-29 years
18 Male 25-29 years
19 Male 25-29 years
20 Male 40-44 years
21 Male 40-44 years
22 Male 40-44 years
23 Male 50-54 years
As @bramtayl suggested, the following is (arguably) more readable:
dat %>% slice(row_number() %>% rep(Number)) %>% select(-Number)
We can do this with tidyr/dplyr. Change each value of 'Number' into a sequence so that 'Number' becomes a list column, unnest it, and then use select to remove the 'Number' column from the output.
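library(dplyr)
library(tidyr)

dat1 <- dat %>%
  # turn each count n into the sequence 1..n, stored as a list column
  # (seq_len() rather than seq() so that a count of 0 yields no rows)
  mutate(Number = lapply(Number, seq_len)) %>%
  # one row per element of each sequence
  unnest(Number) %>%
  # the counter column is no longer needed
  select(-Number)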
Note that the output here is a tbl_df, which is handy when further dplyr operations follow.
str(dat1)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 23 obs. of 2 variables:
# $ Gender: Factor w/ 3 levels " Male"," Male",..: 3 3 3 3 3 3 3 3 3 3 ...
# $ Age : Factor w/ 5 levels "65+ years","25-29 years",..: 5 5 5 5 5 1 1 1 1 1 ...
dat1 %>%
as.data.frame()
# Gender Age
#1 Female 55-59 years
#2 Female 55-59 years
#3 Female 55-59 years
#4 Female 55-59 years
#5 Female 55-59 years
#6 Female 65+ years
#7 Female 65+ years
#8 Female 65+ years
#9 Female 65+ years
#10 Female 65+ years
#11 Female 65+ years
#12 Female 65+ years
#13 Female 65+ years
#14 Female 65+ years
#15 Female 65+ years
#16 Male 25-29 years
#17 Male 25-29 years
#18 Male 25-29 years
#19 Male 25-29 years
#20 Male 40-44 years
#21 Male 40-44 years
#22 Male 40-44 years
#23 Male 50-54 years
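For completeness, and not taken from the answers above: newer tidyr releases provide uncount(), which repeats each row according to a weights column and drops that column by default. The sketch below assumes the installed tidyr includes uncount() and uses a small stand-in data frame (`counts`, with character columns rather than the factors in `dat`):

```r
library(tidyr)

# Small stand-in for `dat` (hypothetical, two rows only)
counts <- data.frame(
  Gender = c("Female", "Male"),
  Age    = c("55-59 years", "25-29 years"),
  Number = c(2L, 3L)
)

# uncount() repeats each row `Number` times and removes the weights column
expanded <- uncount(counts, Number)
nrow(expanded)   # 5
names(expanded)  # "Gender" "Age"
```

On the question's data the equivalent call would be dat %>% uncount(Number), which slots directly into a larger dplyr/tidyr pipeline.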