Reshape non-categorical values from long to wide
I need to reshape data with non-categorical values from long to wide, where not every id has the same number of values.
Example data frame:
df_long <- as.data.frame(cbind(c("id A", "b", "b", "d", "d","id B", "kh", "kk", "ip", "id C", "99", "id D", "id E"),c(1,1,1,1,1, 2,2,2,2,3,3,1,1)))
This is what I need:
df_wide <- as.data.frame(rbind(c("id A", "b", "b", "d", "d"), c("id B", "kh", "kk", "ip", ""), c("id C", "99", "", "", ""), c("id D", "", "", "", ""), c("id E", "", "", "", "")))
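I don't know how to reshape it, because the values are not categorical and not every id has the same number of values.
So I would like to know how to reshape this kind of data from long to wide and from wide to long.
Thanks for your help!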
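You could do:
# Group rows by a running count of "id" header rows, then join each group with commas
a <- aggregate(V1 ~ V2, transform(df_long, V2 = cumsum(grepl("id", V1))), paste, collapse = ",")[, 2]
# Parse the comma-joined strings as CSV; fill = TRUE pads the shorter rows
read.csv(text = a, header = FALSE, fill = TRUE)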
    V1 V2 V3 V4 V5
1 id A  b  b  d  d
2 id B kh kk ip
3 id C 99
4 id D
5 id E
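The one-liner above relies on a grouping key built from the header rows. As a rough step-by-step sketch of the same idea (not the answerer's exact code), the snippet below builds that key explicitly and pads each group by hand instead of going through read.csv. The names is_header, grp, parts, padded and wide are made up for illustration, and anchoring the pattern as "^id" assumes that header rows always start with "id".

```r
# Same sample data as in the question
df_long <- as.data.frame(cbind(
  c("id A", "b", "b", "d", "d", "id B", "kh", "kk", "ip", "id C", "99", "id D", "id E"),
  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 1, 1)
))

# TRUE on every row that opens a new id block (assumption: headers start with "id")
is_header <- grepl("^id", df_long$V1)

# Running count of header rows = group number of every row
grp <- cumsum(is_header)

# One character vector per group, padded with "" up to the longest group
parts <- split(as.character(df_long$V1), grp)
width <- max(lengths(parts))
padded <- lapply(parts, function(x) c(x, rep("", width - length(x))))

# Stack the padded vectors into a data frame with columns V1..V5
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
wide
```

Padding by hand avoids the text round trip, so values that themselves contain commas would not be split apart.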
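Since you also want to convert it back, you should instead do:
# na.strings = "" turns the padding cells into NA so they can be dropped later
f <- read.csv(text = with(df_long, tapply(V1, cumsum(grepl("id", V1)), paste0, collapse = ",")),
              header = FALSE, fill = TRUE, stringsAsFactors = FALSE, na.strings = "")
print(f, na.print = "")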
    V1 V2 V3 V4 V5
1 id A  b  b  d  d
2 id B kh kk ip
3 id C 99
4 id D
5 id E
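Now, to convert it back to your long data, you could do:
# stack() unrolls f column by column; row(f) records the source row of each value
with(g <- transform(stack(f), ind = c(row(f))), na.omit(g[order(ind), ]))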
   values ind
1    id A   1
6       b   1
11      b   1
16      d   1
21      d   1
2    id B   2
7      kh   2
12     kk   2
17     ip   2
3    id C   3
8      99   3
4    id D   4
5    id E   5
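The wide-to-long step can also be spelled out without stack(). The sketch below is only one possible way to do it, assuming the wide data is padded with empty strings as in the question's df_wide. The names m, values, ind, keep and long_again are illustrative.

```r
# Wide data as in the question, padded with empty strings
df_wide <- as.data.frame(rbind(
  c("id A", "b", "b", "d", "d"),
  c("id B", "kh", "kk", "ip", ""),
  c("id C", "99", "", "", ""),
  c("id D", "", "", "", ""),
  c("id E", "", "", "", "")
), stringsAsFactors = FALSE)

m <- as.matrix(df_wide)

# R unrolls matrices column by column, so transpose first to walk along each original row
values <- as.vector(t(m))
ind <- as.vector(t(row(m)))

# Drop the padding cells (empty strings or NA)
keep <- !is.na(values) & values != ""
long_again <- data.frame(values = values[keep], ind = ind[keep], stringsAsFactors = FALSE)
long_again
```

Note that this treats every empty string as padding, so a genuinely empty value in the data would be dropped as well.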
A tidyverse option:
library(tidyverse)
df_long %>%
    # "id A" is split into id = "id", val = "A"; plain values get id = NA
    separate(V1, into = c("id", "val"), fill = "left") %>%
    select(-V2) %>%
    # every non-missing id starts a new output row
    mutate(row = cumsum(!is.na(id))) %>%
    fill(id) %>%
    group_by(row) %>%
    mutate(col = 1:n()) %>%
    ungroup() %>%
    pivot_wider(
        id_cols = c(row, id),
        names_from = col,
        names_prefix = "V",
        values_from = val,
        values_fill = list(val = ""))
## A tibble: 5 x 7
#    row id    V1    V2    V3    V4    V5
#  <int> <chr> <chr> <chr> <chr> <chr> <chr>
#1     1 id    A     b     b     d     d
#2     2 id    B     kh    kk    ip    ""
#3     3 id    C     99    ""    ""    ""
#4     4 id    D     ""    ""    ""    ""
#5     5 id    E     ""    ""    ""    ""
PS. The entries in column df_long$V2 do not seem to be used in df_wide. Is that correct?
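The question also asks about going from wide back to long. A possible tidyverse sketch for that direction is shown below. It assumes the wide data looks like the question's df_wide (padded with empty strings) and that dplyr and tidyr are installed. The column names row, col and value and the object name df_long_again are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Wide data as in the question, padded with empty strings
df_wide <- as.data.frame(rbind(
  c("id A", "b", "b", "d", "d"),
  c("id B", "kh", "kk", "ip", ""),
  c("id C", "99", "", "", ""),
  c("id D", "", "", "", ""),
  c("id E", "", "", "", "")
), stringsAsFactors = FALSE)

df_long_again <- df_wide %>%
  mutate(row = row_number()) %>%   # remember which wide row each value came from
  pivot_longer(cols = -row, names_to = "col", values_to = "value") %>%
  filter(value != "") %>%          # drop the padding cells
  select(value, row)

df_long_again
```

The row column plays the role of the group index, so the original V2 column of df_long is not needed to rebuild the long layout.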