Turn row values from multiple columns into column names in R?
I have a data frame that looks like this:
state1 state1_pp state2 state2_pp state3 state3_pp
<chr> <chr> <chr> <chr> <chr> <chr>
1 0 0.995614 F 0.004386 NA 0
2 0 1 NA 0 NA 0
3 0 1 NA 0 NA 0
I want the state values in each row to become the column names, with the numeric values as the row values:
0 F NA
<chr> <chr> <chr>
1 0.995614 0.004386 0
2 1 0 0
3 1 0 0
How can I do this in R?
Or, a more complex scenario:
state1 state1_pp state2 state2_pp state3 state3_pp
1 0 0.995614 F 0.004386 NA 0
2 A 1 B 0 C 0
3 D 0.7 B 0.3 NA 0
This is what I want:
0 A D F B C NA
1 0.995614 0 0 0.004386 0 0 0
2 0 1 0 0 0 0 0
3 0 0 0.7 0 0.3 0 0
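First, a warning: column names that are numbers (like 1) or reserved R keywords (like NA) can cause all sorts of errors. But if you must do this, I suggest the following: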
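library(dplyr)
# extract the header row: the state names from the FIRST row only,
# so this suits the simple example, not the complex one
headers <- df %>%
  head(1) %>%
  select(state1, state2, state3) %>%
  unlist(use.names = FALSE) %>%
  as.character()
# replace a missing state (NA) with the string "NA"
headers[is.na(headers)] <- "NA"
# drop the state columns, keeping only the *_pp value columns
new_df <- df %>%
  select(-state1, -state2, -state3)
# use the state names as the new column names
colnames(new_df) <- headers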
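As a usage sketch, here is the code above run on the simple example from the question. The tibble construction is an assumption (all columns stored as character, missing states as real NA). Because the headers come from the first row only, row 2's NA in state2 ends up under the F column; that is harmless here only because its value is 0.

```r
library(dplyr)

# Simple example from the question (assumed: every column is character)
df <- tibble(
  state1    = c("0", "0", "0"),
  state1_pp = c("0.995614", "1", "1"),
  state2    = c("F", NA, NA),
  state2_pp = c("0.004386", "0", "0"),
  state3    = rep(NA_character_, 3),
  state3_pp = c("0", "0", "0")
)

# Same steps as the answer: headers from the first row's states
headers <- df %>%
  head(1) %>%
  select(state1, state2, state3) %>%
  unlist(use.names = FALSE) %>%
  as.character()
headers[is.na(headers)] <- "NA"

new_df <- df %>% select(-state1, -state2, -state3)
colnames(new_df) <- headers

print(new_df)
```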
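To refer to your new columns, you will need backticks (`). With the new column names 0, F and NA, you can write new_df$F, but not new_df$NA or new_df$0. Instead, you must write new_df$`0` and new_df$`NA`.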
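A minimal base R sketch of the backtick rule, using a hypothetical data frame holding the column names 0, F and NA. check.names = FALSE is needed here because data.frame() would otherwise rename them to syntactic names (X0 and NA.).

```r
# Hypothetical data frame with the non-syntactic names "0", "F" and "NA"
new_df <- data.frame(
  `0`  = c("0.995614", "1", "1"),
  F    = c("0.004386", "0", "0"),
  `NA` = c("0", "0", "0"),
  check.names = FALSE,       # keep the names exactly as written
  stringsAsFactors = FALSE
)

new_df$F        # works: F is a syntactic name
new_df$`0`      # a name starting with a digit needs backticks
new_df$`NA`     # NA is a reserved word, so it needs backticks too
new_df[["NA"]]  # indexing by string avoids backticks altogether
```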
Here is an attempt using dplyr and tidyr:
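library(dplyr)
library(tidyr)
df %>%
  # remember the original row so each input row stays one output row
  mutate(row = row_number()) %>%
  mutate(across(everything(), as.character)) %>%
  # long format: one line per (row, column name, value)
  pivot_longer(cols = -row) %>%
  # "state1"/"state1_pp" -> "state"/"state_pp" (\\d must be escaped in R)
  mutate(name = sub('\\d+', '', name)) %>%
  group_by(name, row) %>%
  mutate(row1 = row_number()) %>%
  ungroup() %>%
  # pair each state with its probability in columns state/state_pp
  pivot_wider() %>%
  group_by(state, row) %>%
  # number repeated states within a row so they do not collide
  mutate(row1 = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = state, values_from = state_pp,
              values_fill = list(state_pp = "0")) %>%
  select(-row, -row1)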
# A tibble: 3 x 7
# `0` F `NA` A B C D
# <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 0.995614 0.004386 0 0 0 0 0
#2 0 0 0 1 0 0 0
#3 0 0 0 0 0.3 0 0.7
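A variant of the dplyr/tidyr approach, sketched under a few assumptions: all input columns are character, missing states are real NA, and duplicate states within one row (e.g. two NA states) should be summed. Instead of numbering rows twice, it renames the columns so that pivot_longer's ".value" sentinel pairs each state with its probability, converts the probabilities to numeric, and lets pivot_wider fill absent states with 0. It assumes dplyr 1.0+ (for rename_with) and tidyr 1.1+ (for a function values_fn and a scalar values_fill).

```r
library(dplyr)
library(tidyr)

# Complex example from the question (assumed: every column is character)
df <- tibble(
  state1    = c("0", "A", "D"),
  state1_pp = c("0.995614", "1", "0.7"),
  state2    = c("F", "B", "B"),
  state2_pp = c("0.004386", "0", "0.3"),
  state3    = c(NA, "C", NA),
  state3_pp = c("0", "0", "0")
)

wide <- df %>%
  mutate(row = row_number()) %>%
  # state1_pp -> pp_1 and state1 -> state_1, so names split cleanly on "_"
  rename_with(~ sub("^state(\\d+)_pp$", "pp_\\1", .x)) %>%
  rename_with(~ sub("^state(\\d+)$", "state_\\1", .x)) %>%
  # one line per (row, set) with paired columns `state` and `pp`
  pivot_longer(-row, names_to = c(".value", "set"), names_sep = "_") %>%
  mutate(state = coalesce(state, "NA"), pp = as.numeric(pp)) %>%
  # states become columns; duplicates are summed, absent states become 0
  pivot_wider(id_cols = row, names_from = state, values_from = pp,
              values_fn = sum, values_fill = 0) %>%
  select(-row)

print(wide)
```

Columns appear in order of first appearance, so the result holds `0`, F, `NA`, A, B, C and D as numeric columns rather than character ones.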