R - one scale multiple column recode
A fellow researcher and I are trying to find a way to make our data frame cleaner and tidier.
Here is a reprex:
> head(Dummy1)
# A tibble: 6 x 18
A0 A1 A2 A3 A4 A5 B0 B1 B2 B3 B4 B5 C0 C1 C2 C3 C4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
2 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1
3 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0
4 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
5 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
6 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
# … with 1 more variable: C5 <dbl>
>
Because of the way our software records answers, we get columns A0 to A5, B0 to B5, and so on, instead of this:
> head(Dummy2)
# A tibble: 6 x 3
A B C
<dbl> <dbl> <dbl>
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
6 0 0 0
>
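Is there code that would let us convert the first version (every possible answer as its own column, coded 0 = no, 1 = yes) into a single column per item that holds the numeric result? The scale we are trying to analyze has more than 50 items, each ranging from 0 to 8.
Thanks for your help!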
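You can use split.default to split the columns into one data frame per group. Then use sapply and max.col to get, for each row, the number of the column that holds the highest value. I subtract 1 because your column numbering starts at 0.
sapply(split.default(Dummy1, sub('\\d+', '', names(Dummy1))), max.col) - 1
sub('\\d+', '', names(Dummy1)) removes the digits from the column names, so it returns "A" "A" "A" "A" "A" "A" "B" "B" "B" "B"......, which is used as the grouping to split on in split.default.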
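The one-liner above can be checked on a small example. What follows is only a sketch under a few assumptions that are not in the original post: the data frame dummy and its values are made up, the columns of each item are assumed to appear in score order (0, 1, 2, ...), and ties.method = "first" is passed to max.col because its default breaks ties at random. The NA step is an addition for rows where no answer was ticked, which max.col alone would report as score 0.

```r
# Hypothetical stand-in for Dummy1: two items (A, B), each scored 0 to 2.
dummy <- data.frame(
  A0 = c(0, 0, 1), A1 = c(0, 1, 0), A2 = c(1, 0, 0),
  B0 = c(1, 0, 0), B1 = c(0, 0, 1), B2 = c(0, 1, 0)
)

# Item label for each column: "A" "A" "A" "B" "B" "B".
# Anchoring with $ only strips the trailing score digits.
item <- sub("\\d+$", "", names(dummy))

# One block of columns per item.
blocks <- split.default(dummy, item)

# Position of the 1 within each block, minus 1 because scores start at 0.
scores <- sapply(
  blocks,
  function(block) max.col(as.matrix(block), ties.method = "first") - 1
)

# Assumption: a row with no ticked answer should be missing, not 0.
answered <- sapply(blocks, function(block) rowSums(block) > 0)
scores[!answered] <- NA

scores
```

The result is a numeric matrix with one column per item; wrapping it in as.data.frame() gives a data frame if that is more convenient.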
You can also try this:
library(tidyverse)
d1 %>%
  # reshape to long format: column names go into 'col', values into the default 'value' column
  pivot_longer(cols = everything(), names_to = 'col') %>%
  # keep only the cells with a non-zero value
  filter(value != 0) %>%
  # take the trailing number as the score, then strip the number from 'col' to leave the item name
  mutate(newval = as.numeric(str_extract(col, '\\d+$')),
         col = str_replace(col, '\\d+', '')) %>%
  # 'value' is now constant, so drop it
  select(-value) %>%
  # convert back to wide format with one list column per item, then unnest every column
  pivot_wider(names_from = col, values_from = newval, values_fn = list) %>%
  unnest(everything())
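The pipeline above depends on every respondent having exactly one non-zero cell per item, because the filter step discards the row positions. A possible variant that keeps a row identifier is sketched below. It is not part of the original answer: d_small, row_id, item and score are hypothetical names, the column names are assumed to be letters followed by digits, and the names_pattern and names_transform arguments need tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two items scored 0 to 1; respondent 2 skipped item B.
d_small <- data.frame(
  A0 = c(0, 1), A1 = c(1, 0),
  B0 = c(1, 0), B1 = c(0, 0)
)

recoded <- d_small %>%
  # remember which respondent each cell belongs to
  mutate(row_id = row_number()) %>%
  # split each column name into item ("A") and score (1)
  pivot_longer(
    cols = -row_id,
    names_to = c("item", "score"),
    names_pattern = "^([A-Za-z]+)(\\d+)$",
    names_transform = list(score = as.integer)
  ) %>%
  group_by(row_id, item) %>%
  # exactly one ticked answer gives its score; anything else becomes NA
  summarise(
    score = if (sum(value) == 1) score[value == 1] else NA_integer_,
    .groups = "drop"
  ) %>%
  # back to one column per item, one row per respondent
  pivot_wider(names_from = item, values_from = score)

recoded
```

Here a skipped item shows up as NA in the respondent's own row instead of shifting the remaining values upwards.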
Input data:
d1 <- data.frame(A0 = c(0,0,0,0,0,1),
A1 = c(0,0,0,0,1,0),
A2 = c(0,0,0,1,0,0),
A3 = c(0,0,1,0,0,0),
A4 = c(0,1,0,0,0,0),
A5 = c(1,0,0,0,0,0),
B0 = c(0,0,0,0,0,1),
B1 = c(0,0,0,0,1,0),
B2 = c(0,0,0,1,0,0),
B3 = c(0,0,1,0,0,0),
B4 = c(0,1,0,0,0,0),
B5 = c(1,0,0,0,0,0),
C0 = c(0,0,0,0,0,1),
C1 = c(0,0,0,0,1,0),
C2 = c(0,0,0,1,0,0),
C3 = c(0,0,1,0,0,0),
C4 = c(0,1,0,0,0,0),
C5 = c(1,0,0,0,0,0))
Output:
# A tibble: 6 x 3
A B C
<dbl> <dbl> <dbl>
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
6 0 0 0