R - one scale multiple column recode

A researcher and I are trying to find a way to make our data frame cleaner and tidier. Here is a reproducible example:

> head(Dummy1)
# A tibble: 6 x 18
     A0    A1    A2    A3    A4    A5    B0    B1    B2    B3    B4    B5    C0    C1    C2    C3    C4
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     0     0     0     0     0     1     0     0     0     0     0     1     0     0     0     0     0
2     0     0     0     0     1     0     0     0     0     0     1     0     0     0     0     0     1
3     0     0     0     1     0     0     0     0     0     1     0     0     0     0     0     1     0
4     0     0     1     0     0     0     0     0     1     0     0     0     0     0     1     0     0
5     0     1     0     0     0     0     0     1     0     0     0     0     0     1     0     0     0
6     1     0     0     0     0     0     1     0     0     0     0     0     1     0     0     0     0
# … with 1 more variable: C5 <dbl>
> 

Because of the way our software records answers, we get A0 to A5, B0 to B5, and so on, instead of this:

> head(Dummy2)
# A tibble: 6 x 3
      A     B     C
  <dbl> <dbl> <dbl>
1     5     5     5
2     4     4     4
3     3     3     3
4     2     2     2
5     1     1     1
6     0     0     0
> 

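Is there code that would let us convert the first version (each possible answer as its own column, coded as binary 0 = NO, 1 = YES) into a single column per item holding the numeric result? The scale we are trying to analyze has more than 50 items, each ranging from 0 to 8.

Thanks for your help!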

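You can use split.default to split the data frame into groups of columns that share the same prefix, one data frame per item. Then use sapply with max.col to get, for each row, the number of the column with the highest value. I subtract 1 because your column numbering starts at 0.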

sapply(split.default(Dummy1, sub('\\d+', '', names(Dummy1))), max.col) - 1

sub('\\d+', '', names(Dummy1)) removes the digits from the column names, so it returns "A" "A" "A" "A" "A" "A" "B" "B" "B" "B" ..., which is used as the grouping to split on in split.default.
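To make the intermediate steps visible, here is a minimal sketch of the same one-liner broken into three steps. The data frame `toy` and the variable names `groups`, `blocks`, and `result` are made up for illustration; only the calls themselves come from the answer above.

```r
# Hypothetical toy data: two items (A, B), three answer levels each (0 to 2)
toy <- data.frame(A0 = c(0, 1), A1 = c(0, 0), A2 = c(1, 0),
                  B0 = c(1, 0), B1 = c(0, 1), B2 = c(0, 0))

# Step 1: strip the digits to get one group label per column
groups <- sub("\\d+", "", names(toy))   # "A" "A" "A" "B" "B" "B"

# Step 2: split the columns into one data frame per item
blocks <- split.default(toy, groups)    # list with elements $A and $B

# Step 3: per item, find the position of the 1 in each row,
# then subtract 1 because the answer levels start at 0
result <- sapply(blocks, max.col) - 1
result
```

With this toy data, column A of `result` is 2, 0 and column B is 0, 1. Note that `sapply` returns a matrix here; wrap it in `as.data.frame()` if a data frame is needed.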

You can also try this:

library(tidyverse)

d1 %>% 
  # reshape to long format: column names go into 'col',
  # cell values go into the default 'value' column
  pivot_longer(cols = everything(), names_to = 'col') %>% 
  # keep only the cells that are set (non-zero)
  filter(value != 0) %>% 
  # extract the trailing number as the answer, then strip it from the item name
  mutate(newval = as.numeric(str_extract(col, '\\d+$')),
         col = str_replace(col, '\\d+', '')) %>% 
  # drop 'value', which is now constant
  select(-value) %>% 
  # back to wide format with one list column per item, then unnest into rows
  pivot_wider(names_from = col, values_from = newval, values_fn = list) %>% 
  unnest(everything())

Input data:

d1 <- data.frame(A0 = c(0,0,0,0,0,1),
                 A1 = c(0,0,0,0,1,0),
                 A2 = c(0,0,0,1,0,0),
                 A3 = c(0,0,1,0,0,0),
                 A4 = c(0,1,0,0,0,0),
                 A5 = c(1,0,0,0,0,0),
                 B0 = c(0,0,0,0,0,1),
                 B1 = c(0,0,0,0,1,0),
                 B2 = c(0,0,0,1,0,0),
                 B3 = c(0,0,1,0,0,0),
                 B4 = c(0,1,0,0,0,0),
                 B5 = c(1,0,0,0,0,0),
                 C0 = c(0,0,0,0,0,1),
                 C1 = c(0,0,0,0,1,0),
                 C2 = c(0,0,0,1,0,0),
                 C3 = c(0,0,1,0,0,0),
                 C4 = c(0,1,0,0,0,0),
                 C5 = c(1,0,0,0,0,0))

Output:

# A tibble: 6 x 3
      A     B     C
  <dbl> <dbl> <dbl>
1     5     5     5
2     4     4     4
3     3     3     3
4     2     2     2
5     1     1     1
6     0     0     0
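
One caveat worth checking on real data: both approaches assume that every row has exactly one 1 per item. If a respondent skipped an item, max.col breaks the tie between the all-zero columns at random by default, and the pivot_longer version ends up with columns of unequal length. The following is only a sketch of one possible base R variant that returns NA in those cases; the function name `recode_onehot` is hypothetical, and it assumes column names consist of an item prefix followed by a numeric answer level.

```r
# Hypothetical helper: collapse one-hot answer columns into one column per item.
# Assumes names like "A0", "A1", ...: item prefix followed by the answer level.
recode_onehot <- function(df) {
  groups <- sub("[0-9]+$", "", names(df))
  as.data.frame(lapply(split.default(df, groups), function(block) {
    m <- as.matrix(block)
    m[is.na(m)] <- 0                                  # treat missing cells as "not selected"
    # Read the answer level from the column name rather than the column position
    lvl <- as.numeric(sub("^[^0-9]*", "", colnames(m)))
    out <- lvl[max.col(m, ties.method = "first")]
    out[rowSums(m == 1) != 1] <- NA                   # no answer, or more than one
    out
  }))
}

d1 <- data.frame(A0 = c(0,0,0,0,0,1), A1 = c(0,0,0,0,1,0), A2 = c(0,0,0,1,0,0),
                 A3 = c(0,0,1,0,0,0), A4 = c(0,1,0,0,0,0), A5 = c(1,0,0,0,0,0),
                 B0 = c(0,0,0,0,0,1), B1 = c(0,0,0,0,1,0), B2 = c(0,0,0,1,0,0),
                 B3 = c(0,0,1,0,0,0), B4 = c(0,1,0,0,0,0), B5 = c(1,0,0,0,0,0))

recoded <- recode_onehot(d1)
recoded
```

Taking the level from the column name instead of subtracting 1 from the position also keeps the result correct if the answer columns of an item are not stored in ascending order.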