Converting rows into a categorical column using R

I have an interview transcript, organized as follows:

[1,]  "Interviewer"
[2,]  "What is your favorite food?"
[3,]  "Interviewee"
[4,]  "I love to eat pizza"
[5,]  "Interviewer"
[6,]  "Cool. But have you ever tried eating salad?"
[7,]  "Interviewee "
[8,]  "Yeah..."
[9,]  "Interviewer"
[10,] "I love salad, pizza is bad."
[11,] "Interviewee "
[12,] "I don't totally agree" 

I would like to remove the speaker labels from the rows and turn them into a categorical column, as in this example:

      [,1]                [,2]  
[1,]  "Interviewer"       "What is your favorite food?"
[2,]  "Interviewee"       "I love to eat pizza"
[3,]  "Interviewer"       "Cool. But have you ever tried eating a salad?"
[4,]  "Interviewee"       "Yeah..."
[5,]  "Interviewer"       "I love salad, pizza is bad."
[6,]  "Interviewee"       "I don't totally agree"

The interview is a dialogue between two people. Does anyone know how to do this? Thanks in advance!

We can create a grouping variable by applying grepl to the 'Interview' keyword, then split on that variable and rbind the pieces

do.call(rbind, split(v1, cumsum(grepl("^Interview", v1))))

-output

 [,1]           [,2]                                         
1 "Interviewer"  "What is your favorite food?"                
2 "Interviewee"  "I love to eat pizza"                        
3 "Interviewer"  "Cool. But have you ever tried eating salad?"
4 "Interviewee " "Yeah..."                                    
5 "Interviewer"  "I love salad, pizza is bad."                
6 "Interviewee " "I don't totally agree"        
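
To make the grouping step easier to follow, here is a minimal sketch of the intermediate values on a shortened copy of the data. The names `transcript`, `is_speaker` and `group_id` are only illustrative and do not come from the answer above.

```r
# Shortened copy of the sample data (hypothetical name: transcript)
transcript <- c("Interviewer", "What is your favorite food?",
                "Interviewee", "I love to eat pizza")

# TRUE on lines that start with "Interview", i.e. the speaker labels
is_speaker <- grepl("^Interview", transcript)

# The running total increases by one at every speaker label, so each
# label and the line(s) that follow it share the same group number
group_id <- cumsum(is_speaker)

is_speaker
group_id

# One list element per group; rbind then stacks the elements as rows
pieces <- split(transcript, group_id)
result <- do.call(rbind, pieces)
result
```

This relies on every speaker label being followed by exactly one line of speech. If a speaker had several consecutive lines, the groups would differ in length and rbind would recycle the shorter ones, sometimes without a warning. An utterance that itself begins with "Interview" would also be mistaken for a label.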

If the labels and utterances strictly alternate, either create the two columns with a recycled logical index

cbind(v1[c(TRUE, FALSE)], v1[c(FALSE, TRUE)])
     [,1]           [,2]                                         
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"   
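
As a small illustration of why the logical index works: R recycles a short logical index along the whole vector, so `c(TRUE, FALSE)` keeps positions 1, 3, 5, ... and `c(FALSE, TRUE)` keeps positions 2, 4, 6, .... The vector `x` below is made up for the demonstration.

```r
# Made-up vector, only for demonstrating index recycling
x <- c("a", "b", "c", "d", "e", "f")

odd  <- x[c(TRUE, FALSE)]   # pattern repeats: keep, drop, keep, drop, ...
even <- x[c(FALSE, TRUE)]   # pattern repeats: drop, keep, drop, keep, ...

odd    # "a" "c" "e"
even   # "b" "d" "f"

# Same selection written with explicit positions
identical(odd, x[seq(1, length(x), by = 2)])   # TRUE
```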

Or use matrix

matrix(v1, ncol = 2, byrow = TRUE)
     [,1]           [,2]                                         
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"                
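
Note that some labels in the outputs above still carry a trailing space ("Interviewee "), and the results are character matrices rather than a categorical column. A possible follow-up, sketched here under the assumption that a data frame with a factor column is what the question is after, is to trim the whitespace first and then convert. The column names `speaker` and `text` are my own choice, not part of the original answer.

```r
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# trimws() removes leading/trailing whitespace, so "Interviewee " and
# "Interviewee" end up as the same label
m <- matrix(trimws(v1), ncol = 2, byrow = TRUE)

# Hypothetical column names; speaker becomes a factor (categorical)
df <- data.frame(speaker = factor(m[, 1]),
                 text    = m[, 2],
                 stringsAsFactors = FALSE)

levels(df$speaker)   # "Interviewee" "Interviewer"
str(df)
```

Without the trimws step, factor() would treat "Interviewee" and "Interviewee " as two different levels.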

data

v1 <- c("Interviewer", "What is your favorite food?", "Interviewee", 
"I love to eat pizza", "Interviewer", 
"Cool. But have you ever tried eating salad?", 
"Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.", 
"Interviewee ", "I don't totally agree")

Here is another approach:

library(tidyverse)

tibble(v1 = v1) %>% 
  mutate(v2 = lead(v1)) %>% 
  filter(row_number() %% 2 == 1) %>% 
  as.matrix()

     v1             v2                                           
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"