180 nested conditions in a separate file to create a new id variable for each row in my dataframe

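I need to identify 180 short sentences written by experiment participants and match each sentence to a condition, recording the condition's serial number in a new column. The 180 conditions are stored in a separate file. All of the text is in Hebrew, but I have attached understandable examples in English.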

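Below is a seven-row example taken from the 180 rows of experimental data. There are 181 distinct conditions, each with its own serial number, so I have also added a small six-condition example that matches this participant data: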

data_participant <- data.frame("text" =  c("I put a binder on a high shelf", 
                                           "My friend and me are eating chocolate", 
                                           "I wake up with  superhero powers", 
                                           "Low wooden table with cubes", 
                                           "The most handsome man in camopas invites me out", 
                                           "My mother tells me she loves me and protects me", 
                                           "My laptop drops and breaks"), 
                               "trial" = (1:7) )  

data_condition <- data.frame("condition_a" = c("wooden table"  , "eating" , "loves", 
                                               "binder", "handsome", "superhero"), 
                             "condition_b" = c("cubes",  "chocolate", "protects me", 
                                               "shelf","campos", "powers"), 
                             "condition_c" = c("0", "0", "0", "0", "me out", "0"),
                             "i.d." = (1:6) )

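I decided to use the ifelse function with a nested-conditions strategy and write 181 lines of code, one line per condition. This is also cumbersome because it requires switching between English and Hebrew. But after about 30 lines I started getting an error message: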

contextstack overflow

The error is reported at line 147 of the script, that is, after 33 conditions.

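In the example each condition has at most 3 keywords, but in the full data there are conditions with 5 or 6 keywords (because of the variety in how participants phrased things). The original condition table therefore has 7 columns: one for the i.d. number, and the rest hold keyword identifiers for the same condition, combined with the "or" operator.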
library(dplyr)

# One nested ifelse() per condition; rows that match nothing fall through to "181".
data <- mutate(data, script_id =
  ifelse(grepl("wooden table", imagery) | grepl("cubes", imagery), "1",
  ifelse(grepl("eating", imagery) | grepl("chocolate", imagery), "2",
  ifelse(grepl("loves", imagery) | grepl("protect me", imagery), "3",
  ifelse(grepl("binder", imagery) | grepl("shelf", imagery), "4",
  ifelse(grepl("handsome", imagery) | grepl("campus", imagery) | grepl("me out", imagery), "5",
  ifelse(grepl("superhero", imagery) | grepl("powers", imagery), "6",
         "181")))))))

# I expect the output to be a new column in the participant data frame
# with the corresponding ID number for each text.
# This worked while I had 33 condition rows; after that I started
# to get the error message "contextstack overflow".

final_output <- data.frame("text" =  c("I put a binder on a high shelf", "My friend and me are eating chocolate", 
                                       "I wake up with  superhero powers", "Low wooden table with cubes", 
                                       "The most handsome man in camopas invites me out", 
                                       "My mother tells me she loves me and protects me", 
                                       "My laptop drops and breaks"), 
                           "trial" = (1:7), 
                           "i.d." = c(4, 2, 6, 1, 5, 3, 181) )

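Here is an approach using fuzzyjoin::regex_left_join. The condition table is first reshaped to long format (one keyword per row), and each keyword is then joined to the participant texts as a regular expression.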

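library(dplyr)
library(tidyr)

# Long format: one row per (i.d., keyword) pair, dropping the "0" placeholders.
data_condition_long <- data_condition %>%
  gather(col, text_match, -`i.d.`) %>%
  filter(text_match != "0") %>%
  arrange(`i.d.`)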

data_participant %>%
  fuzzyjoin::regex_left_join(data_condition_long %>% select(-col), 
                             by = c("text" = "text_match")) %>%
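  # Texts that match no keyword get the fallback id 181.
  mutate(`i.d.` = if_else(is.na(`i.d.`), 181L, `i.d.`)) %>%
  # If `i.d.` is stored as double rather than integer, use this instead:
  # mutate(`i.d.` = if_else(is.na(`i.d.`), 181, `i.d.`)) %>%
  # A text can match several keywords; keep only the first match per trial.
  group_by(trial) %>%
  slice(1) %>%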
  ungroup() %>%
  select(-text_match)

# A tibble: 7 x 3
  text                                            trial  i.d.
  <fct>                                           <int> <int>
1 I put a binder on a high shelf                      1     4
2 My friend and me are eating chocolate               2     2
3 I wake up with  superhero powers                    3     6
4 Low wooden table with cubes                         4     1
5 The most handsome man in camopas invites me out     5     5
6 My mother tells me she loves me and protects me     6     3
7 My laptop drops and breaks                          7   181
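
If installing fuzzyjoin is not an option, the same idea can be sketched in base R. This is only an illustrative alternative, not the method used above: it collapses each condition's keywords into one regular expression joined with "|" and loops over the conditions once, so no nesting is needed. The names keyword_cols, patterns, matched and script_id are made up for this sketch, and stringsAsFactors = FALSE is an assumption so that the text is handled as character. Keywords are treated as regular expressions, so any special characters in the real Hebrew keywords would need escaping.

```r
# Example data copied from the question (stringsAsFactors = FALSE is an assumption).
data_participant <- data.frame(
  "text" = c("I put a binder on a high shelf",
             "My friend and me are eating chocolate",
             "I wake up with  superhero powers",
             "Low wooden table with cubes",
             "The most handsome man in camopas invites me out",
             "My mother tells me she loves me and protects me",
             "My laptop drops and breaks"),
  "trial" = 1:7,
  stringsAsFactors = FALSE
)

data_condition <- data.frame(
  "condition_a" = c("wooden table", "eating", "loves",
                    "binder", "handsome", "superhero"),
  "condition_b" = c("cubes", "chocolate", "protects me",
                    "shelf", "campos", "powers"),
  "condition_c" = c("0", "0", "0", "0", "me out", "0"),
  "i.d." = 1:6,
  stringsAsFactors = FALSE
)

# Every column except the id column holds keywords.
keyword_cols <- setdiff(names(data_condition), "i.d.")

# One regular expression per condition: keywords joined with "|" ("or"),
# skipping the "0" placeholders.
patterns <- apply(data_condition[keyword_cols], 1, function(keywords) {
  paste(keywords[keywords != "0"], collapse = "|")
})

# Start from the fallback id 181, then assign the first matching condition
# in table order (the same precedence as the nested ifelse() chain).
script_id <- rep(181L, nrow(data_participant))
matched <- rep(FALSE, nrow(data_participant))
for (k in seq_along(patterns)) {
  hit <- !matched & grepl(patterns[k], data_participant$text)
  script_id[hit] <- data_condition[["i.d."]][k]
  matched <- matched | hit
}

data_participant$script_id <- script_id
print(data_participant[, c("trial", "script_id")])
```

Because the loop runs over the rows of the condition table, going from 6 to 181 conditions only means reading a longer table; the code itself does not grow, which avoids the deep nesting that triggered the error.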