Identify words within a phrase and code as 0 or 1
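I am working with utterances, i.e. statements spoken by children. For each utterance, if one or more words in the statement match a predefined list of 'core' words (probably 300 words), I want to enter '1' in 'Core' (if none match, enter '0' in 'Core').
Similarly, if one or more words in the statement match a different predefined list of 'fringe' words (probably 300 fringe words, again distinct from the core words), I want to enter '1' in 'Fringe' (if none match, enter '0' in 'Fringe').
Basically, right now I only have the utterances, and from those I need to determine whether any word matches one of the core words and whether any word matches one of the fringe words. Here is a snippet of my data.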
Core Fringe Utterance
1 NA NA small
2 NA NA small
3 NA NA where's his bed
4 NA NA there's his bed
5 NA NA there's his bed
6 NA NA is that a pillow
Thanks in advance. I searched the archives but had a hard time finding a solution that fits my situation.
The dput() output is:
structure(list(Utterance = c("small", "small", "where's his bed", "there's his bed", "there's his bed", "is that a pillow", "what is that on his head", "hey he has his arm stuck here", "there there's it", "now you're gonna go night_night", "and that's the thing you can turn on", "yeah where's the music+box"), Core = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Fringe = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -12L))
Here is a quick approach that may solve your problem (though I'm sure there are more elegant solutions):
df <- structure(list(Utterance = c("small", "small", "where's his bed", "there's his bed", "there's his bed", "is that a pillow", "what is that on his head", "hey he has his arm stuck here", "there there's it", "now you're gonna go night_night", "and that's the thing you can turn on", "yeah where's the music+box"), Core = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Fringe = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -12L))
## Define object with all the core terms:
CorePatterns <- c("his", "music", "turn")
## Define value of `df$Core` as `1` if `df$Utterance`
## contains one of the patterns in `CorePatterns`,
## otherwise, define it as `0`:
df$Core <- ifelse(grepl(paste(CorePatterns, collapse = "|"),
                        df$Utterance),
                  1, 0)
df
Utterance Core Fringe
> 1 small 0 NA
> 2 small 0 NA
> 3 where's his bed 1 NA
> 4 there's his bed 1 NA
> 5 there's his bed 1 NA
> 6 is that a pillow 0 NA
> 7 what is that on his head 1 NA
> 8 hey he has his arm stuck here 1 NA
> 9 there there's it 0 NA
> 10 now you're gonna go night_night 0 NA
> 11 and that's the thing you can turn on 1 NA
> 12 yeah where's the music+box 1 NA
You can do the same for the Fringe data.
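One caveat with the pattern above: `grepl` matches substrings, so "his" would also flag an utterance containing "this". A minimal sketch, assuming whole-word matching is what you want, wraps the alternation in `\\b` word boundaries. The `utt` data frame, the "what is this" row, the `word_regex` helper, and the `FringePatterns` values are made up for illustration and are not part of the question's data:

```r
# Small illustrative data frame: three utterances from the question plus one
# hypothetical row ("what is this") to show the substring problem.
utt <- data.frame(
  Utterance = c("small", "where's his bed", "what is this",
                "yeah where's the music+box"),
  stringsAsFactors = FALSE
)

CorePatterns   <- c("his", "music", "turn")
FringePatterns <- c("bed", "pillow")  # hypothetical fringe words

# Build "\\b(?:w1|w2|...)\\b" so that only whole words match.
# Assumes the words contain no regex metacharacters.
word_regex <- function(words) {
  paste0("\\b(?:", paste(words, collapse = "|"), ")\\b")
}

utt$Core   <- as.integer(grepl(word_regex(CorePatterns),   utt$Utterance, perl = TRUE))
utt$Fringe <- as.integer(grepl(word_regex(FringePatterns), utt$Utterance, perl = TRUE))
utt
```

With this pattern, "what is this" is no longer coded as Core, while "where's his bed" still is. If any list entry contains characters such as `+` or `.`, they would need escaping before being pasted into the pattern.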
A tidyverse option could be:
library(dplyr)
library(stringr)
coreWords <- c('small', 'bed')
fringeWords <- c('head', 'night')
df %>%
  mutate(Core = + str_detect(Utterance, str_c(coreWords, collapse = '|')),
         Fringe = + str_detect(Utterance, str_c(fringeWords, collapse = '|')))
# Utterance Core Fringe
# 1 small 1 0
# 2 small 1 0
# 3 where's his bed 1 0
# 4 there's his bed 1 0
# 5 there's his bed 1 0
# 6 is that a pillow 0 0
# 7 what is that on his head 0 1
# 8 hey he has his arm stuck here 0 0
# 9 there there's it 0 0
# 10 now you're gonna go night_night 0 1
# 11 and that's the thing you can turn on 0 0
# 12 yeah where's the music+box 0 0
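With lists of around 300 words each, an alternative to one long regular expression is to split every utterance into tokens and test membership with `%in%`. The following is only a sketch in base R under stated assumptions: tokens are lowercase runs of letters and apostrophes, so "night_night" is treated as two "night" tokens, and `has_any` is a hypothetical helper name.

```r
utterances <- c("small", "where's his bed", "now you're gonna go night_night")

coreWords   <- c("small", "bed")
fringeWords <- c("head", "night")

# Split each utterance into lowercase tokens on anything that is not a
# letter or an apostrophe (an assumption about what counts as a word).
tokens <- strsplit(tolower(utterances), "[^a-z']+")

# 1 if any token of the utterance appears in the word list, otherwise 0.
has_any <- function(tok_list, words) {
  as.integer(vapply(tok_list, function(tok) any(tok %in% words), logical(1)))
}

result <- data.frame(
  Utterance = utterances,
  Core      = has_any(tokens, coreWords),
  Fringe    = has_any(tokens, fringeWords)
)
result
```

Because this compares whole tokens, it avoids partial matches and needs no escaping of special characters in the word lists. Whether "night_night" should count as a match for "night" depends on your transcription conventions, so adjust the split pattern accordingly.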