Extracting part of a string based on two conditions
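I have a character column in my dataset, and I would like to extract part of each string based on two conditions:
a) If the string starts with "Therapist:", split it into two columns: one containing the word "Therapist" and the other containing the remaining text.
b) If it starts with "Patient:", split it into two columns: one containing the word "Patient" and the other containing the remaining text.
The problem I keep running into is that I don't know how to write an if statement in R. I'm a newbie, but very willing to learn. Even after googling (Stack Overflow etc.) and trying different functions, I'm still at a loss.
A sample of my data:
> data$speech[1:5]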
[1] "Therapist: Okay, we’re back…"
[2] "Patient: Hmm-hmm."
[3] "Therapist: … after a couple of hours…"
[4] "Patient: Hmm-hmm."
[5] "Therapist: Hmm… Catch me up on what you’ve found yourself thinking and feeling after the session."
Many thanks!
This command creates a two-column data frame:
as.data.frame(do.call(rbind, strsplit(data$speech, ": ")))
Result:
  V1        V2
1 Therapist Okay, we’re back…
2 Patient   Hmm-hmm.
3 Therapist … after a couple of hours…
4 Patient   Hmm-hmm.
5 Therapist Hmm… Catch me up on what you’ve found yourself thinking and feeling after the session.
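One caveat with the strsplit() call above: it splits on every ": " in a string, so a line whose text itself contains ": " yields more than two pieces and rbind() no longer lines up cleanly. A minimal base R sketch that splits only at the first ": " follows. The `speech` vector is an ASCII-simplified stand-in for the question's data, and `parts` and `result` are illustrative names, not part of the original answer.

```r
# Hypothetical sample adapted from the question (ASCII only for portability).
speech <- c(
  "Therapist: Okay, we're back...",
  "Patient: Hmm-hmm.",
  "Therapist: ... after a : couple of hours..."
)

# regexpr() finds only the FIRST ": " in each string;
# regmatches(..., invert = TRUE) returns the text before and after that match.
parts <- regmatches(speech, regexpr(": ", speech, fixed = TRUE), invert = TRUE)

# Each element of `parts` has two pieces, so rbind() gives a two-column matrix.
result <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(result) <- c("Name", "Talk")
print(result)
```

Because only the first delimiter is used, any later colons stay inside the Talk column.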
You can use separate() from the {tidyr} package.
library(tidyr)
df <- data.frame(
  speech = c(
    "Therapist: Okay, we’re back…",
    "Patient: Hmm-hmm.",
    "Therapist: … after a : couple of hours…",
    "Patient: Hmm-hmm.",
    "Therapist: Hmm… Catch me up on what you’ve : found yourself thinking and feeling after the session."
  )
)
separate(df, speech, into = c("Name", "Talk"), sep = ":", extra = "merge")
  Name      Talk
1 Therapist Okay, we’re back…
2 Patient   Hmm-hmm.
3 Therapist … after a : couple of hours…
4 Patient   Hmm-hmm.
5 Therapist Hmm… Catch me up on what you’ve : found yourself thinking and feeling after the session.
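I used the argument extra = "merge" to handle any additional ":" that appears within the speech itself: only the first ":" is used to split, and the rest of the text is kept together in the second column.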
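Since the question asked specifically about if statements: in R, a condition applied to a whole column is usually written with vectorized functions such as startsWith() and ifelse() rather than if. The following is only a sketch of that idea in base R. The `speech` vector is an ASCII-simplified stand-in for the question's data, and `speaker`, `talk` and `result` are illustrative names.

```r
# Hypothetical sample adapted from the question, plus one line without a speaker.
speech <- c(
  "Therapist: Okay, we're back...",
  "Patient: Hmm-hmm.",
  "A line without a speaker label"
)

# Vectorized "if / else if / else": test every element at once.
speaker <- ifelse(startsWith(speech, "Therapist:"), "Therapist",
           ifelse(startsWith(speech, "Patient:"), "Patient", NA_character_))

# Drop the speaker label and the colon, then trim the leading space.
# Rows with no recognized speaker get NA here as well.
talk <- trimws(substring(speech, nchar(speaker) + 2))

result <- data.frame(Name = speaker, Talk = talk, stringsAsFactors = FALSE)
print(result)
```

Unlike the delimiter-based approaches, this version makes the two conditions explicit and marks any line that matches neither as NA, which can help when checking the data for unexpected prefixes.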