Extracting part of a string based on two conditions

Question

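My dataset has a character column, and I want to extract part of each string based on two conditions:

a) If the string starts with "Therapist:", split it into two columns: one containing the word "Therapist" and the other containing the remaining text.

b) If it starts with "Patient:", split it into two columns: one containing the word "Patient" and the other containing the remaining text.

The problem I keep running into is that I don't know how to write an if statement in R. I'm a beginner, but very willing to learn. Even after searching Google (Stack Overflow and so on) and trying different functions, I'm still at a loss.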

A sample of my data:

> data$speech[1:5]

[1] "Therapist: Okay, we’re back…"

[2] "Patient: Hmm-hmm."

[3] "Therapist: … after a couple of hours…"

[4] "Patient: Hmm-hmm."

[5] "Therapist: Hmm… Catch me up on what you’ve found yourself thinking and feeling after the session."

Any help is greatly appreciated.

Thank you!

Answer 1

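This command creates a two-column data frame. It assumes each string contains ": " exactly once; a string with a second ": " would be split into more than two pieces: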

as.data.frame(do.call(rbind, strsplit(data$speech, ": ")))

Result:

         V1                                                                                     V2
1 Therapist                                                                      Okay, we’re back…
2   Patient                                                                               Hmm-hmm.
3 Therapist                                                             … after a couple of hours…
4   Patient                                                                               Hmm-hmm.
5 Therapist Hmm… Catch me up on what you’ve found yourself thinking and feeling after the session.
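If the spoken text can itself contain ": ", splitting on every occurrence produces pieces of unequal length. One possible base R workaround, shown here as a minimal sketch, is to split only at the first ": " using regexpr() together with regmatches(..., invert = TRUE). The sample strings and the column names speaker and text are made up for illustration and are not from the original data.

```r
# Hypothetical sample data; the third string contains a second ": " on purpose.
speech <- c(
  "Therapist: Okay, we're back",
  "Patient: Hmm-hmm.",
  "Therapist: after a couple of hours: more text"
)

# regexpr() locates only the first match in each string; with invert = TRUE,
# regmatches() returns the pieces before and after that single match.
parts <- regmatches(speech, regexpr(": ", speech), invert = TRUE)

# Every element of `parts` has length 2, so rbind() yields a clean two-column matrix.
result <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(result) <- c("speaker", "text")  # illustrative column names
print(result)
```

This sketch assumes every string contains at least one ": "; a string without one would come back as a single piece and would need separate handling.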

Answer 2

You can use separate() from the {tidyr} package.

library(tidyr)

df <- data.frame(
  speech = c(
    "Therapist: Okay, we’re back…",
    "Patient: Hmm-hmm.",
    "Therapist: … after a : couple of hours…",
    "Patient: Hmm-hmm.",
    "Therapist: Hmm… Catch me up on what you’ve : found yourself thinking and feeling after the session."
  )
)

separate(df, speech, into = c("Name", "Talk"), sep = ":", extra = "merge")

       Name                                                                                      Talk
1 Therapist                                                                         Okay, we’re back…
2   Patient                                                                                  Hmm-hmm.
3 Therapist                                                              … after a : couple of hours…
4   Patient                                                                                  Hmm-hmm.
5 Therapist  Hmm… Catch me up on what you’ve : found yourself thinking and feeling after the session.

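I use the argument extra = "merge" to handle any ":" that appears within the speech itself: the string is split only at the first colon, and later colons stay in Talk. Because sep = ":" does not consume the space after the colon, each Talk value begins with a space.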
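The two conditions from the question can also be written directly as an anchored regular expression, without an explicit if statement. The following base R sketch is one possible way to do that, not the method used in either answer: it matches only strings that start with "Therapist:" or "Patient:", keeps later colons in the text, and drops the space after the first colon. The sample strings, including the "(inaudible)" line, are hypothetical.

```r
# Hypothetical sample data; the last string matches neither condition.
speech <- c(
  "Therapist: Okay, we're back",
  "Patient: Hmm-hmm.",
  "Therapist: after a : couple of hours",
  "(inaudible)"
)

# Group 1 captures the speaker label, group 2 everything after the first
# colon and any whitespace that follows it.
pattern <- "^(Therapist|Patient):\\s*(.*)$"
matched <- grepl(pattern, speech)

# ifelse() is the vectorized counterpart of if/else: it decides per element.
result <- data.frame(
  Name = ifelse(matched, sub(pattern, "\\1", speech), NA_character_),
  Talk = ifelse(matched, sub(pattern, "\\2", speech), speech),
  stringsAsFactors = FALSE
)
print(result)
```

Strings that match neither label get NA in Name and keep their full text in Talk, which makes unexpected lines easy to find afterwards.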
