Linking/Mapping columns in a dataframe in R

I have a dataframe with two columns, title and text.

The dataframe looks like this:

title         text
foreign keys  A foreign key is a column or group of columns...
Week 2        In Week 2 the encoding...
comments      colection of comments about Week 2
Statistics    Statistics is the discipline...
comments      collection of comments about Statistics

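In the dataframe, the comments on a topic appear in the row directly below that topic. I want to link/map the two, so that if I give the name of a topic (title) it retrieves the corresponding comments (text). In this example the first topic (foreign keys) has no comments, so I don't need it. This way I also want to reduce the size of my dataframe somewhat, by keeping only the topics that have comments.

So far I can only do the following: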

df %>% 
  filter(title == "Week 2") %>% 
  pull(text)

This gives me the text of the topic itself (obviously), not the comments about Week 2. I don't need the topics that have no comments below them.

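We may need to filter the 'Topic' rows that have a 'review' by creating a grouping column. Once the data is subset, it is easier to pull the 'text' or to create a named vector of 'text' using the title.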
library(dplyr)
library(stringr)
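# start a new group at every 'Topic' row, so a review lands in the group of the topic above it
df1 %>% 
 group_by(grp = cumsum(str_detect(title, '^Topic'))) %>% 
 # keep groups that contain a review, and within them only the rows whose text contains 'text'
 filter(any(str_detect(title, 'review')) & str_detect(text, 'text'))  %>%
 ungroup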

-output

# A tibble: 2 × 3
  title   text              grp
  <chr>   <chr>           <int>
1 Topic 2 text of Topic 2     2
2 Topic 3 text of Topic 3     3
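To see why the grouping column works, it can help to look at the intermediate values on their own. The following is only a minimal sketch using the same titles as df1; the variable names starts_topic and grp_ids are made up for illustration.

```r
library(stringr)

# Same titles as df1 in the data section
title <- c("Topic 1", "Topic 2", "review 2", "Topic 3", "review 3")

# TRUE wherever a new topic starts
starts_topic <- str_detect(title, "^Topic")

# Running count of topic starts: a review row inherits the id of the topic above it
grp_ids <- cumsum(starts_topic)

data.frame(title, starts_topic, grp_ids)
```

Here "Topic 1" ends up alone in group 1, which is why the any(...) condition on 'review' drops it.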

With the updated data

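# diff(...) is 1 only where a 'comments' row follows a non-comments row;
# every other row starts a new group, so each 'comments' row joins the row above it
df2 %>% 
  group_by(grp = cumsum(c(TRUE, diff(str_detect(title, 'comments')) != 1))) %>%  
  # keep groups that contain a 'comments' row, then drop the 'comments' row itself
  filter(any(str_detect(title, 'comments') ) & title != 'comments') %>% 
  ungroup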

-output

# A tibble: 2 × 3
  title      text                              grp
  <chr>      <chr>                           <int>
1 Week 2     In Week 2 the encoding...           2
2 Statistics Statistics is the discipline...     3
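If the goal is to look up the comments by topic name, one possible approach is a named vector keyed by the topic title. This is only a sketch: it assumes every 'comments' row sits directly below its topic and that there is exactly one such row per topic; comment_rows and comments_by_topic are names made up for illustration.

```r
library(dplyr)

# Same data as df2 in the data section
df2 <- data.frame(
  title = c("foreign keys", "Week 2", "comments", "Statistics", "comments"),
  text = c("A foreign key is a column or group of columns...",
           "In Week 2 the encoding...",
           "colection of comments about Week 2",
           "Statistics is the discipline...",
           "collection of comments about Statistics"),
  stringsAsFactors = FALSE
)

# Attach to every row the title of the row directly above it,
# then keep only the 'comments' rows (assumption: comments follow their topic)
comment_rows <- df2 %>%
  mutate(topic = dplyr::lag(title)) %>%
  filter(title == "comments")

# Named vector: names are topic titles, values are the comment texts
comments_by_topic <- setNames(comment_rows$text, comment_rows$topic)

comments_by_topic[["Week 2"]]
# [1] "colection of comments about Week 2"
```

Topics without comments (here "foreign keys") simply have no entry in the vector.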

data

df1 <- structure(list(title = c("Topic 1", "Topic 2", "review 2", "Topic 3", 
"review 3"), text = c("text of Topic 1", "text of Topic 2", "review of Topic 2", 
"text of Topic 3", "review of Topic 3")), class = "data.frame",
 row.names = c(NA, 
-5L))

df2 <- structure(list(title = c("foreign keys", "Week 2", "comments", 
"Statistics", "comments"), text = c("A foreign key is a column or group of columns...", 
"In Week 2 the encoding...", "colection of comments about Week 2", 
"Statistics is the discipline...", "collection of comments about Statistics"
)), class = "data.frame", row.names = c(NA, -5L))