Linking/Mapping columns in a dataframe in R
I have a dataframe with two columns, `title` and `text`.
The dataframe looks like this:
title | text |
---|---|
foreign keys | A foreign key is a column or group of columns... |
Week 2 | In Week 2 the encoding... |
comments | colection of comments about Week 2 |
Statistics | Statistics is the discipline... |
comments | collection of comments about Statistics |
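The dataframe is laid out so that the comments on a topic appear directly below that topic. I want to link/map the two, so that if I give the name of a topic (`title`), it retrieves the corresponding comments (`text`). In this example, the first topic has no comments, so I don't need it. This way, I also want to reduce the size of my dataframe somewhat by keeping only the topics that have comments.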
So far, I have only been able to do the following:
df %>%
  filter(title == "Week 2") %>%
  pull(text)
This gives me the corresponding `text` (obviously), not the comments about Week 2. I also don't need topics that have no comments below them.
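We may need to `filter` the 'Topic' rows that have a 'review' by creating a grouping column. Once the data is subset, it is easier to `pull` the 'text', or to create a named vector of 'text' using `title`.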
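library(dplyr)
library(stringr)
df1 %>%
  # start a new group at every row whose title begins with 'Topic'
  group_by(grp = cumsum(str_detect(title, '^Topic'))) %>%
  # keep groups that contain a 'review' row; within them, keep the rows
  # whose text contains 'text' (the topic rows in this data)
  filter(any(str_detect(title, 'review')) & str_detect(text, 'text')) %>%
  ungroup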
-output
# A tibble: 2 × 3
  title   text              grp
  <chr>   <chr>           <int>
1 Topic 2 text of Topic 2     2
2 Topic 3 text of Topic 3     3
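The named-vector idea mentioned above could look roughly like the following. This is only a sketch under assumptions: the `topic` column and the names `reviews` and `review_lookup` are made up for illustration, and it assumes each topic has at most one review row, since duplicated names would make the lookup ambiguous.

```r
library(dplyr)
library(stringr)

df1 <- data.frame(
  title = c("Topic 1", "Topic 2", "review 2", "Topic 3", "review 3"),
  text = c("text of Topic 1", "text of Topic 2", "review of Topic 2",
           "text of Topic 3", "review of Topic 3"),
  stringsAsFactors = FALSE
)

# Group each 'Topic' row with the rows below it, label every row with the
# title of its topic, and keep only the review rows.
# 'topic' is a hypothetical helper column, not part of the original data.
reviews <- df1 %>%
  group_by(grp = cumsum(str_detect(title, "^Topic"))) %>%
  mutate(topic = first(title)) %>%
  filter(str_detect(title, "review")) %>%
  ungroup()

# Named vector: names are topic titles, values are the review texts.
review_lookup <- setNames(reviews$text, reviews$topic)

# Look up the review for one topic by its title.
review_lookup[["Topic 2"]]
```

Topics without a review never appear in `review_lookup`, so indexing with `[[` on such a title raises an error rather than silently returning an empty result.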
With the updated data
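df2 %>%
  # start a new group at each row, except when stepping from a topic row
  # to the 'comments' row directly below it
  group_by(grp = cumsum(c(TRUE, diff(str_detect(title, 'comments')) != 1))) %>%
  # keep groups that contain a 'comments' row, excluding the 'comments' rows themselves
  filter(any(str_detect(title, 'comments')) & title != 'comments') %>%
  ungroup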
-output
# A tibble: 2 × 3
  title      text                              grp
  <chr>      <chr>                           <int>
1 Week 2     In Week 2 the encoding...           2
2 Statistics Statistics is the discipline...     3
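To see why the grouping expression pairs each topic with the `comments` row below it, it may help to evaluate it step by step on the `title` column alone. The intermediate names `is_comment`, `step`, and `grp` are introduced here only for illustration.

```r
library(stringr)

title <- c("foreign keys", "Week 2", "comments", "Statistics", "comments")

# TRUE for the 'comments' rows.
is_comment <- str_detect(title, "comments")

# diff() coerces the logicals to 0/1, so a value of 1 marks a step
# from a topic row to a 'comments' row.
step <- diff(is_comment)

# A new group starts on the first row and wherever the step is not 1,
# so each 'comments' row stays in the group of the topic above it.
grp <- cumsum(c(TRUE, step != 1))

data.frame(title, is_comment, grp)
```

With this data, "foreign keys" ends up alone in its group, which is why the `any(str_detect(title, 'comments'))` condition drops it.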
data
df1 <- structure(list(title = c("Topic 1", "Topic 2", "review 2", "Topic 3",
"review 3"), text = c("text of Topic 1", "text of Topic 2", "review of Topic 2",
"text of Topic 3", "review of Topic 3")), class = "data.frame",
row.names = c(NA,
-5L))
df2 <- structure(list(title = c("foreign keys", "Week 2", "comments",
"Statistics", "comments"), text = c("A foreign key is a column or group of columns...",
"In Week 2 the encoding...", "colection of comments about Week 2",
"Statistics is the discipline...", "collection of comments about Statistics"
)), class = "data.frame", row.names = c(NA, -5L))
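Since the question ultimately asks for the comments that belong to a given title, a different approach may also be worth considering. The sketch below is not the method shown above; it uses `dplyr::lead()` and assumes that a topic's comments, when present, sit in exactly one `comments` row immediately below it. The `comments` column and the names `linked` and `week2_comments` are hypothetical.

```r
library(dplyr)

df2 <- data.frame(
  title = c("foreign keys", "Week 2", "comments", "Statistics", "comments"),
  text = c("A foreign key is a column or group of columns...",
           "In Week 2 the encoding...", "colection of comments about Week 2",
           "Statistics is the discipline...",
           "collection of comments about Statistics"),
  stringsAsFactors = FALSE
)

# Pull the text of the next row up into a new column whenever that next
# row is a 'comments' row, then drop the 'comments' rows and any topic
# that has no comments.
linked <- df2 %>%
  mutate(comments = if_else(lead(title) == "comments",
                            lead(text), NA_character_)) %>%
  filter(title != "comments", !is.na(comments))

# Retrieve the comments for one topic by its title.
week2_comments <- linked %>%
  filter(title == "Week 2") %>%
  pull(comments)

week2_comments
```

This keeps one row per topic with both its own text and its comments, which also shrinks the dataframe to the topics that actually have comments.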