Pivot longer by integrating multiple columns
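I want to reshape my data to long format while combining the information from several columns.
Example data:
Suppose we observe 4 products (id 1:4) of an online shop and the review comments of different customers (comment*). One product (id = 2) has only one review comment, while another (id = 4) has 4 comments. For each comment, we also observe whether it quotes another user's comment (1 if yes, 0 otherwise).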
data = data.frame(id = c(1,2,3,4), n_comments = c(2,1,3,4),
comment1 = c("consetetur sadipscing", "Lorem ipsum", "dolor sit ame", "nonumy eirmod "), comment1_quote = c(1,0,0,1),
comment2 = c("clita kasd gubergren", NA, "sanctus est", "consetetur sadipscing"), comment2_quote = c(0,NA,0,0),
comment3 = c(NA, NA, "invidunt ut labore", "ea rebum"), comment3_quote = c(NA,NA,1,0),
comment4 = c(NA, NA, NA, "dolores et ea rebum"), comment4_quote = c(NA,NA,NA,1))
data
id n_comments comment1 comment1_quote comment2 comment2_quote comment3 comment3_quote comment4 comment4_quote
1 1 2 consetetur sadipscing 1 clita kasd gubergren 0 <NA> NA <NA> NA
2 2 1 Lorem ipsum 0 <NA> NA <NA> NA <NA> NA
3 3 3 dolor sit ame 0 sanctus est 0 invidunt ut labore 1 <NA> NA
4 4 4 nonumy eirmod 1 consetetur sadipscing 0 ea rebum 0 dolores et ea rebum 1
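Now we want to convert this data to long format by
- having one row per comment for each product
- adding the information on whether the comment quotes another comment
- keeping the total number of comments per product
Here is the target data: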
target_data = data.frame(id = c(1,1,2,3,3,3,4,4,4,4), n_comments = c(2,2,1,3,3,3,4,4,4,4),
comment = c("consetetur sadipscing", "clita kasd gubergren", "Lorem ipsum", "dolor sit ame", "sanctus est", "invidunt ut labore", "nonumy eirmod ",
"consetetur sadipscing", "ea rebum", "dolores et ea rebum"),
quote = c(1,0,0,0,0,1,1,0,0,1))
target_data
id n_comments comment quote
1 1 2 consetetur sadipscing 1
2 1 2 clita kasd gubergren 0
3 2 1 Lorem ipsum 0
4 3 3 dolor sit ame 0
5 3 3 sanctus est 0
6 3 3 invidunt ut labore 1
7 4 4 nonumy eirmod 1
8 4 4 consetetur sadipscing 0
9 4 4 ea rebum 0
10 4 4 dolores et ea rebum 1
This is what I tried, but it does not work:
trial_da = data %>% tidyr::pivot_longer(cols = starts_with('comment'), values_to = "comment", values_drop_na = TRUE)
Error: Can't combine `comment1` <character> and `comment1_quote` <double>.
Run `rlang::last_error()` to see where the error occurred.
trial_da
Error: object 'trial_da' not found
This happens because the quote columns also start with "comment". However, I am not sure how to solve this.
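The diagnosis can be checked on the column names alone. The following is a minimal base R sketch, not part of the original post; the vector `cols` is simply a hand-typed copy of the column names from the example data, and the anchored regular expression is one possible way to pick out only the text columns.

```r
# Column names of the example data (copied by hand for illustration)
cols <- c("id", "n_comments",
          "comment1", "comment1_quote", "comment2", "comment2_quote",
          "comment3", "comment3_quote", "comment4", "comment4_quote")

# A prefix match picks up the character columns AND the numeric *_quote columns
prefix_match <- cols[startsWith(cols, "comment")]
length(prefix_match)
# 8

# An anchored pattern keeps only the comment text columns
text_only <- grep("^comment[0-9]+$", cols, value = TRUE)
text_only
# "comment1" "comment2" "comment3" "comment4"
```

Because the prefix match mixes character and double columns, pivot_longer cannot stack them into a single value column, which is what the error message reports.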
A data.table approach:
library(data.table)
# patterns() groups the measure columns: each regex becomes one value column
ans <- setorder(
  melt(setDT(data),
       id.vars = c("id", "n_comments"),
       measure.vars = patterns(comment = "comment[0-9]+$",
                               quote = ".*_quote"),
       na.rm = TRUE), id)
# id n_comments variable comment quote
# 1: 1 2 1 consetetur sadipscing 1
# 2: 1 2 2 clita kasd gubergren 0
# 3: 2 1 1 Lorem ipsum 0
# 4: 3 3 1 dolor sit ame 0
# 5: 3 3 2 sanctus est 0
# 6: 3 3 3 invidunt ut labore 1
# 7: 4 4 1 nonumy eirmod 1
# 8: 4 4 2 consetetur sadipscing 0
# 9: 4 4 3 ea rebum 0
#10: 4 4 4 dolores et ea rebum 1
If needed, you can drop the variable column with ans[, variable := NULL].
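For comparison, the same two-group reshaping can be sketched without any packages using base R's reshape(). This is an alternative technique, not what the answer above uses; the names `wide`, `comment_cols`, `quote_cols` and `long` are illustrative, and the example data is rebuilt so the block runs on its own.

```r
# Rebuild the example data (stringsAsFactors = FALSE keeps comments as character)
wide <- data.frame(
  id = c(1, 2, 3, 4), n_comments = c(2, 1, 3, 4),
  comment1 = c("consetetur sadipscing", "Lorem ipsum", "dolor sit ame", "nonumy eirmod "),
  comment1_quote = c(1, 0, 0, 1),
  comment2 = c("clita kasd gubergren", NA, "sanctus est", "consetetur sadipscing"),
  comment2_quote = c(0, NA, 0, 0),
  comment3 = c(NA, NA, "invidunt ut labore", "ea rebum"),
  comment3_quote = c(NA, NA, 1, 0),
  comment4 = c(NA, NA, NA, "dolores et ea rebum"),
  comment4_quote = c(NA, NA, NA, 1),
  stringsAsFactors = FALSE
)

# Two groups of measure columns, matched the same way as in patterns()
comment_cols <- grep("^comment[0-9]+$", names(wide), value = TRUE)
quote_cols   <- grep("_quote$", names(wide), value = TRUE)

# Each element of 'varying' is stacked into the matching entry of 'v.names'
long <- reshape(wide, direction = "long",
                varying = list(comment_cols, quote_cols),
                v.names = c("comment", "quote"),
                timevar = "variable", idvar = "id")

# Drop the padding rows, then order by product and comment index
long <- long[!is.na(long$comment), ]
long <- long[order(long$id, long$variable), ]
rownames(long) <- NULL
long
```

reshape() has no equivalent of na.rm = TRUE, so the missing comments are filtered out in a separate step.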
After renaming the columns slightly, you can use tidyr::pivot_longer:
# Strip the numeric index: comment1 -> comment, comment1_quote -> quote
names(data) <- sub('comment\\d+_|\\d+', '', names(data))
tidyr::pivot_longer(data,
                    cols = -c(id, n_comments),
                    names_to = '.value',
                    names_pattern = '(comment|quote)',
                    values_drop_na = TRUE)
# id n_comments comment quote
# <dbl> <dbl> <chr> <dbl>
# 1 1 2 "consetetur sadipscing" 1
# 2 1 2 "clita kasd gubergren" 0
# 3 2 1 "Lorem ipsum" 0
# 4 3 3 "dolor sit ame" 0
# 5 3 3 "sanctus est" 0
# 6 3 3 "invidunt ut labore" 1
# 7 4 4 "nonumy eirmod " 1
# 8 4 4 "consetetur sadipscing" 0
# 9 4 4 "ea rebum" 0
#10 4 4 "dolores et ea rebum" 1
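To see what the renaming step does, here is a small sketch that applies the same sub() call to a hand-typed subset of the column names (the vectors `old_names` and `new_names` are illustrative only). The first alternative of the pattern removes the "commentN_" prefix from the quote columns, and the second removes the trailing index from the comment columns.

```r
# A subset of the original column names, typed by hand for illustration
old_names <- c("id", "n_comments",
               "comment1", "comment1_quote",
               "comment2", "comment2_quote")

# Same pattern as in the answer; note the doubled backslashes in an R string
new_names <- sub("comment\\d+_|\\d+", "", old_names)
new_names
# "id" "n_comments" "comment" "quote" "comment" "quote"
```

The renamed data frame therefore has repeated "comment" and "quote" columns. With names_to = '.value', pivot_longer takes the output column names from the part of each name matched by names_pattern, so the repeated columns are stacked into one comment column and one quote column.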