Unlist inside a data frame and insert the result as new rows
I have a data frame with two columns, value and article_topics, as follows:
str(myData)
Classes ‘tbl_df’ and 'data.frame': 10 obs. of 2 variables:
$ value : num 288 253 967 36769 2769 ...
$ article_topics:List of 10
..$ : logi NA
..$ : logi NA
..$ : chr "art and entertainment" "music" "style and fashion" "clothing" ...
..$ : chr "hobbies and interests" "guitar" "art and entertainment" "music" ...
..$ : logi NA
..$ : chr "pets" "large animals" "sports" "fishing" ...
..$ : chr "health and fitness"
..$ : chr "style and fashion" "clothing" "shirts"
..$ : logi NA
..$ : logi NA
I want to unlist article_topics so that I get one observation (row) per article topic.
To take a simpler example, it basically means converting this:
value article_topics
10 “Hello” , “This is an example”
into this:
value article_topics
10 “Hello”
10 “This is an example”
Here is the dataset:
structure(list(value = c(288, 253, 967, 36769, 2769, 541, 17,
889, 532, 2621), article_topics = list(NA, NA, c("art and entertainment",
"music", "style and fashion", "clothing", "lingerie", "movies and tv",
"movies"), c("hobbies and interests", "guitar", "art and entertainment",
"music", "musical instruments", "guitars", "technology and computing",
"consumer electronics", "telephones", "mobile phones", "smart phones"
), NA, c("pets", "large animals", "sports", "fishing", "freshwater fishing"
), "health and fitness", c("style and fashion", "clothing", "shirts"
), NA, NA)), class = c("tbl_df", "data.frame"), row.names = c(NA,
-10L), .Names = c("value", "article_topics"))
I have been trying melt from reshape2 and gather from tidyr, but either they do not work with this structure or I cannot figure out how to use them.
I found a partial solution:
library(splitstackshape)
cSplit(ll, 'article_topics',',', 'long')
value article_topics
1: 288 NA
2: 253 NA
3: 967 c("art and entertainment"
4: 967 "music"
5: 967 "style and fashion"
6: 967 "clothing"
7: 967 "lingerie"
8: 967 "movies and tv"
9: 967 "movies")
10: 36769 c("hobbies and interests"
11: 36769 "guitar"
12: 36769 "art and entertainment"
13: 36769 "music"
14: 36769 "musical instruments"
15: 36769 "guitars"
16: 36769 "technology and computing"
17: 36769 "consumer electronics"
18: 36769 "telephones"
19: 36769 "mobile phones"
20: 36769 "smart phones")
21: 2769 NA
22: 541 c("pets"
23: 541 "large animals"
24: 541 "sports"
25: 541 "fishing"
26: 541 "freshwater fishing")
27: 17 health and fitness
28: 889 c("style and fashion"
29: 889 "clothing"
30: 889 "shirts")
31: 532 NA
32: 2621 NA
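The next step would be to use something like stringr to strip the c( and ) left over in the strings.
However, this does not seem like a good approach to me.
Any help is welcome.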
You can use unnest. Try:
library(tidyr)
unnest(myData, article_topics)
Sample output:
> head(unnest(df, article_topics))
Source: local data frame [6 x 2]
value article_topics
1 288 NA
2 253 NA
3 967 art and entertainment
4 967 music
5 967 style and fashion
6 967 clothing
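As a small self-contained sketch of the same idea, the snippet below builds a toy data frame shaped like the simpler example in the question and unnests it. The name toy is hypothetical, and the explicit cols = argument assumes tidyr 1.0.0 or later; older versions take the column name directly, as in the call above.

```r
library(tidyr)

# Hypothetical toy data mirroring the question's structure:
# a numeric column plus a list-column mixing NA and character vectors.
toy <- data.frame(value = c(10, 20, 30))
toy$article_topics <- list(c("Hello", "This is an example"), NA, "music")

# One output row per list element; `value` is repeated as needed.
# `cols =` is the tidyr >= 1.0.0 spelling (assumption about your version).
long <- unnest(toy, cols = c(article_topics))
print(long)
```

Rows whose list element is NA are kept as a single row with a missing topic, which matches the sample output above.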
Alternatively, you can try listCol_l from my "splitstackshape" package. However, it is not compatible with tbl_dfs, so you need to unclass the data first.
Try:
library(splitstackshape)
listCol_l(unclass(df), "article_topics")[]
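If you would rather avoid extra packages, the same reshaping can be sketched in base R by repeating each value once per element of its list entry. This is not part of either package approach, just an illustration; the names toy, n, and long are hypothetical, and lengths() assumes R 3.2.0 or later.

```r
# Hypothetical toy data with the same shape as the question's data frame.
toy <- data.frame(value = c(10, 20, 30))
toy$article_topics <- list(c("Hello", "This is an example"), NA, "music")

# Number of elements in each list entry (an NA entry counts as length 1).
n <- lengths(toy$article_topics)

# Repeat each value n times and flatten the list-column alongside it.
long <- data.frame(
  value = rep(toy$value, times = n),
  article_topics = unlist(toy$article_topics),
  stringsAsFactors = FALSE
)
print(long)
```

Because unlist() coerces the logical NA entries to character, the result is a plain character column with missing values where no topics were present.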