cSplit_e not returning a binary data frame
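I have a data frame with a Genre column whose rows look like Action,Romance. I want to split these values and create a binary vector. If Action,Romance,Drama were all the possible genres, the row above would be 1,1,0 in the output data frame.
I found this and this SO posts, and this CRAN doc covering cSplit_e, but when I use it I don't get a binary data frame as output; I get the original data frame with some values out of order.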
a = cSplit_e(df4, "Genre", sep = ",", mode = "binary", type = "character", drop=TRUE, fixed=TRUE,fill = 0)
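Edit: The problem seems to be that the new columns are added to the old data frame rather than a new frame being created. How can I get the genres into a frame of their own?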
> names(a)
[1] "Title" "Year" "Rated" "Released" "Runtime" "Genre" "Director" "Writer" "Actors"
[10] "Plot" "Language" "Country" "Awards" "Poster" "Metascore" "imdbRating" "imdbVotes" "imdbID"
[19] "Type" "tomatoMeter" "tomatoImage" "tomatoRating" "tomatoReviews" "tomatoFresh" "tomatoRotten" "tomatoConsensus" "tomatoUserMeter"
[28] "tomatoUserRating" "tomatoUserReviews" "tomatoURL" "DVD" "BoxOffice" "Production" "Website" "Response" "Budget"
[37] "Domestic_Gross" "Gross" "Date" "Genre_Action" "Genre_Adult" "Genre_Adventure" "Genre_Animation" "Genre_Biography" "Genre_Comedy"
[46] "Genre_Crime" "Genre_Documentary" "Genre_Drama" "Genre_Family" "Genre_Fantasy" "Genre_Film-Noir" "Genre_Game-Show" "Genre_History" "Genre_Horror"
[55] "Genre_Music" "Genre_Musical" "Genre_Mystery" "Genre_N/A" "Genre_News" "Genre_Reality-TV" "Genre_Romance" "Genre_Sci-Fi" "Genre_Short"
[64] "Genre_Sport" "Genre_Talk-Show" "Genre_Thriller" "Genre_War" "Genre_Western"
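The drop argument applies only to the column being split, not to all the other columns in the data.frame. So, to extract only the split columns afterwards, use the original column name as a prefix and select just those columns.
Example: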
> a <- cSplit_e(df4, "Genre", ",", mode = "binary", type = "character", fill = 0, drop = TRUE)
> a
  id Genre_Action Genre_Drama Genre_Romance
1  1            1           0             1
2  2            1           1             1
> a[startsWith(names(a), "Genre")]
  Genre_Action Genre_Drama Genre_Romance
1            1           0             1
2            1           1             1
where:
df4 <- structure(list(Genre = c("Action,Romance", "Action,Romance,Drama"), id = 1:2),
.Names = c("Genre", "id"), row.names = 1:2, class = "data.frame")
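If the goal is only a standalone frame of genre indicators, the same result can also be sketched in base R without cSplit_e. This is not what cSplit_e does internally; it is a minimal alternative that assumes the toy df4 above, a comma separator, and possible stray spaces around genre names. The names parts, all_genres and genres are made up for illustration.

```r
# Toy data matching the example above
df4 <- data.frame(Genre = c("Action,Romance", "Action,Romance,Drama"),
                  id = 1:2, stringsAsFactors = FALSE)

# Split each row into its individual genres and trim surrounding whitespace
parts <- lapply(strsplit(df4$Genre, ",", fixed = TRUE), trimws)

# Every distinct genre, sorted so the column order is stable
all_genres <- sort(unique(unlist(parts)))

# One row per input row, one 0/1 column per genre
mat <- do.call(rbind, lapply(parts, function(p) as.integer(all_genres %in% p)))
genres <- as.data.frame(mat)
names(genres) <- paste0("Genre_", all_genres)

genres
```

Here genres contains only the indicator columns, so no prefix filtering is needed afterwards. On the real data, values such as N/A would still become their own column, just as they do with cSplit_e.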