Converting and flattening a nested, named list with uneven or missing rows into a single data frame in R
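I am trying to convert a nested list, url_expansions, into a data frame so that it can be matched with other corresponding attributes as a flattened table.
url_expansions contains up to 4 named elements: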
.Names = c("url", "topsy_expanded_url", "expanded_url", "display_url")
Ideally, each of these should become a column header, with NA/NULL filled in where appropriate. So far, this approach works for the other data:
score <- sapply(tweets, function(x) x$score)
However, because url_expansions is missing data for some rows, the following code:
url_expansions <- sapply(tweets, function(x) x$url_expansions)
display_url <- sapply(url_expansions, function(x) x$display_url)
data=data.frame(display_url)
returns the error: arguments imply differing number of rows: 4, 0, 3
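An error of this kind typically appears when sapply cannot simplify results of differing lengths and hands back a list, which data.frame then refuses to combine into columns. A minimal reproduction, assuming hypothetical toy data (`toy` is not part of the original data) with lengths 4, 0, and 3:

```r
# Hypothetical toy data: three "tweets" holding 4, 0, and 3 values
toy <- list(c("a", "b", "c", "d"), character(0), c("e", "f", "g"))

# Lengths differ, so sapply cannot simplify and returns a list
res <- sapply(toy, function(x) x)
is.list(res)

# data.frame treats each list element as a column and fails on the
# mismatched lengths; capture the message instead of stopping
err <- tryCatch(data.frame(res), error = function(e) conditionMessage(e))
err
```

Checking `is.list()` on the result of each sapply call is a quick way to spot which extraction step produced ragged output.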
I have tried many different approaches, including this, this, and this, and even this, all to no avail, and even plyr.
reshape2 almost did it (based on this), with help from @akrun (below):
library(reshape2)
nm1 <- names(url_expansions[[1]][[1]])
url_expansions1 <- lapply(url_expansions, function(x) if(length(x)<1) setNames(rep(NA, 4), nm1) else x)
data2 <- dcast(cbind(
coln = sequence(rapply(url_expansions, length)),
melt(url_expansions)), L1 + L2 ~ coln,
value.var = "value")
data3 <- data2[-(1:2)]
colnames(data3) <- nm1
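However, elements that contain more than one sublist (n > 1) are given additional rows, so the new data frame (data3) ends up with more rows than the original url_expansions. :'(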
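The row-count mismatch can be checked directly before reshaping. A minimal sketch, assuming a hypothetical toy list `toy_expansions` shaped like the sample data (one element per tweet, each holding one or more character vectors):

```r
# Hypothetical toy data: tweet 3 carries two URL vectors
toy_expansions <- list(
  list(c(url = "u1", display_url = "d1")),
  list(c("", "")),
  list(c(url = "u2", display_url = "d2"),
       c(url = "u3", display_url = "d3"))
)

n_per_tweet <- lengths(toy_expansions)  # number of URL vectors per tweet
which(n_per_tweet > 1)                  # tweets that expand to several rows
sum(n_per_tweet)                        # row count after flattening
length(toy_expansions)                  # row count needed to match other columns
```

Whenever `sum(n_per_tweet)` exceeds `length(toy_expansions)`, any approach that emits one row per URL vector will produce extra rows.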
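Ultimately, I need to load each row of display_url above into a data frame together with its associated Twitter data, à la the following, so the dimensions must match: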
data=data.frame(trackback_author_name,content,highlight,display_url)
Any help is appreciated. Sample data is included below:
list(list(structure(c("http://t.co/anl8pGqwsy", "http://twinavi.jp/topics/news/52e9a184-e618-4979-ad98-045b5546ec81?ref=tweet",
"http://twme.jp/tnav/04h7", "twme.jp/tnav/04h7"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/vx32EOwyRI", "http://wirelesswire.jp/london_wave/201401310211.html",
"http://wirelesswire.jp/london_wave/201401310211.html", "wirelesswire.jp/london_wave/20…"
), .Names = c("url", "topsy_expanded_url", "expanded_url",
"display_url"))), list(structure(c("http://t.co/4trgO3HVmv",
"http://www.asahi.com/articles/ASG102VZWG10UTIL003.html", "http://t.asahi.com/dudj",
"t.asahi.com/dudj"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/5hnEwO5V1h",
"http://twinavi.jp/topics/news/52e9e820-9034-4edb-9b2c-195b5546ec81?ref=tweet",
"http://twme.jp/tnav/04hL", "twme.jp/tnav/04hL"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/GdMMXKsbY0", "http://www.riken.jp/pr/press/2014/20140130_1/",
"http://www.riken.jp/pr/press/2014/20140130_1/", "riken.jp/pr/press/2014/…"
), .Names = c("url", "topsy_expanded_url", "expanded_url",
"display_url"))), list(structure(c("http://t.co/7x21RTkgke",
"http://www.asahi.com/articles/ASG1Z0PGCG1YPLBJ00W.html", "http://t.asahi.com/dtxd",
"t.asahi.com/dtxd"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/Rcdl4L2zP1",
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://bit.ly/1nv8CdM",
"bit.ly/1nv8CdM"), .Names = c("url", "topsy_expanded_url", "expanded_url",
"display_url"))), list(structure(c("http://t.co/3E2HD1wylC",
"http://www.nikkansports.com/general/news/p-gn-tp0-20140131-1251192.html",
"http://www.nikkansports.com/general/news/p-gn-tp0-20140131-1251192.html",
"nikkansports.com/general/news/p…"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
c("", "", "", "")), list(structure(c("http://t.co/bIciCF7fJb",
"http://dailynews.yahoo.co.jp/photograph/pickup/?1391051363=",
"http://dailynews.yahoo.co.jp/photograph/pickup/?1391051363",
"dailynews.yahoo.co.jp/photograph/pic…"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
c("", "", "", "")), list(structure(c("http://t.co/dwQVkHlT3R",
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://www.cdb.riken.jp/crp/news2014.1.31_2.html",
"cdb.riken.jp/crp/news2014.1…"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/HgtgZJID2w",
"http://www3.nhk.or.jp/news/html/20140130/k10014894611000.html",
"http://nhk.jp/N4Bg6FTZ", "nhk.jp/N4Bg6FTZ"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/R4dz0XI9ci", "http://pbs.twimg.com/media/BczUK5dIgAA4mDl.jpg",
"http://twitter.com/kokossu07/status/417942149267984384/photo/1",
"pic.twitter.com/R4dz0XI9ci"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/gP0bI68UEq",
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://bit.ly/1iTvtiy",
"bit.ly/1iTvtiy"), .Names = c("url", "topsy_expanded_url", "expanded_url",
"display_url"))), list(c("", "", "", "")), list(structure(c("http://t.co/2X4PnkCWxo",
"http://dailynews.yahoo.co.jp/fc/science/stap_cells/?id=6105570",
"http://yahoo.jp/JDsgEr", "yahoo.jp/JDsgEr"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/20SqWMJFDG", "http://mainichi.jp/feature/news/20140130mog00m040009000c.html",
"http://goo.gl/xRRcCl", "goo.gl/xRRcCl"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
c("", "", "", "")), list(c("", "", "", "")), list(c("", "",
"", "")), list(structure(c("http://t.co/ey2KK8wKoC", "http://www.cdb.riken.jp/crp/index.html",
"http://www.cdb.riken.jp/crp/index.html", "cdb.riken.jp/crp/index.html"
), .Names = c("url", "topsy_expanded_url", "expanded_url", "display_url"
)), structure(c("http://t.co/7Dg7O4coDM", "http://azukichi.net/frame2/b-frame526.html",
"http://azukichi.net/frame2/b-frame526.html", "azukichi.net/frame2/b-frame…"
), .Names = c("url", "topsy_expanded_url", "expanded_url", "display_url"
))), list(structure(c("http://t.co/6Yl1UG459s", "http://sp.mainichi.jp/select/news/20140130k0000m040096000c.html",
"http://sp.mainichi.jp/select/news/20140130k0000m040096000c.html",
"sp.mainichi.jp/select/news/20…"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/MPbamQCCpq",
"http://www.cdb.riken.jp/crp/index.html", "http://www.cdb.riken.jp/crp/index.html",
"cdb.riken.jp/crp/index.html"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/JkdfeQFi5C",
"http://sankei.jp.msn.com/science/news/140129/scn14012921250003-n1.htm",
"http://sankei.jp.msn.com/science/news/140129/scn14012921250003-n1.htm",
"sankei.jp.msn.com/science/news/1…"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
c("", "", "", "")), list(structure(c("http://t.co/Gf16StDW4d",
"http://www.yomiuri.co.jp/science/news/20140130-OYT1T00630.htm",
"http://bit.ly/1n11fHM", "bit.ly/1n11fHM"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/gRKf2GkPpK", "http://nosumi.exblog.jp/20296694/",
"http://htn.to/4M3wsg", "htn.to/4M3wsg"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
c("", "", "", "")), list(structure(c("http://t.co/tgelOtTBg3",
"http://pbs.twimg.com/media/BfLvREpCQAANS8r.jpg", "http://twitter.com/ysmkwa/status/428667991308259329/photo/1",
"pic.twitter.com/tgelOtTBg3"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/7pXgNSmGx5",
"http://nosumi.exblog.jp/20296694/", "http://nosumi.exblog.jp/20296694/",
"nosumi.exblog.jp/20296694/"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
c("", "", "", "")), list(c("", "", "", "")), list(structure(c("http://t.co/X7I8DPjhi2",
"http://horikawad.hatenadiary.com/entry/2014/01/30/071830", "http://horikawad.hatenadiary.com/entry/2014/01/30/071830",
"horikawad.hatenadiary.com/entry/2014/01/…"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))))
Expanding on my comment to include the blank rows, I would suggest the following, assuming mylist is the object:
mylist[vapply(mylist,length,1L)==0]<-list(list(rep("",4)))
x<-do.call(rbind,unlist(mylist,recursive=FALSE))
colnames(x)<-names(mylist[[c(1,1)]])
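Note that the rbind approach yields a character matrix with one row per URL vector, so a tweet carrying two URLs contributes two rows. If exactly one row per tweet is needed, one option is to collapse multiple values into a single string. The sketch below is only an illustration under assumptions: `toy` is hypothetical data, `pick_field` is a helper invented here (not part of the answer above), and joining with "; " is an arbitrary choice.

```r
# Hypothetical toy data mirroring the structure of the sample data
nms <- c("url", "topsy_expanded_url", "expanded_url", "display_url")
toy <- list(
  list(setNames(c("u1", "t1", "e1", "d1"), nms)),
  list(c("", "", "", "")),                        # tweet without URLs
  list(setNames(c("u2", "t2", "e2", "d2"), nms),
       setNames(c("u3", "t3", "e3", "d3"), nms))  # tweet with two URLs
)

# One row per URL vector: 4 rows for 3 tweets
x <- do.call(rbind, unlist(toy, recursive = FALSE))
nrow(x)

# Illustrative helper (an assumption): one string per tweet for the field
# at position i; NA when empty, several values joined with "; "
pick_field <- function(tweet, i) {
  vals <- vapply(tweet, function(v) unname(v[i]), character(1))
  vals <- vals[!is.na(vals) & nzchar(vals)]
  if (length(vals) == 0) NA_character_ else paste(vals, collapse = "; ")
}

# Build one column per field, each with exactly length(toy) entries
cols <- lapply(seq_along(nms), function(i) {
  vapply(toy, pick_field, character(1), i = i)
})
flat <- as.data.frame(setNames(cols, nms), stringsAsFactors = FALSE)
nrow(flat) == length(toy)
```

Because `flat` has one row per tweet, a column such as `flat$display_url` can be passed to data.frame alongside the other per-tweet vectors. Taking only the first URL instead of joining would be an equally reasonable choice, depending on the analysis.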