Converting and flattening a nested, named list with uneven or missing rows into a single data frame in R

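I am trying to convert a nested list, url_expansion, into a data frame so that it can be matched with other corresponding attributes as a flattened table.

url_expansion contains up to 4 lists: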

.Names = c("url", "topsy_expanded_url", "expanded_url", "display_url")

Ideally, each of these would become a column header, with NA/null filled in where appropriate. So far, this has worked for the other data:

score <- sapply(tweets, function(x) x$score)

However, because url_expansions is missing data for some rows, the following code:

url_expansions <- sapply(tweets, function(x) x$url_expansions)
display_url <- sapply(url_expansions, function(x) x$display_url)
data=data.frame(display_url)

returns the error: arguments imply differing number of rows: 4, 0, 3
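The error comes from data.frame() requiring every column to have the same length, while the per-tweet results here have lengths such as 4, 0 and 3. One way around it is to return exactly one value per tweet. The following is only a minimal sketch: `toy` and `first_display_url` are hypothetical names, the toy list merely imitates the shape of the sample data, and keeping only the first URL of a tweet is an assumption that may not suit every use.

```r
# Hypothetical toy input: one element per tweet, each holding zero or more
# named character vectors (same shape as the sample data).
toy <- list(
  list(c(url = "http://t.co/a", display_url = "a.example/1")),
  list(),                                   # tweet with no URL at all
  list(c(url = "http://t.co/b", display_url = "b.example/1"),
       c(url = "http://t.co/c", display_url = "c.example/1"))
)

# Return exactly one string per tweet so the result lines up with the tweets.
# Assumption: when a tweet has several URLs, only the first one is kept.
first_display_url <- function(x) {
  if (length(x) == 0) return(NA_character_)
  value <- unname(x[[1]]["display_url"])
  if (is.na(value) || value == "") NA_character_ else value
}

display_url <- vapply(toy, first_display_url, character(1))
data <- data.frame(display_url, stringsAsFactors = FALSE)
```

Because vapply() is told to expect character(1), it fails loudly if any tweet yields something other than a single string, instead of silently producing a ragged result.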

I have tried many different approaches from related questions, and even plyr, all to no avail.

reshape2 almost gets there, based on a related answer and a suggestion from @akrun (below):

library(reshape2)
nm1 <- names(url_expansions[[1]][[1]]) 
url_expansions1 <- lapply(url_expansions, function(x) if(length(x)<1) setNames(rep(NA, 4), nm1) else x) 
data2 <- dcast(cbind(
  coln = sequence(rapply(url_expansions, length)), 
  melt(url_expansions)), L1 + L2 ~ coln, 
  value.var = "value")
data3 <- data2[-(1:2)] 
colnames(data3) <- nm1

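However, sub-lists of lists with n > 1 are given new rows, so the new data frame (data3) ends up with more rows than the original url_expansions. :'(

Ultimately, I need to load each row of display_url above into a data frame together with its associated Twitter data, as below, so the dimensions have to match: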
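To see which tweets are responsible for a row-count mismatch of this kind, it can help to count the URL entries per tweet before flattening. This is a small diagnostic sketch on a hypothetical `toy` list, not on the real data; lengths() needs R 3.2.0 or later.

```r
# Hypothetical toy input shaped like the sample data: tweet 2 has no URL,
# tweet 3 has two.
toy <- list(
  list(c(url = "http://t.co/a", display_url = "a.example/1")),
  list(),
  list(c(url = "http://t.co/b", display_url = "b.example/1"),
       c(url = "http://t.co/c", display_url = "c.example/1"))
)

n_urls <- lengths(toy)         # number of URL entries per tweet
multi  <- which(n_urls > 1)    # tweets that will expand into several rows
empty  <- which(n_urls == 0)   # tweets that will vanish when flattened
total_rows <- sum(n_urls)      # rows a naive flatten would produce
```

Any tweet listed in `multi` adds rows and any tweet in `empty` drops one, which is why the flattened table no longer lines up with the tweets.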

data=data.frame(trackback_author_name,content,highlight,display_url)

Any help is appreciated. Sample data is included below:

list(list(structure(c("http://t.co/anl8pGqwsy", "http://twinavi.jp/topics/news/52e9a184-e618-4979-ad98-045b5546ec81?ref=tweet", 
"http://twme.jp/tnav/04h7", "twme.jp/tnav/04h7"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/vx32EOwyRI", "http://wirelesswire.jp/london_wave/201401310211.html", 
    "http://wirelesswire.jp/london_wave/201401310211.html", "wirelesswire.jp/london_wave/20…"
    ), .Names = c("url", "topsy_expanded_url", "expanded_url", 
    "display_url"))), list(structure(c("http://t.co/4trgO3HVmv", 
"http://www.asahi.com/articles/ASG102VZWG10UTIL003.html", "http://t.asahi.com/dudj", 
"t.asahi.com/dudj"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/5hnEwO5V1h", 
"http://twinavi.jp/topics/news/52e9e820-9034-4edb-9b2c-195b5546ec81?ref=tweet", 
"http://twme.jp/tnav/04hL", "twme.jp/tnav/04hL"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/GdMMXKsbY0", "http://www.riken.jp/pr/press/2014/20140130_1/", 
    "http://www.riken.jp/pr/press/2014/20140130_1/", "riken.jp/pr/press/2014/…"
    ), .Names = c("url", "topsy_expanded_url", "expanded_url", 
    "display_url"))), list(structure(c("http://t.co/7x21RTkgke", 
"http://www.asahi.com/articles/ASG1Z0PGCG1YPLBJ00W.html", "http://t.asahi.com/dtxd", 
"t.asahi.com/dtxd"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/Rcdl4L2zP1", 
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://bit.ly/1nv8CdM", 
"bit.ly/1nv8CdM"), .Names = c("url", "topsy_expanded_url", "expanded_url", 
"display_url"))), list(structure(c("http://t.co/3E2HD1wylC", 
"http://www.nikkansports.com/general/news/p-gn-tp0-20140131-1251192.html", 
"http://www.nikkansports.com/general/news/p-gn-tp0-20140131-1251192.html", 
"nikkansports.com/general/news/p…"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
    c("", "", "", "")), list(structure(c("http://t.co/bIciCF7fJb", 
"http://dailynews.yahoo.co.jp/photograph/pickup/?1391051363=", 
"http://dailynews.yahoo.co.jp/photograph/pickup/?1391051363", 
"dailynews.yahoo.co.jp/photograph/pic…"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    c("", "", "", "")), list(structure(c("http://t.co/dwQVkHlT3R", 
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://www.cdb.riken.jp/crp/news2014.1.31_2.html", 
"cdb.riken.jp/crp/news2014.1…"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/HgtgZJID2w", 
"http://www3.nhk.or.jp/news/html/20140130/k10014894611000.html", 
"http://nhk.jp/N4Bg6FTZ", "nhk.jp/N4Bg6FTZ"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/R4dz0XI9ci", "http://pbs.twimg.com/media/BczUK5dIgAA4mDl.jpg", 
    "http://twitter.com/kokossu07/status/417942149267984384/photo/1", 
    "pic.twitter.com/R4dz0XI9ci"), .Names = c("url", "topsy_expanded_url", 
    "expanded_url", "display_url"))), list(structure(c("http://t.co/gP0bI68UEq", 
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://bit.ly/1iTvtiy", 
"bit.ly/1iTvtiy"), .Names = c("url", "topsy_expanded_url", "expanded_url", 
"display_url"))), list(c("", "", "", "")), list(structure(c("http://t.co/2X4PnkCWxo", 
"http://dailynews.yahoo.co.jp/fc/science/stap_cells/?id=6105570", 
"http://yahoo.jp/JDsgEr", "yahoo.jp/JDsgEr"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/20SqWMJFDG", "http://mainichi.jp/feature/news/20140130mog00m040009000c.html", 
    "http://goo.gl/xRRcCl", "goo.gl/xRRcCl"), .Names = c("url", 
    "topsy_expanded_url", "expanded_url", "display_url"))), list(
    c("", "", "", "")), list(c("", "", "", "")), list(c("", "", 
"", "")), list(structure(c("http://t.co/ey2KK8wKoC", "http://www.cdb.riken.jp/crp/index.html", 
"http://www.cdb.riken.jp/crp/index.html", "cdb.riken.jp/crp/index.html"
), .Names = c("url", "topsy_expanded_url", "expanded_url", "display_url"
)), structure(c("http://t.co/7Dg7O4coDM", "http://azukichi.net/frame2/b-frame526.html", 
"http://azukichi.net/frame2/b-frame526.html", "azukichi.net/frame2/b-frame…"
), .Names = c("url", "topsy_expanded_url", "expanded_url", "display_url"
))), list(structure(c("http://t.co/6Yl1UG459s", "http://sp.mainichi.jp/select/news/20140130k0000m040096000c.html", 
"http://sp.mainichi.jp/select/news/20140130k0000m040096000c.html", 
"sp.mainichi.jp/select/news/20…"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/MPbamQCCpq", 
"http://www.cdb.riken.jp/crp/index.html", "http://www.cdb.riken.jp/crp/index.html", 
"cdb.riken.jp/crp/index.html"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/JkdfeQFi5C", 
"http://sankei.jp.msn.com/science/news/140129/scn14012921250003-n1.htm", 
"http://sankei.jp.msn.com/science/news/140129/scn14012921250003-n1.htm", 
"sankei.jp.msn.com/science/news/1…"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
    c("", "", "", "")), list(structure(c("http://t.co/Gf16StDW4d", 
"http://www.yomiuri.co.jp/science/news/20140130-OYT1T00630.htm", 
"http://bit.ly/1n11fHM", "bit.ly/1n11fHM"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/gRKf2GkPpK", "http://nosumi.exblog.jp/20296694/", 
    "http://htn.to/4M3wsg", "htn.to/4M3wsg"), .Names = c("url", 
    "topsy_expanded_url", "expanded_url", "display_url"))), list(
    c("", "", "", "")), list(structure(c("http://t.co/tgelOtTBg3", 
"http://pbs.twimg.com/media/BfLvREpCQAANS8r.jpg", "http://twitter.com/ysmkwa/status/428667991308259329/photo/1", 
"pic.twitter.com/tgelOtTBg3"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/7pXgNSmGx5", 
"http://nosumi.exblog.jp/20296694/", "http://nosumi.exblog.jp/20296694/", 
"nosumi.exblog.jp/20296694/"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
    c("", "", "", "")), list(c("", "", "", "")), list(structure(c("http://t.co/X7I8DPjhi2", 
"http://horikawad.hatenadiary.com/entry/2014/01/30/071830", "http://horikawad.hatenadiary.com/entry/2014/01/30/071830", 
"horikawad.hatenadiary.com/entry/2014/01/…"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))))

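Expanding on my comment to include the blank rows, I suggest the following, assuming mylist is the object:

# Replace empty elements with a single blank row so they survive flattening
mylist[vapply(mylist, length, 1L) == 0] <- list(list(rep("", 4)))
# Flatten one level and stack the character vectors into a matrix
x <- do.call(rbind, unlist(mylist, recursive = FALSE))
colnames(x) <- names(mylist[[c(1, 1)]])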
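The result x is a character matrix with one row per URL entry, so a tweet with two URLs still contributes two rows. As a hedged follow-up sketch, one could tag each row with the tweet it came from and then keep one row per tweet. The toy `mylist`, the `tweet_id` column and the "keep the first URL" rule are assumptions for illustration, not part of the answer above.

```r
# Hypothetical toy input shaped like the sample data.
nm <- c("url", "topsy_expanded_url", "expanded_url", "display_url")
mylist <- list(
  list(setNames(c("http://t.co/a", "http://a.example/full",
                  "http://a.example/1", "a.example/1"), nm)),
  list(),                                        # tweet without URLs
  list(setNames(c("http://t.co/b", "http://b.example/full",
                  "http://b.example/1", "b.example/1"), nm),
       setNames(c("http://t.co/c", "http://c.example/full",
                  "http://c.example/1", "c.example/1"), nm))
)

# Same steps as above: pad empty tweets, flatten, bind into a matrix.
mylist[vapply(mylist, length, 1L) == 0] <- list(list(rep("", 4)))
x <- do.call(rbind, unlist(mylist, recursive = FALSE))
colnames(x) <- names(mylist[[c(1, 1)]])

# Record which tweet each row belongs to (computed after padding).
tweet_id <- rep(seq_along(mylist), lengths(mylist))
df <- data.frame(tweet_id, x, stringsAsFactors = FALSE)
df[df == ""] <- NA                               # blanks become NA

# Assumption: keep only the first URL of each tweet so rows match tweets.
one_per_tweet <- df[!duplicated(df$tweet_id), ]
```

With one row per tweet, `one_per_tweet$display_url` has the same length as the other per-tweet vectors and can go into a single data.frame() call alongside them.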