In R, convert a character object to a list or data frame, using a pattern to extract column names and values

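I have a list of book titles and authors, and I am using the Google Books API to look up additional information about these books (e.g., the full title, ISBN, etc.). Ultimately, I want to copy the information from Google into my original list only when the author name field of the first item returned by Google contains the author name from my original list.

My question is whether there is an easy way to convert the query result (a character object) into a table or data frame based on the patterns in the result. An example is below.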

library(RCurl)
result <- getURL("https://www.googleapis.com/books/v1/volumes?q=fellowship%20of%20the%20ring%20tolkien&startIndex=0", ssl.verifyhost = FALSE, ssl.verifypeer = FALSE, followlocation = TRUE)

print(result)

This produces the following result:

[1] "{\n \"kind\": \"books#volumes\",\n \"totalItems\": 717,\n \"items\": [\n { \n \"kind\": \"books#volume\",\n \"id\": \"aWZzLPhY4o0C\",\n \"etag\": \"UKfRIR+5nhY\",\n \"selfLink\": \"https://www.googleapis.com/books/v1/volumes/aWZzLPhY4o0C\",\n \"volumeInfo\": {\n \"title\": \"The Fellowship of the Ring\",\n \"subtitle\": \"Being the First Part of The Lord of the Rings\",\n \"authors\": [\n \"J.R.R. Tolkien\"\n ],\n \"publisher\": \"Houghton Mifflin Harcourt\",\n \"publishedDate\": \"2012-02-15\",\n \"description\": \"The first volume in J.R.R. Tolkien's epic adventure THE LORD OF THE RINGS One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them In ancient times the Rings of Power were crafted by the Elven-smiths, and Sauron, the Dark Lord, forged the One Ring, filling it with his own power so that he could rule all others. But the One Ring was taken from him, and though he sought it throughout Middle-earth...

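I would like to convert the resulting character object into a list, table, or data frame where, for the most part:

- column names are enclosed in " ", preceded on the left by a line return \n, and followed by ":" on the right
- row values are enclosed in " ", preceded on the left by ": ", and followed by ",\n" on the right

Some fields (such as the ISBN), however, do not follow that pattern exactly.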

For example, I would like result.df to look like:

kind    title   subtitle    authors publisher   publishedDate description   ISBN_13 ISBN_10
"books#volume"  "The Fellowship of the Ring" "Being the First Part of The Lord of the Rings" 
 "J.R.R. Tolkien" "Houghton Mifflin Harcourt" "2012-02-15" "The first volume in J.R.R. Tolkien's epic adventure THE LORD OF THE RINGS One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them In ancient times the Rings of Power were crafted by the Elven-smiths, and Sauron, the Dark Lord, forged the One Ring, filling it with his own power so that he could rule all others. But the One Ring was taken from him, and though he sought it throughout Middle-earth, it remained lost to him. After many ages it fell into the hands of Bilbo Baggins, as told in The Hobbit. In a sleepy village in the Shire, young Frodo Baggins finds himself faced with an immense task, as his elderly cousin Bilbo entrusts the Ring to his care. Frodo must leave his home and make a perilous journey across Middle-earth to the Cracks of Doom, there to destroy the Ring and foil the Dark Lord in his evil purpose. “A unique, wholly realized other world, evoked from deep in the well of Time, massively detailed, absorbingly entertaining, profound in meaning.” – New York Times" "9780547952017" "0547952015"

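Ultimately, I want to be able to copy values from the new list/table/data frame into another data frame when certain values match (for example, when the authors value contains a match for a value in the other data frame), similar to this excerpt from a loop: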

if (grepl(books$auth1last[i], result.df$authors[1])) {
    books$isbn13[i] <- result.df$ISBN_13[1]  # column name as in the desired result.df above
} else {
    books$isbn13[i] <- NA
}

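Is there an elegant way to convert the character object into something more like an organized list/table/df in just a few lines, or do I have to extract each column name and value separately with something like rm_between? Thanks!

You can use the jsonlite package to convert the returned JSON string into a list. The example below strips the line breaks before parsing.

Example: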

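library(RCurl)

# Fetch the raw JSON response as a single character string
result <- getURL("https://www.googleapis.com/books/v1/volumes?q=fellowship%20of%20the%20ring%20tolkien&startIndex=0", ssl.verifyhost = FALSE, ssl.verifypeer = FALSE, followlocation = TRUE)

# Replace line breaks with spaces, then parse the JSON into a nested list
result_no_breaks <- gsub("\n", " ", result)
json_list <- jsonlite::fromJSON(result_no_breaks)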
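
Once the response is parsed, the one-row result.df described in the question can be assembled from the nested list. What follows is a minimal sketch, not part of the original answer. It assumes the ISBNs sit under `volumeInfo$industryIdentifiers` as `type`/`identifier` pairs, and it uses a trimmed, hand-written JSON string in place of the live response so that it runs without network access. The `or_na` helper and the one-row `books` data frame are hypothetical stand-ins introduced only for illustration.

```r
library(jsonlite)

# Trimmed, hand-written stand-in for the API response (not real output);
# it keeps only the fields used below.
result <- '{
  "kind": "books#volumes",
  "totalItems": 1,
  "items": [
    {
      "kind": "books#volume",
      "volumeInfo": {
        "title": "The Fellowship of the Ring",
        "subtitle": "Being the First Part of The Lord of the Rings",
        "authors": ["J.R.R. Tolkien"],
        "publisher": "Houghton Mifflin Harcourt",
        "publishedDate": "2012-02-15",
        "industryIdentifiers": [
          {"type": "ISBN_10", "identifier": "0547952015"},
          {"type": "ISBN_13", "identifier": "9780547952017"}
        ]
      }
    }
  ]
}'

# simplifyVector = FALSE keeps everything as plain nested lists,
# so each field can be reached with $ and [[ ]].
parsed <- fromJSON(result, simplifyVector = FALSE)
first  <- parsed$items[[1]]
info   <- first$volumeInfo

# Collect the identifiers into a named character vector, e.g. ids["ISBN_13"].
ids <- vapply(info$industryIdentifiers, function(x) x$identifier, character(1))
names(ids) <- vapply(info$industryIdentifiers, function(x) x$type, character(1))

# Hypothetical helper: return NA instead of NULL when a field is missing,
# so that data.frame() still gets a length-one value.
or_na <- function(x) if (is.null(x)) NA_character_ else x

result.df <- data.frame(
  kind          = or_na(first$kind),
  title         = or_na(info$title),
  subtitle      = or_na(info$subtitle),
  authors       = paste(unlist(info$authors), collapse = "; "),
  publisher     = or_na(info$publisher),
  publishedDate = or_na(info$publishedDate),
  ISBN_13       = unname(ids["ISBN_13"]),
  ISBN_10       = unname(ids["ISBN_10"]),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the asker's `books` data frame.
books <- data.frame(auth1last = "Tolkien", isbn13 = NA_character_,
                    stringsAsFactors = FALSE)

# Copy the ISBN only when the author's last name appears in the first result.
i <- 1
books$isbn13[i] <- if (grepl(books$auth1last[i], result.df$authors[1], fixed = TRUE)) {
  result.df$ISBN_13[1]
} else {
  NA_character_
}
```

`fixed = TRUE` makes `grepl` treat the last name as a literal string instead of a regular expression, which is probably what is wanted for names containing characters such as `.`. With the real response, `fromJSON(result)` can replace the inline string, and the same field paths should apply as long as the first item carries a `volumeInfo` entry.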