Convert Firefox bookmarks JSON file to markdown

Background

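I want to display some of my bookmarks on my Hugo website. Firefox can save bookmarks in JSON format, and that is the source. The result should represent the nested structure in some way: as a nested list, a tree view, or an accordion. The website's content source files are written in markdown, so I want to generate markdown files from the JSON input.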

While looking at possible solutions, I decided to generate an unordered list from the JSON. I want to do this in R.

Input/output

Input sample: https://gist.github.com/hermanp/c01365b8f4931ea7ff9d1aee1cbbc391
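For orientation, here is a heavily trimmed, made-up JSON document that keeps only the fields the code in this post relies on (`type`, `title`, `uri`, `children`). A real Firefox backup has more fields and more top-level folders. The sketch shows how `jsonlite::fromJSON` simplifies arrays of objects into data frames, so a folder's `children` becomes a list column whose elements are themselves data frames. That nested structure is what a recursive walk has to descend into.

```r
library(jsonlite)

# Made-up, trimmed export: only type/title/uri/children are kept.
json_txt <- '{
  "title": "", "type": "text/x-moz-place-container",
  "children": [
    {"title": "Info", "type": "text/x-moz-place-container",
     "children": [
       {"title": "HUP", "type": "text/x-moz-place", "uri": "https://hup.hu/"}
     ]}
  ]
}'

bm <- fromJSON(json_txt)

# The top-level array of objects is simplified to a data frame ...
str(bm$children, max.level = 1)
# ... and its `children` column is a list whose elements are data frames.
info <- bm$children$children[[1]]
info$title  # "HUP"
```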

Preferred output (indented with two spaces):

- Info
  - Python
    - [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
    - [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
    - [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
    - [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
  - Frontend
    - [CodePen](https://codepen.io/)
    - [JavaScript](https://www.javascript.com/)
    - [CSS-Tricks](https://css-tricks.com/)
    - [Butterick’s Practical Typography](https://practicaltypography.com/)
    - [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
    - [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
    - [Client-Side Web Development](https://info340.github.io/)
  - [Stack Overflow](https://stackoverflow.com/)
  - [HUP](https://hup.hu/)
  - [Hope in Source](https://hopeinsource.com/)

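Additional preferred output: show the favicon before the link, as below (other suggestions are welcome, e.g. serving the icons from my own website's server instead of linking to them):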

  - ![https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a][Stack Overflow](https://stackoverflow.com/)
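One caveat on the icon format: in CommonMark an inline image is written `![alt text](url)`, so the `![url][title]` form above would not render as an icon. Below is a minimal sketch of a helper that emits valid syntax, assuming the favicon URL comes from the bookmark's `iconuri` field. The name `md_bookmark` is hypothetical. It falls back to a plain link when there is no icon or when the icon URI is not an http(s) URL (some exports contain placeholders such as `fake-favicon-uri:...`).

```r
# Hypothetical helper: one markdown link, optionally preceded by the
# favicon as a CommonMark inline image ![alt](url).
md_bookmark <- function(title, uri, iconuri = NA_character_) {
  link <- paste0("[", title, "](", uri, ")")
  # Fall back to the plain link when there is no usable icon URL, e.g. a
  # missing value or a "fake-favicon-uri:..." placeholder.
  if (is.null(iconuri) || is.na(iconuri) || !grepl("^https?://", iconuri)) {
    return(link)
  }
  paste0("![](", iconuri, ") ", link)
}

md_bookmark("HUP", "https://hup.hu/")
# returns "[HUP](https://hup.hu/)"
md_bookmark("HUP", "https://hup.hu/",
            "https://hup.hu/profiles/hupper/themes/hup_theme/favicon.ico")
# returns "![](https://hup.hu/profiles/hupper/themes/hup_theme/favicon.ico) [HUP](https://hup.hu/)"
```

CommonMark has no syntax for image size, so keeping the icons small would have to be handled by the Hugo theme's CSS.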

Attempt

generate_md <- function (file) {
  # Encoding problem with tidyjson::read_json
  bmarks_json_lite <- jsonlite::fromJSON(
    txt = paste0("https://gist.githubusercontent.com/hermanp/",
                 "c01365b8f4931ea7ff9d1aee1cbbc391/raw/",
                 "33c21c88dad35145e2792b6258ede9c882c580ec/",
                 "bookmarks-example.json"))
  
  # This is the start point, a data frame
  level1 <- bmarks_json_lite$children$children[[2]]
  
  # Get the name of the variable to modify it.
  # Just felt that some abstraction needed.
  varname <- deparse(substitute(level1))
  varlevel <- as.integer(substr(varname, nchar(varname), nchar(varname)))
  
  # Get through the data frame by its rows.
  for (i in seq_len(nrow(get(varname)))) {
    
    # If the type of the element in the row is "text/x-moz-place",
    # then get its title and create a markdown list element from it.
    if (get(varname)["type"][i] == "text/x-moz-place"){
      
      # The two space indentation shall be multiplied as many times
      # as deeply nested in the lists (minus one).
      md_title <- paste0(strrep("  ", varlevel - 1),
                         "- ",
                         get(varname)["title"][i],
                         "\n")
    
      # Otherwise do this and also get inside the next level.
    } else if (get(varname)["type"][i] == "text/x-moz-place-container") {
      md_title <- paste0(strrep("  ", varlevel - 1),
                         "- ",
                         get(varname)["title"][i],
                         "\n")
      
      # I know this is not good, just want to express my thought.
      # Create the next, deeper level's variable, whose name shall
      # represent the depth in the nest.
      # Otherwise how can I multiply the indentation for the markdown
      # list elements? It depends on the name of this variable.
      varname <- paste0(regmatches(varname, regexpr("[[:alpha:]]+", varname)),
                        varlevel + 1L)
      varlevel <- varlevel + 1L
      assign(varname, get(varname)["children"][[i]])
      
      # The same goes on as seen at the higher level.
      for (j in seq_len(nrow(get(varname)))){
        if (get(varname)["type"][i] == "text/x-moz-place"){
          md_title <- paste0(strrep("  ", varlevel - 1),
                             "- ",
                             get(varname)["title"][i],
                             "\n")
        } else if (get(varname)["type"][i] == "text/x-moz-place-container") {
          md_title <- paste0(strrep("  ", varlevel - 1),
                             "- ",
                             get(varname)["title"][i],
                             "\n")
          
          varname <- paste0(regmatches(varname, regexpr("[[:alpha:]]+", varname)),
                            varlevel + 1L)
          varlevel <- varlevel + 1L
          assign(varname, get(varname)["children"][[i]])
          
          for (k in seq_len(nrow(get(varname)))){
            # I don't know where this goes...
            # Also I need to paste somewhere the md_title strings to get the 
            # final markdown output...
          }
        }
      }
    }
  }
}

Question

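How can I recursively extract the strings from this JSON file and paste them together? I tried searching for recursion techniques, but it is a hard topic. Any suggestions, packages, functions, or links are welcome!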

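After watching some videos about recursion and looking at some code examples, I stepped through the code by hand and managed to get recursion working. The solution does not depend on how deeply the bookmarks are nested, so it works for any nesting depth.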

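Note: all of my bookmarks are in Firefox's Bookmarks Toolbar. This is hard-coded (and commented) in the generate_md function, so you can adjust it there. If I improve the answer later, I will make it more general.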
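Rather than relying on the toolbar being the second top-level entry (`children[[2]]`), the folder could be looked up by a marker. This is a minimal sketch that assumes the export stores a `root` field on each top-level folder, with the Bookmarks Toolbar marked `toolbarFolder`. Matching on the title instead would be fragile because titles are localized, as the Hungarian "Könyvjelzők eszköztár" in the jq output further down shows. The helper name `find_root_folder` is hypothetical.

```r
library(jsonlite)

# Hypothetical helper: returns the contents (a data frame) of the top-level
# folder whose "root" field equals root_name. Assumes the export has such a
# field; the Bookmarks Toolbar is assumed to be marked "toolbarFolder".
find_root_folder <- function(bookmarks, root_name = "toolbarFolder") {
  top <- bookmarks$children               # data frame of top-level folders
  idx <- which(top$root == root_name)     # which() drops NA comparisons
  if (length(idx) != 1) stop("root folder not found: ", root_name)
  top$children[[idx]]
}

# Trimmed, made-up export with two top-level folders.
json_txt <- '{"children": [
  {"title": "Menu", "root": "bookmarksMenuFolder",
   "type": "text/x-moz-place-container",
   "children": [{"title": "A", "type": "text/x-moz-place",
                 "uri": "https://a.example/"}]},
  {"title": "Toolbar", "root": "toolbarFolder",
   "type": "text/x-moz-place-container",
   "children": [{"title": "HUP", "type": "text/x-moz-place",
                 "uri": "https://hup.hu/"}]}
]}'

toolbar <- find_root_folder(fromJSON(json_txt))
toolbar$title  # "HUP"
```

In generate_md, `level1 <- find_root_folder(bmarks_json_lite)` would then replace the positional lookup, provided the real export carries the `root` field.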

library(jsonlite)

# This function recursively converts the bookmark titles to unordered
# list items.
recursive_func <- function (level) {
  md_result <- character()
  
  # Iterate through the current data frame, which may have a children
  # column nested with other data frames.
  for (i in seq_len(nrow(level))) {
    # If this element is a bookmark and not a folder, then grab
    # the title and construct a list item from it.
    if (level[i, "type"] == "text/x-moz-place"){
      md_title <- level[i, "title"]
      md_uri <- level[i, "uri"]
      md_iconuri <- level[i, "iconuri"]
      # Condition: the URLs all have a scheme (http or https) part.
      # If not, host_url will be a zero-length character vector.
      # (host_url and md_iconuri are not used further in this version.)
      host_url <- regmatches(x = md_uri,
                             m = regexpr(pattern = "(?<=://)[[:alnum:].-]+",
                                         text = md_uri,
                                         perl = TRUE))
      
      md_link <- paste0("[", md_title, "]", "(", md_uri, ")")
      md_listitem <- paste0("- ", md_link, "\n")
      
      # If this element is a folder, then get into it and call this
      # function on it. Insert two spaces (for indentation) in the
      # generated string before every list item. Paste this list of
      # items after the folder's own list item.
    } else if (level[i, "type"] == "text/x-moz-place-container") {
      md_title <- level[i, "title"]
      md_listitem <- paste0("- ", md_title, "\n")
      md_recurs <- recursive_func(level = level[i, "children"][[1]])
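      # Indent every "- " marker of the sublist by two spaces, except hyphens
      # preceded by a word character and a space (e.g. "JavaScript - Wikipedia").
      # In an R string literal the regex escape must be written as "\\w".
      md_recurs <- gsub("(?<!(\\w ))-(?= )", "  -", md_recurs, perl = TRUE)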
      md_listitem <- paste0(md_listitem, md_recurs)
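    } else {
      # Skip any other type, e.g. separators (text/x-moz-place-separator),
      # so a stale md_listitem from the previous row is not pasted again.
      md_listitem <- ""
    }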
    
    # Collect and paste the list items of the current data frame.
    md_result <- paste0(md_result, md_listitem)
  }
  
  # Return the (sub)list of the data frame.
  return(md_result)
}

generate_md <- function (jsonfile) {
  # Encoding problem with tidyjson::read_json
  bmarks_json_lite <- fromJSON(txt = jsonfile)
  
  # This is the start point, a data frame. It represents the
  # elements inside the Bookmarks Toolbar in Firefox.
  level1 <- bmarks_json_lite$children$children[[2]]
  
  # Do not know how to make it prettier, but it works.
  markdown_result <- recursive_func(level = level1)
  
  return(markdown_result)
}

You can run the generate_md function on the example:

generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
                   "c01365b8f4931ea7ff9d1aee1cbbc391/raw/",
                   "33c21c88dad35145e2792b6258ede9c882c580ec/",
                   "bookmarks-example.json"))

# Output
[1] "- Info\n  - Python\n    - [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)\n    - [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)\n    - [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)\n    - [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)\n  - Frontend\n    - [CodePen](https://codepen.io/)\n    - [JavaScript](https://www.javascript.com/)\n    - [CSS-Tricks](https://css-tricks.com/)\n    - [Butterick’s Practical Typography](https://practicaltypography.com/)\n    - [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)\n    - [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)\n    - [Client-Side Web Development](https://info340.github.io/)\n  - [Stack Overflow](https://stackoverflow.com/)\n  - [HUP](https://hup.hu/)\n  - [Hope in Source](https://hopeinsource.com/)\n"

You can cat it, and write it to a file with writeLines. Be careful, though: on Windows you may need to set useBytes = TRUE to get the correct characters in the file. Reference: UTF-8 file output in R
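A minimal sketch of that step, assuming the string returned by generate_md is written to a markdown file of your choosing. The helper name `write_markdown`, the sample string and the temporary path are placeholders. `enc2utf8` makes sure the string is UTF-8, `useBytes = TRUE` writes those bytes without re-encoding them to the native locale, and `sep = ""` avoids an extra newline because the generated string already ends with `\n`.

```r
# Placeholder helper: write an already newline-terminated markdown string
# to `path` as UTF-8 bytes, without re-encoding to the native locale.
write_markdown <- function(md, path) {
  writeLines(enc2utf8(md), con = path, sep = "", useBytes = TRUE)
}

# Stand-in for the output of generate_md(), with a non-ASCII apostrophe.
md <- "- Info\n  - [Butterick\u2019s Practical Typography](https://practicaltypography.com/)\n"
out <- tempfile(fileext = ".md")
write_markdown(md, out)

# Read it back, declaring the file as UTF-8, to check the characters survived.
written <- readLines(out, encoding = "UTF-8")
```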

cat(generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
                       "c01365b8f4931ea7ff9d1aee1cbbc391/raw/",
                       "33c21c88dad35145e2792b6258ede9c882c580ec/",
                       "bookmarks-example.json")))
# Output
- Info
  - Python
    - [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
    - [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
    - [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
    - [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
  - Frontend
    - [CodePen](https://codepen.io/)
    - [JavaScript](https://www.javascript.com/)
    - [CSS-Tricks](https://css-tricks.com/)
    - [Butterick’s Practical Typography](https://practicaltypography.com/)
    - [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
    - [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
    - [Client-Side Web Development](https://info340.github.io/)
  - [Stack Overflow](https://stackoverflow.com/)
  - [HUP](https://hup.hu/)
  - [Hope in Source](https://hopeinsource.com/)

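The regular expression part is fragile. If a bookmark title contains ` - ` (space, hyphen, space), those hyphens can also get "indented" as if they were list items. The negative lookbehind `(?<!(\w ))` skips hyphens that follow a word character and a space, which covers a title like the one in this example: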

# Input JSON
https://gist.github.com/hermanp/381eaf9f2bf5f2b9cdf22f5295e73eb5

cat(generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
                       "381eaf9f2bf5f2b9cdf22f5295e73eb5/raw/",
                       "76b74b2c3b5e34c2410e99a3f1b6ef06977b2ec7/",
                       "bookmarks-example-hyphen.json")))

# Output (two space indentation) markdown:
- Info
  - Python
    - [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
    - [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
    - [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
    - [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
  - Frontend
    - [CodePen](https://codepen.io/)
    - [JavaScript - Wikipedia](https://en.wikipedia.org/wiki/JavaScript)  # correct
    - [CSS-Tricks](https://css-tricks.com/)
    - [Butterick’s Practical Typography](https://practicaltypography.com/)
    - [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
    - [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
    - [Client-Side Web Development](https://info340.github.io/)
  - [Stack Overflow](https://stackoverflow.com/)
  - [HUP](https://hup.hu/)
  - [Hope in Source](https://hopeinsource.com/)

I posted another question about this problem. After some hints and experimenting, I answered it myself.
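Independently of that follow-up, one way to sidestep the regex entirely is to pass the nesting depth down the recursion and indent each line at the moment it is created, so titles are never rewritten. This is a sketch of that alternative, not the fix from the follow-up question. `bookmarks_to_md` is a hypothetical name, the field names follow the export used above, and entries of any other type (such as separators) are skipped.

```r
library(jsonlite)

# Hypothetical alternative to recursive_func(): the depth is an argument,
# so every line gets its indentation when it is built and the titles are
# never passed through a regex.
bookmarks_to_md <- function(level, depth = 0L) {
  indent <- strrep("  ", depth)          # two spaces per nesting level
  lines <- character()
  for (i in seq_len(nrow(level))) {
    type <- level$type[i]
    if (identical(type, "text/x-moz-place")) {
      lines <- c(lines, paste0(indent, "- [", level$title[i], "](",
                               level$uri[i], ")"))
    } else if (identical(type, "text/x-moz-place-container")) {
      lines <- c(lines, paste0(indent, "- ", level$title[i]))
      kids <- level$children[[i]]
      # Empty folders may not simplify to a data frame, so check first.
      if (is.data.frame(kids) && nrow(kids) > 0) {
        lines <- c(lines, bookmarks_to_md(kids, depth + 1L))
      }
    }
    # Any other type (e.g. separators) is skipped.
  }
  lines
}

json_txt <- '[
  {"title": "Frontend", "type": "text/x-moz-place-container",
   "children": [{"title": "JavaScript - Wikipedia", "type": "text/x-moz-place",
                 "uri": "https://en.wikipedia.org/wiki/JavaScript"}]},
  {"title": "HUP", "type": "text/x-moz-place", "uri": "https://hup.hu/"}
]'

md_lines <- bookmarks_to_md(fromJSON(json_txt))
cat(md_lines, sep = "\n")
```

The hyphen inside "JavaScript - Wikipedia" stays untouched because no regex runs over the titles. To get a single string like the one generate_md returns, `paste0(md_lines, "\n", collapse = "")` joins the lines.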

I know you asked for a solution in R. As an alternative, here is a solution using jq, since it is very well suited to JSON transformations.

#!/bin/bash

BOOKMARKS='FirefoxBookmarks.json'

jq -r '
  def bookmark($iconuri; $title; $uri):
     if $iconuri != null then "![\($iconuri)]" else "" end +
     "[\($title)](\($uri))";

  def bookmarks:
    (objects | to_entries[]
     | if .value | type == "array" then (.value | bookmarks)
                                   else .value end ) //
    (arrays[] | [bookmarks] | " - \(.[0])", "  \(.[1:][])" );

  (.. | .children? | arrays)
    |= map(if .uri != null then {bookmark: bookmark(.iconuri; .title; .uri)}
                           else {title} end +
           {children})
  | del(..| select(length == 0))     # remove empty children and empty titles
  | del(..| select(length == 0))     # remove objects that got empty because of previous deletion
  | del(..| objects | select(has("title") and (.children | length == 0)))   # remove objects with title but no children
  | .children                        # remove root level
  | bookmarks
    ' < "$BOOKMARKS"

Output:

 - Könyvjelzők eszköztár
   - Info
     - Python
       - ![fake-favicon-uri:https://www.freecodecamp.org/news/the-python-guide-for-beginners/][The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
       - [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
       - [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
       - ![https://github.githubassets.com/favicons/favicon.svg][Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
     - Frontend
       - ![https://static.codepen.io/assets/favicon/favicon-touch-de50acbf5d634ec6791894eba4ba9cf490f709b3d742597c6fc4b734e6492a5a.png][CodePen](https://codepen.io/)
       - ![https://www.javascript.com/etc/clientlibs/pluralsight/main/images/favicons/android-chrome-192x192.png][JavaScript](https://www.javascript.com/)
       - ![https://css-tricks.com/apple-touch-icon.png][CSS-Tricks](https://css-tricks.com/)
       - [Butterick’s Practical Typography](https://practicaltypography.com/)
       - [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
       - ![https://www.smashingmagazine.com/images/favicon/app-icon-512x512.png][Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
       - ![https://info340.github.io/img/busy-spider-icon.png][Client-Side Web Development](https://info340.github.io/)
     - ![https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a][Stack Overflow](https://stackoverflow.com/)
     - ![https://hup.hu/profiles/hupper/themes/hup_theme/favicon.ico][HUP](https://hup.hu/)
     - [Hope in Source](https://hopeinsource.com/)