How to convert the HTML obtained from Dexador (library) to JSON in Common Lisp?

I am using Common Lisp (SBCL) and the Dexador library. Evaluating:

>(dex:get "http://www.paulgraham.com")

returns, as its first value, a string containing the HTML source of the web page:

> (dex:get "http://www.paulgraham.com")

"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">
<html><script type=\"text/javascript\"> 
 <!-- 
 (new Image).src=\"https://store.yahoo.net/cgi-bin/refsd?e=http://www.paulgraham.com/&h=www.paulgraham.com&v=1.0&dr=\" + escape(document.referrer); 
 --> 
 </script>
<head><title>Paul Graham</title><!-- <META NAME=\"ROBOTS\" CONTENT=\"NOODP\"> -->
<link rel=\"shortcut icon\" href=\"http://ycombinator.com/arc/arc.png\">
</head><body bgcolor=ffffff background=\"https://sep.yimg.com/ca/I/paulgraham_2271_0\" text=000000 link=000099 vlink=464646><table border=0 cellspacing=0 cellpadding=0><tr valign=top><td><map name=5421620e8961b><area shape=rect coords=\"0,21,67,42\" href=\"articles.html\"><area shape=rect coords=\"0,42,67,63\" href=\"http://www.amazon.com/gp/product/0596006624\"><area shape=rect coords=\"0,63,67,84\" href=\"books.html\"><area shape=rect coords=\"0,84,67,105\" href=\"http://ycombinator.com\"><area shape=rect coords=\"0,105,67,126\" href=\"arc.html\"><area shape=rect coords=\"0,126,67,147\" href=\"bel.html\"><area shape=rect coords=\"0,147,67,168\" href=\"lisp.html\"><area shape=rect coords=\"0,168,67,189\" href=\"antispam.html\"><area shape=rect coords=\"0,189,67,210\" href=\"kedrosky.html\"><area shape=rect coords=\"0,210,67,231\" href=\"faq.html\"><area shape=rect coords=\"0,231,67,252\" href=\"raq.html\"><area shape=rect coords=\"0,252,67,273\" href=\"quo.html\"><area shape=rect coords=\"0,273,67,294\" href=\"rss.html\"><area shape=rect coords=\"0,294,67,315\" href=\"bio.html\"><area shape=rect coords=\"0,315,67,336\" href=\"https://twitter.com/paulg\"><area shape=rect coords=\"0,336,67,357\" href=\"ind.html\"><area shape=rect coords=\"0,357,67,378\" href=\"info.html\"></map><img src=\"https://s.yimg.com/aah/paulgraham/img-17.gif\" width=69 height=378 usemap=#5421620e8961b border=0 hspace=0 vspace=0 ismap></td><td><img src=\"https://sep.yimg.com/ca/Img/trans_1x1.gif\" height=1 width=26 border=0></td><td><font size=2 face=\"verdana\"><img src=\"https://sep.yimg.com/ca/I/paulgraham_2271_3232\" width=410 height=45 border=0 hspace=0 vspace=0><br><br><img src=\"https://sep.yimg.com/ay/paulgraham/index-1.gif\" width=410 height=308 border=0 hspace=0 vspace=0><br><br><table border=0 cellspacing=0 cellpadding=0 width=435><tr><td><font size=2 face=\"verdana\"><table width=410 cellspacing=0>
<tr><td bgcolor=#ffcc33><img src=\"http://www.virtumundo.com/images/spacer.gif\"
height=15 width=1><font size=2>
<b>New:</b> 
<a href=\"words.html\">Putting Ideas into Words</a> |
<a href=\"goodtaste.html\">Taste</a> |
<a href=\"smart.html\">Smart</a> |
<a href=\"weird.html\">Weird</a>
</font>
<br><img src=\"http://www.virtumundo.com/images/spacer.gif\" height=5 width=1></td
></tr>
</table>
<table width=410 cellspacing=0>
<tr><td bgcolor=#ff9922><img src=\"http://www.virtumundo.com/images/spacer.gif\"
height=15 width=1><font size=2>
<b>Want to start a startup?</b> Get funded by <a href=\"http://ycombinator.com/apply.html\">Y Combinator</a>.
</font>
<br><img src=\"http://www.virtumundo.com/images/spacer.gif\" height=5 width=1></td
></tr>
</table>
<!--
<table width=410 cellpadding=0 cellspacing=0>
<tr><td bgcolor=#ffcc33><img src=\"http://ycombinator.com/images/s.gif\"
height=15 width=1><font size=2>
<b><center><a href=\"http://arclanguage.org/install\">New Arc Out</a><b></center>
</font>
<br><img src=\"http://ycombinator.com/images/s.gif\" height=5 width=1></td
></tr>
-->
<!-- \"Paul Graham, like nobody else, tells us what it means to be a hacker.\"  - Matthias Felleisen--><br><br>
<!-- ffdd00 a7e4e2 ffcc33 ff9922, dcd7c8,ffcc70,ff7070, ccdd70, cad4dd, cad4ef, efea99, aaddcc, eeee88 eeee99 ccdcef
ffeebb,  fffbcc, ffac74, d9e4ff ccccff, ffcc50, wufoo bc3c1f, acd8b4, eebb50-->
<link rel=\"alternate\" type=\"application/rss+xml\" title=\"RSS\" href=\"http://www.aaronsw.com/2002/feeds/pgessays.rss\"></font></td></tr></table><br><table border=0 cellspacing=0 cellpadding=0 width=435><tr><td><font size=2 face=\"verdana\"><br>
<font size=1>
<font color=#cccccc>
&copy; mmxxii pg</font> <!--
<font color=#777777><a href=\"http://snipshot.com\">
<font color=#7777dd>photos edited with snipshot</font></a>.
</font></font> -->
<!--
<img src=\"https://sep.yimg.com/ty/cdn/paulgraham/obama.jpg?t=1645119889&\" height=30 width=90>
-->
<!--
<a href=\"http://www.xobni.com/?friend=3D2061\" target=\"_blank\"><img src=\"http://www.xobni.com/images/banners/formyinbox_ffffff.gif\" alt=\"Xobni outlook add-in for your inbox\" border=0/></a>
-->
<!--
<a href=\"http://technorati.com/claim/h9c4r84rfd\" rel=\"me\"><font color=#ffffff>Technorati Profile</font></a> --></font></td></tr></table><br></font></td></tr></table></body>
<script type=\"text/javascript\">
csell_env = 'bf1';
 var storeCheckoutDomain = 'order.store.yahoo.net';
</script>
</html>"
200
#<HASH-TABLE :TEST EQUAL :COUNT 11 {100250F743}>
#<QURI.URI.HTTP:URI-HTTP http://www.paulgraham.com>
#<SB-SYS:FD-STREAM for "socket 172.20.10.5:60096, peer: 74.6.52.135:80" {1001E3B963}>

The first value is a string, as the predicate confirms:

>(stringp (dex:get "http://www.paulgraham.com"))
T

How can I convert this HTML source string into a JSON string?

This may not be very elegant, but it works:

(jsown:pretty-json
       (dex:post "https://html2json.com/api/v1"
                 :content (dex:get "http://www.paulgraham.com")))

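This snippet uses Dexador twice. First, it makes the dex:get request mentioned in the question. Then it sends a dex:post request to a freely available web API that converts HTML to JSON. Finally, it uses another library, jsown-utils, which essentially pretty-prints the JSON.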
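
If depending on an external web service is a concern, the conversion can probably also be done locally. The following is only a minimal sketch, not what the snippet above does: it assumes Quicklisp is installed and swaps in two libraries that are not part of this answer, Plump (to parse the HTML) and jsown (to encode JSON). The function name html-to-json and the tag/attributes/children layout are made up for illustration, so the output will not match what the html2json.com API returns.

```lisp
;; Sketch only: Plump and jsown are assumptions swapped in for the web API.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload '(:plump :jsown) :silent t))

(defun html-to-json (html)
  "Parse the string HTML with Plump and return a JSON string.
The tag/attributes/children layout is a made-up convention for this sketch."
  (labels ((convert-children (node)
             ;; Collect converted children, dropping anything that maps to NIL.
             (loop for child across (plump:children node)
                   for converted = (convert child)
                   when converted
                     collect converted))
           (convert (node)
             (cond
               ;; Elements become jsown objects: (:obj ("key" . value) ...)
               ((plump:element-p node)
                (list :obj
                      (cons "tag" (plump:tag-name node))
                      (cons "attributes"
                            (cons :obj
                                  (loop for name being the hash-keys
                                          of (plump:attributes node)
                                            using (hash-value value)
                                        collect (cons name value))))
                      (cons "children" (convert-children node))))
               ;; Text nodes become trimmed strings; blank ones are dropped.
               ((plump:text-node-p node)
                (let ((text (string-trim '(#\Space #\Tab #\Newline #\Return)
                                         (plump:text node))))
                  (when (plusp (length text)) text)))
               ;; Comments, doctypes, and other node types are skipped.
               (t nil))))
    (jsown:to-json (convert-children (plump:parse html)))))
```

It could then be called as (html-to-json (dex:get "http://www.paulgraham.com")). Only the primary value of dex:get, the HTML string, is passed as the argument; the status code, headers, URI, and stream are discarded.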