How to parse JSON in Nim (problem with long integers)?

I'm writing some Nim code that pulls a JSON object from the Shodan API. Here is the full JSON string returned by Shodan:

{"city": "Alverca", "region_code": "14", "os": null, "tags": ["self-signed"], "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "area_code": null, "dma_code": null, "last_update": "2019-11-01T17:56:18.470438", "country_code3": "PRT", "country_name": "Portugal", "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "postal_code": "2619-510", "longitude": -9.038600000000002, "country_code": "PT", "ip_str": "85.139.242.90", "latitude": 38.899, "org": "ZON Tv Cabo", "data": [{"_shodan": {"id": null, "options": {}, "ptr": true, "module": "telnet", "crawler": "82488cbcb7dd25da13f728d04775390417d9ee4e"}, "hash": 1329569225, "os": null, "opts": {"telnet": {"will": ["SGA", "STATUS", "ECHO"], "do": ["TTYPE", "TSPEED", "XDISPLOC", "NEW_ENVIRON", "ECHO", "NAWS", "LFLOW"], "dont": [], "wont": []}}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "port": 23, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-11-01T17:56:18.470438", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "\r\nBODET PUNCHING BOARD\r\nLinux/ppc 2.4.20_mvl31-BODET_V1.1B2\r\n\r\nWelcome to 172.17.30.99\r\nFri Nov  1 17:53:58 2019\r\nTech-code: ", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": "7afc2cf1-2b4a-4074-9343-cd576d240364", "options": {}, "ptr": true, "module": "https", "crawler": "0636e1e6dd371760aeaf808ed839236e73a9e74d"}, "hash": 1484578305, "os": null, "tags": ["self-signed"], "opts": {"vulns": [], "heartbleed": "2019/10/31 06:28:03 85.139.242.90:443 - SAFE\n"}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "http": {"html_hash": -468632088, "robots_hash": null, "redirects": [], "securitytxt": null, "title": "", "sitemap_hash": null, "robots": null, "favicon": null, "host": 
"85.139.242.90", "html": "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n<title></title>\n</head>\n<body>\n<script>location.href = \"./home/index.html\";</script>\n</body>\n</html>", "location": "/", "components": {}, "server": null, "sitemap": null, "securitytxt_hash": null}, "port": 443, "ssl": {"dhparams": null, "tlsext": [{"id": 65281, "name": "renegotiation_info"}], "versions": ["TLSv1", "-SSLv2", "-SSLv3", "TLSv1.1", "TLSv1.2", "-TLSv1.3"], "acceptable_cas": [], "cert": {"sig_alg": "sha256WithRSAEncryption", "issued": "20170520021607Z", "expires": "20250806021607Z", "expired": false, "version": 2, "extensions": [{"data": "\x03\x02\x01\xa6", "name": "keyUsage"}, {"critical": true, "data": "0\x03\x01\x01\xff", "name": "basicConstraints"}, {"data": "\x16\x1fSelf Signed Certificate(System)", "name": "nsComment"}, {"data": "0\x10\x82\x0eXC8f45bb.local", "name": "subjectAltName"}], "fingerprint": {"sha256": "317aadb5fb5ddaf97232cdfb8c4a8da23d2f3f11f7229f028235f6545d08ef1f", "sha1": "3d2a2dcdb25b76b3ddddc740c2e4660ff07009d5"}, "serial": 46474876880932987910930945182556062189, "subject": {"CN": "XC-8F45BB"}, "pubkey": {"type": "rsa", "bits": 2048}, "issuer": {"CN": "XC-8F45BB"}}, "cipher": {"version": "TLSv1/SSLv3", "bits": 256, "name": "AES256-SHA256"}, "chain": ["-----BEGIN 
CERTIFICATE-----\nMIIDHTCCAgWgAwIBAgIQIva8VyQosXPBl/OnC/WB7TANBgkqhkiG9w0BAQsFADAU\nMRIwEAYDVQQDEwlYQy04RjQ1QkIwHhcNMTcwNTIwMDIxNjA3WhcNMjUwODA2MDIx\nNjA3WjAUMRIwEAYDVQQDEwlYQy04RjQ1QkIwggEiMA0GCSqGSIb3DQEBAQUAA4IB\nDwAwggEKAoIBAQC7ypFTTvDMJ0wYR0LGFOOJf/g6CyRFqJvAmtY0SZKw8EOXC365\n+ajGtJQ0qcsOqmFEFUmC5J0dUsuljbkqECx9cnVtXLtUUQ8pPfTz7Tphz+0zB/KS\nbG7NdrjWbHhVikPLCMrna6cxbI+d1vWA9NoLty02x1fpR8MH9SEqHlO89KbPaDwo\nmw6gjwNS+ImBnF6yzfslUQkcR3J3KGfCrNWsP+mYl7yx4+Awk3wI6vwkUpWmJX+T\nTEUV8rrTSyrHocc7hDYTN/bg5FgUsMLwuuHkEg+JzBTEmdVp0mI0Liq9B/hoVpKz\niX1si/yYkdqKQgNP4SALOqFdmB0+nkqN7rYzAgMBAAGjazBpMAsGA1UdDwQEAwIB\npjAPBgNVHRMBAf8EBTADAQH/MC4GCWCGSAGG+EIBDQQhFh9TZWxmIFNpZ25lZCBD\nZXJ0aWZpY2F0ZShTeXN0ZW0pMBkGA1UdEQQSMBCCDlhDOGY0NWJiLmxvY2FsMA0G\nCSqGSIb3DQEBCwUAA4IBAQCC8CGt0dtiRn6e79Rtjpr383RJdk2d8VfFbQSWj0Ct\nzZUdgktJiQR9+cNKYCoHvJ8E4mm1sb+Wgz2/CrP+7J8ZNRsb8UOabwrREeBvz0wl\nwiIwmrnuCYKZ8AMIEI4f3BmXVSz5baIFTHWWCuS22np5jz8bpYYKLIK4Pc6r+sEf\nfhd7H6YAPEPqAMlC/UTicDmXHKqKbLFDTHNyKiouO3DGFqpNDd4zOWsyDrHkbl91\nVAk6xEPha5Y0QyIlpkfcIAG0e/VxgzMxfiGPSV2kxgaVq+wbNq9T61GsXZ4ZD00L\nj8Q+YW28opH0OZe1h1V8uTytGnKnt295Z1X7hEae04XQ\n-----END CERTIFICATE-----\n"], "alpn": []}, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-31T05:27:57.891394", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 153\r\nX-Frame-Options: SAMEORIGIN\r\nX-Content-Type-Options: nosniff\r\nX-XSS-Protection: 1; mode=block\r\n\r\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": "921aea7c-4258-40f4-90b0-73088269f39b", "options": {}, "ptr": true, "module": "rsync", "crawler": "339d3eded941e01ca426596e93f3fdf4c9346ccd"}, "product": "rsyncd", "hash": 
1601166835, "version": "26", "opts": {}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "os": null, "rsync": {"authentication": false, "modules": {"punching": "Punching home", "root": "Root filesystem"}}, "port": 873, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-30T12:11:50.048579", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "@RSYNCD: 26\nroot           \tRoot filesystem\npunching       \tPunching home\n@RSYNCD: EXIT\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": null, "options": {}, "ptr": true, "module": "whois", "crawler": "122dd688b363c3b45b0e7582622da1e725444808"}, "hash": -1288910848, "os": null, "opts": {}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "port": 43, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-28T18:52:53.093633", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "676478697\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": "99dd6dfe-b491-4691-8b62-c8957bb045e2", "options": {}, "ptr": true, "module": "http-simple-new", "crawler": "122dd688b363c3b45b0e7582622da1e725444808"}, "hash": 1240885964, "os": null, "opts": {}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "http": {"html_hash": 1670855880, "robots_hash": null, "redirects": [], "securitytxt": null, "title": "Identification", "sitemap_hash": null, "robots": null, "favicon": null, "host": "85.139.242.90", "html": 
"<html>\r<head>\r<title>Identification</title>\r<meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1'>\r<script>\rvar clicable = false;\rdocument.oncontextmenu = menuContextuelHandler;\rfunction menuContextuelHandler(){event.srcElement.click();return false;}\rfunction valider(arg){\rif(clicable){\rdocument.getElementById('nomMethode').value=arg;\rdocument.forms[0].submit();}clicable=false;}\rfunction loadBody(){clicable = true;\rtry{init();}catch(e){};}\rfunction doBlink(elt){\rwindow.setInterval(function(){showHide(elt);}, 1000)}\rfunction showHide(elt){if (elt){\relt.style.visibility = (elt.style.visibility == \"hidden\") ? \"visible\" : \"hidden\";}}\r</script>\r</head>\r<body onload='loadBody();'  onclick='return clicable;' id='corps'>\r<form action=Login.do method=post name='formulaire'>\r<input type='hidden' id='nomMethode' name='nomMethode' value='MainPage'>\r<input type='hidden' id='sessionId' name='sessionId' value='1571047569808'>\r<table style='border:1px solid #000000;width:100%;text-align:center'>\r<tr><td style='width:20%;text-align:left'><b>&nbsp;&nbsp;19/10/2019 10:36:05</b>\r</td><td style='width:60%;text-align:center'><i>\rKelio visio : <b><font color=blue>Kelio Visio Lavradio</font></b><font color=green> 85.139.242.90</font></i></td><td style='width:20%;text-align:right'><img src='bodet.png' align=right></td></tr><tr><td colspan=3 style='width:100%;text-align:center'><h2>Identification\r</h2></td></tr></table>\r<table style=\"width:50%\"><tr><td style=\"width:50%;text-align:center\">\r</td><td style=\"width:50%;text-align:center\">\r</td><td style=\"width:50%;text-align:center\">\r</td></tr></table><br>\r<br><br><br><br><br><br>\r<div style=\"width:60%;text-align:right\">\r<h2><img src=\"password.png\">\r&nbsp;&nbsp;Login:&nbsp;&nbsp;&nbsp;&nbsp;<input type=\"password\" name=\"password\"/>\r<script type='text/javascript'>document.formulaire.password.focus();</script>\r<input type=submit name=\"OK\" value=\"OK\" 
onClick=javascript:valider(\"MainPage\"); style=\"color:#000000;background-color:#CCCCCC\">\r</h2></div>\r</form>\r<br><br><br><table width=100% border=0><tr><td><h6>\r</h6></td></tr></table>\r</body>\r</html>\r", "location": "/", "components": {}, "server": null, "sitemap": null, "securitytxt_hash": null}, "port": 8008, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-19T09:36:09.751093", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 2117\r\n\r\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}, {"_shodan": {"id": null, "options": {}, "ptr": true, "module": "line-printer-daemon", "crawler": "f7946cbe2dc20c40fcbcb81ad90aa01731b690ab"}, "hash": -372273874, "os": null, "opts": {}, "ip": 1435234906, "isp": "Nos Comunicacoes, S.A.", "port": 515, "hostnames": ["a85-139-242-90.static.cpe.netcabo.pt"], "location": {"city": "Alverca", "region_code": "14", "area_code": null, "longitude": -9.038600000000002, "country_code3": "PRT", "country_name": "Portugal", "postal_code": "2619-510", "dma_code": null, "country_code": "PT", "latitude": 38.899}, "timestamp": "2019-10-13T12:06:45.139731", "domains": ["netcabo.pt"], "org": "ZON Tv Cabo", "data": "no entries\n", "asn": "AS2860", "transport": "tcp", "ip_str": "85.139.242.90"}], "asn": "AS2860", "ports": [23, 443, 873, 43, 8008, 515]}

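All of the code that talks to the API works fine, but I'm having trouble parsing the resulting JSON object. Nim's parser works when the object is simple, but when I try to parse the JSON above, I get an error. The Nim code used to parse the JSON is: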

import json  # standard library module that provides parseJson
let jsonRsp = parseJson(rspJson)

And the error raised when the program runs is:

/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(870) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(862) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(820) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(829) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(820) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(820) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(820) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/json.nim(797) parseJson
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/strutils.nim(1107) parseBiggestInt
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/parseutils.nim(447) parseBiggestInt
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/parseutils.nim(423) rawParseInt
/home/nxl4/.choosenim/toolchains/nim-1.0.2/lib/pure/parseutils.nim(401) integerOutOfRangeError
Error: unhandled exception: Parsed integer outside of valid range [ValueError]

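I understand what the error means: one of the integers is too long for the parser. Since I can't change the data (it is whatever the API spits out), I'd like to know whether anyone has a strategy for parsing this kind of JSON data in Nim. Apart from Nim's complaint, every other JSON validator I have tried reports the string as valid JSON.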

This is the offending field:

"serial": 46474876880932987910930945182556062189

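It is larger than 2^64, so it cannot fit in the 64-bit integer that parseBiggestInt (the call that fails in the traceback) produces. This is a tricky area; see "JSON integers: limit on size".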

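I ran your sample JSON through three different JSON formatters/validators. It passed validation, but the validators also converted the integer above to a floating-point value, losing significant digits in the process. In other words, the formatted/validated result differs from the original.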

In the Safari and Firefox JS consoles:

JSON.parse("{\"serial\": 46474876880932987910930945182556062189}")
{serial: 4.647487688093299e+37}
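The same effect can be reproduced outside a browser. The sketch below uses Python's standard json module. Passing parse_int=float is only a stand-in for a parser that stores every number as a double; it is not a claim about how any browser is implemented, and the variable names are illustrative.

```python
import json

# The offending field from the question, isolated from the full response.
text = '{"serial": 46474876880932987910930945182556062189}'

# By default, Python's json module turns integer literals into
# arbitrary-precision ints.
exact = json.loads(text)["serial"]

# Emulate a double-only parser by routing every integer literal through float().
lossy = json.loads(text, parse_int=float)["serial"]

print(exact)                # every digit of the original literal
print(lossy)                # the nearest IEEE 754 double
print(exact > 2**64)        # True: the value does not fit in 64 bits
print(int(lossy) == exact)  # False: the low-order digits are gone
```

The last comparison is the important one: the value that comes back from the double-based path is a different number, and nothing signals that it changed.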

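So some parsers silently convert that big integer to a different number. My immediate reaction is that silently losing precision is worse than reporting an error. I see three problems here: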

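  1. Parsers silently lose precision.
  2. The JSON is emitted without regard for the fact that even the parsers in popular web browsers cannot parse it without losing precision.
  3. Nim's JSON parser may not support arbitrarily large integers.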

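The first is the worst of the three IMO, but it is not going away. The Shodan API's interoperability could be improved by sending the serial number as a string instead of a big integer. You could also report the issue on Nim's issue tracker for consideration. Python's JSON parser, for example, parses that particular integer without loss of precision.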
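One way to get the "serial number as a string" behaviour without waiting for the API to change is to patch the raw text before parsing: wrap any integer literal that does not fit in 64 bits in quotes, so that a 64-bit-only parser sees a string instead. Below is a minimal sketch of that idea in Python. quote_big_ints is a hypothetical helper, the signed 64-bit limit is an assumption about the target parser, and the same text-level pass would have to be ported to Nim before it could run ahead of parseJson.

```python
import json
import re

# Assumption: the target parser handles signed 64-bit integers at most.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# Match either a whole JSON string (returned untouched) or a bare integer
# literal. The lookarounds keep the integer branch away from the digits of
# fractions and exponents such as 1.5 or 2e10.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|(?<![\w.+-])-?\d+(?![\d.eE])')

def quote_big_ints(raw: str) -> str:
    """Hypothetical helper: wrap out-of-range integer literals in quotes."""
    def fix(match):
        tok = match.group(0)
        if tok.startswith('"'):
            return tok                    # string token: leave as-is
        if INT64_MIN <= int(tok) <= INT64_MAX:
            return tok                    # fits in 64 bits: leave as-is
        return '"' + tok + '"'            # too large: turn into a string
    return TOKEN.sub(fix, raw)

raw = ('{"serial": 46474876880932987910930945182556062189, '
       '"port": 443, "note": "id 99999999999999999999"}')
patched = quote_big_ints(raw)
print(patched)
```

Only the bare serial literal is quoted; the digits inside the "note" string and the small "port" value pass through unchanged. The cost of this approach is that downstream code must treat the affected fields as strings.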