Unknown PDF encoding from JSON response

I have an API that returns a PDF inside a JSON response, but all it returns is a long string of integers like the one below:

[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46,...



...,1,32,49,55,10,47,82,111,111,116,32,56,32,48,32,82,10,47,73,110,102,111,32,49,32,48,32,82,62,62,10,115,116,97,114,116,120,114,101,102,10,54,55,54,56,53,10,37,37,69,79,70"}

My questions are:

  1. What encoding is this?
  2. How do I convert it to a PDF using Python?

P.S.: Here is the endpoint that returns the full response.

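The beginning of the data suggests that what you actually have is a list of the byte values of a PDF file: it starts with the byte values of '%PDF-1.4'. The negative numbers are bytes above 127 written as signed 8-bit values.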

So you first have to extract that odd-looking string:

# json_data is the already-parsed response, e.g. json.loads(response_text)
data = json_data[1]['data']

which gives:

"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46, ..."

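Then convert it to a list of ints, and from there to a byte string (i if i >= 0 else i + 256 maps the negative values into the 0..255 range that bytes() requires):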

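# Parse the comma-separated values into ints
intlist = [int(i) for i in data.split(",")]
# Map signed values (-128..-1) to unsigned byte values (128..255)
b = bytes(i if i >= 0 else i + 256 for i in intlist)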

This gives b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (11 CS-II Subjective Q...'
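To see the signed-to-unsigned mapping in isolation, here is a minimal sketch that decodes only the first few values quoted in the question. The helper name signed_csv_to_bytes is made up for illustration, and it uses % 256 in place of the conditional, which should be equivalent for values in -128..255.

```python
def signed_csv_to_bytes(data: str) -> bytes:
    """Turn a string like "37,80,-45" into bytes, treating negatives as signed bytes."""
    # In Python, -45 % 256 == 211, so the modulo wraps negative values into 0..255.
    return bytes(int(tok) % 256 for tok in data.split(","))

# First 15 values from the response shown in the question.
prefix = "37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10"
header = signed_csv_to_bytes(prefix)
print(header)  # b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n'
```

The four negative values become the high-bit bytes on the second line of the file, a comment line that PDF writers commonly add so that tools treat the file as binary.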

Finally, save it to a file:

with open('file.pdf', 'wb') as fd:
    fd.write(b)
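Putting the steps together, a rough end-to-end sketch might look like the following. It assumes the response body has the same two-element layout as the sample in the question; the function name pdf_bytes_from_response, the status check, and the header check are additions for illustration, and the sample string is a tiny stand-in, not a real PDF.

```python
import json
import tempfile
from pathlib import Path

def pdf_bytes_from_response(response_text: str) -> bytes:
    """Decode the comma-separated signed byte values carried in the JSON response."""
    payload = json.loads(response_text)
    # Layout assumed from the sample: [{"status": ...}, {"data": "37,80,..."}]
    status = payload[0].get("status")
    if status != "SUCCESS":
        raise ValueError(f"unexpected status: {status!r}")
    data = payload[1]["data"]
    pdf = bytes(int(tok) % 256 for tok in data.split(","))
    # Sanity check: every PDF file begins with "%PDF-".
    if not pdf.startswith(b"%PDF-"):
        raise ValueError("decoded data does not start with a PDF header")
    return pdf

# Stand-in response: just the header bytes followed by the %%EOF marker.
sample = '[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,37,69,79,70"}]'
pdf = pdf_bytes_from_response(sample)

# Write in binary mode so the bytes reach the disk unchanged.
out = Path(tempfile.gettempdir()) / "file.pdf"
out.write_bytes(pdf)
```

With a real response, response_text would be the body returned by the endpoint, and the output path can be whatever you like.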