Uploading a PDF file to a CKAN Dataset as a resource fails with "{file} is not json serializable"

My simple Python script, which creates a dataset and then adds a single PDF file to it as a resource, fails with "{file} is not json serializable".

# coding=utf-8
# import base64
import ckanapi
import requests
import csv
import json
import pprint
import socket
import netifaces as ni

# UPDATE THESE AND ONLY THESE.
api_token = '***'
the_hostname = socket.gethostname()
the_ipaddress = ni.ifaddresses('eth0')[ni.AF_INET][0]['addr']
site_url = 'http://' + the_ipaddress + ':5000'

endpoint_p = '{}/api/3/action/package_create'.format(site_url)
endpoint_r = '{}/api/3/action/resource_create'.format(site_url)
headers = {'Authorization': api_token}

payload_p = {
    "name": "test01",
    "private": "true",
    "state": "active",
    "owner_org": "b15a6f45-e2ed-4587-8c5e-a92dbc9f157d",
    "maintainer" : "Forms Management",
    "maintainer_email" : "forms.management@province.ca",
    "author" : "Test Author",
    "author_email" : "hughj@province.ca"
}

payload_r = {
    "package_id": "null",
    "name": "English - test01 - Test Description",
    "url": "upload",
    "upload": open('/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf', 'r'),
    "description": "This is a test resource attached to dataset test01",
    "notes": "This is a longer block of text that is for the resource test01e which is attached to the dataset test01"
}
      
filepaths = {
    "thepath": "/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf"
}
        
req_p = requests.post(endpoint_p, json=payload_p, headers=headers)

theLastResponse = req_p.json()

theLastPackageCreated = theLastResponse['result']['id']

payload_r["package_id"] = theLastPackageCreated

req_r = requests.post(endpoint_r, json = payload_r, headers = headers) # resource_create()

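This raises the error "{file} is not json serializable". The file is a PDF, which is a binary file, but I'm not sure whether it needs some kind of encoding (note the commented-out "base64" import ... I didn't want to go down that path without first asking whether it is the right approach).

The CKAN API documentation is here: https://docs.ckan.org/en/2.9/api/#ckan.logic.action.create.resource_create

It says "upload" should be "(FieldStorage (optional) needs multipart/form-data) – (optional)", but every example script I have seen for uploading files to CKAN shows code just like mine, with no extra preprocessing of the file being uploaded, so I'm not sure what exactly the problem is ... please help if you can!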

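I copied your code and ran a modified version against a local development copy of CKAN. With my modifications, included below, it works.

Most notably:

  • payload_r -> none of the extra fields are required, but you can include other resource metadata such as description, name, etc. if you want.
  • req_r -> 1) pass the payload as data here rather than json, because the request is multipart/form-data. 2) send the file with the files parameter.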
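
To illustrate the two points above without a CKAN server, here is a minimal sketch. It first shows that a JSON encoder rejects a file object (which is roughly what happens when the payload is passed as json=), then prepares a request with data= and files= and inspects the result without sending it. The in-memory "PDF" bytes and the ckan.example URL are made up for the example.

```python
import io
import json

import requests

# Stand-in for the open PDF; the bytes are invented for illustration.
fake_pdf = io.BytesIO(b"%PDF-1.4 example bytes")

# 1) json= serializes the whole payload with a JSON encoder,
#    and a file object cannot be encoded as JSON.
try:
    json.dumps({"package_id": "example-id", "upload": fake_pdf})
    json_error = None
except TypeError as exc:
    json_error = str(exc)

# 2) data= plus files= makes requests build a multipart/form-data body.
#    prepare() only builds the request; nothing is sent over the network.
prepared = requests.Request(
    "POST",
    "http://ckan.example/api/3/action/resource_create",  # hypothetical URL
    data={"package_id": "example-id"},
    files=[("upload", ("33-5098E.pdf", fake_pdf, "application/pdf"))],
).prepare()

content_type = prepared.headers["Content-Type"]

print(json_error)                    # the serialization error message
print(content_type.split(";")[0])    # the media type chosen by requests
```

The Content-Type header also carries a generated boundary, which is why it should not be set by hand when using files=.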

Docs: https://docs.ckan.org/en/2.9/maintaining/filestore.html#filestore-api

IMO this is less a CKAN problem than a matter of understanding the chosen library (i.e. requests). There are many ways to do this with different tools.
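
As one example of a different tool, the ckanapi package that the script already imports can do the upload itself. This is an untested sketch, assuming ckanapi is installed and that its RemoteCKAN client is used as described in its README; the function name and its parameters are hypothetical.

```python
def upload_pdf_with_ckanapi(site_url, api_token, package_id, pdf_path):
    """Create a resource with an uploaded file through ckanapi (sketch)."""
    # Imported lazily so this module still loads where ckanapi is missing.
    from ckanapi import RemoteCKAN

    ckan = RemoteCKAN(site_url, apikey=api_token)
    # Open in binary mode: a PDF is not text.
    with open(pdf_path, "rb") as pdf:
        # ckanapi switches to a multipart upload when a file object is passed.
        return ckan.action.resource_create(
            package_id=package_id,
            name="English - test01 - Test Description",
            upload=pdf,
        )
```

It would be called with the values from the script, e.g. upload_pdf_with_ckanapi(site_url, api_token, theLastPackageCreated, filepaths["thepath"]).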

I also had to update the payload to match my schema, but assuming yours is correct for your instance, this should work.

# coding=utf-8
# import base64
import ckanapi
import requests
import csv
import json
import pprint
import socket
import netifaces as ni

# UPDATE THESE AND ONLY THESE.
api_token = '***'
the_hostname = socket.gethostname()
the_ipaddress = ni.ifaddresses('eth0')[ni.AF_INET][0]['addr']
site_url = 'http://' + the_ipaddress + ':5000'

endpoint_p = '{}/api/3/action/package_create'.format(site_url)
endpoint_r = '{}/api/3/action/resource_create'.format(site_url)
headers = {'Authorization': api_token}

payload_p = {
    "name": "test01",
    "private": "true",
    "state": "active",
    "owner_org": "b15a6f45-e2ed-4587-8c5e-a92dbc9f157d",
    "maintainer" : "Forms Management",
    "maintainer_email" : "forms.management@province.ca",
    "author" : "Test Author",
    "author_email" : "hughj@province.ca"
}

payload_r = {
    "package_id": "null"
}
      
filepaths = {
    "thepath": "/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf"
}
        
req_p = requests.post(endpoint_p, json=payload_p, headers=headers)

theLastResponse = req_p.json()

theLastPackageCreated = theLastResponse['result']['id']

payload_r["package_id"] = theLastPackageCreated

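# Open the PDF in binary mode; open() works on both Python 2 and 3 (file() is Python 2 only).
with open('/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf', 'rb') as pdf_file:
    # data= carries the form fields, files= carries the upload (multipart/form-data).
    req_r = requests.post(endpoint_r, data=payload_r, headers=headers, files=[('upload', pdf_file)]) # resource_create()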
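
The script above never checks whether either call succeeded. CKAN action responses are JSON objects with a "success" flag plus either "result" or "error", so a small helper can surface failures early. This is only a sketch: the helper and exception names are hypothetical, and the sample error dict is illustrative rather than real server output.

```python
class CkanActionError(Exception):
    """Raised when a CKAN action response reports success = false."""


def unwrap_ckan_response(body):
    """Return body['result'], or raise with the error details from CKAN."""
    if not body.get("success"):
        raise CkanActionError(body.get("error"))
    return body["result"]


# Illustrative response bodies, shaped like CKAN's action API replies.
ok_body = {"success": True, "result": {"id": "abc123"}}
bad_body = {"success": False, "error": {"name": ["Missing value"]}}

resource = unwrap_ckan_response(ok_body)
print(resource["id"])

try:
    unwrap_ckan_response(bad_body)
except CkanActionError as exc:
    print("CKAN rejected the request:", exc)
```

In the script this would look like theLastPackageCreated = unwrap_ckan_response(req_p.json())['id'] and, after the upload, unwrap_ckan_response(req_r.json()).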