Uploading a PDF file to a CKAN Dataset as a resource fails with "{file} is not json serializable"
My simple Python script, which creates a dataset and then adds a single PDF file to that dataset as a resource, fails with "{file} is not json serializable".
# coding=utf-8
# import base64
import ckanapi
import requests
import csv
import json
import pprint
import socket
import netifaces as ni
# UPDATE THESE AND ONLY THESE.
api_token = '***'
the_hostname = socket.gethostname()
the_ipaddress = ni.ifaddresses('eth0')[ni.AF_INET][0]['addr']
site_url = 'http://' + the_ipaddress + ':5000'
endpoint_p = '{}/api/3/action/package_create'.format(site_url)
endpoint_r = '{}/api/3/action/resource_create'.format(site_url)
headers = {'Authorization': api_token}
payload_p = {
"name": "test01",
"private": "true",
"state": "active",
"owner_org": "b15a6f45-e2ed-4587-8c5e-a92dbc9f157d",
"maintainer" : "Forms Management",
"maintainer_email" : "forms.management@province.ca",
"author" : "Test Author",
"author_email" : "hughj@province.ca"
}
payload_r = {
"package_id": "null",
"name": "English - test01 - Test Description",
"url": "upload",
"upload": open('/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf', 'r'),
"description": "This is a test resource attached to dataset test01",
"notes": "This is a longer block of text that is for the resource test01e which is attached to the dataset test01"
}
filepaths = {
"thepath": "/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf"
}
req_p = requests.post(endpoint_p, json=payload_p, headers=headers)
theLastResponse = req_p.json()
theLastPackageCreated = theLastResponse['result']['id']
payload_r["package_id"] = theLastPackageCreated
req_r = requests.post(endpoint_r, json = payload_r, headers = headers) # resource_create()
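This raises the error "{file} is not json serializable". The file is a PDF, which is binary, but I'm not sure whether it needs some kind of encoding first (note the commented-out base64 import; I didn't want to go down that road without asking whether it is the right approach).
The CKAN API documentation is here: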
https://docs.ckan.org/en/2.9/api/#ckan.logic.action.create.resource_create
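It says "upload" should be "(FieldStorage (optional) needs multipart/form-data) – (optional)", but every example script I have seen for uploading files to CKAN shows code just like mine, with no extra pre-processing of the file being uploaded, so I'm not sure what the problem is. Please help if you can!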
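I copied your code and ran a modified version against a local development copy of CKAN, and it works after my changes, which are included below.
Most notably:
- payload_r -> none of the extra fields are required, but you can include other resource metadata such as description, name, etc. if you want
- req_r -> 1) pass the payload as data rather than json, because a file upload has to be sent as multipart/form-data. 2) send the file with the files parameter.
Docs: https://docs.ckan.org/en/2.9/maintaining/filestore.html#filestore-api
IMO this is less a CKAN problem than a matter of understanding the chosen library (i.e. requests). There are many ways to do this with different tools.
I also had to update the payload to match my schema, but assuming yours is correct for your site, this should work.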
# coding=utf-8
# import base64
import ckanapi
import requests
import csv
import json
import pprint
import socket
import netifaces as ni
# UPDATE THESE AND ONLY THESE.
api_token = '***'
the_hostname = socket.gethostname()
the_ipaddress = ni.ifaddresses('eth0')[ni.AF_INET][0]['addr']
site_url = 'http://' + the_ipaddress + ':5000'
endpoint_p = '{}/api/3/action/package_create'.format(site_url)
endpoint_r = '{}/api/3/action/resource_create'.format(site_url)
headers = {'Authorization': api_token}
payload_p = {
"name": "test01",
"private": "true",
"state": "active",
"owner_org": "b15a6f45-e2ed-4587-8c5e-a92dbc9f157d",
"maintainer" : "Forms Management",
"maintainer_email" : "forms.management@province.ca",
"author" : "Test Author",
"author_email" : "hughj@province.ca"
}
payload_r = {
"package_id": "null"
}
filepaths = {
"thepath": "/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf"
}
req_p = requests.post(endpoint_p, json=payload_p, headers=headers)
theLastResponse = req_p.json()
theLastPackageCreated = theLastResponse['result']['id']
payload_r["package_id"] = theLastPackageCreated
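# Metadata goes in data= (form fields) and the PDF in files=, so requests builds a multipart/form-data body.
# open(..., 'rb') replaces the Python 2-only file() builtin and reads the PDF as binary.
with open('/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf', 'rb') as pdf_file:
    req_r = requests.post(endpoint_r, data=payload_r, headers=headers, files=[('upload', pdf_file)])  # resource_create()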
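To see where the original error message comes from without needing a CKAN instance, here is a minimal sketch using only the standard library. It assumes that requests runs the json= argument through a JSON encoder (the standard json module unless simplejson is installed), and a file object has no JSON representation. The helper name try_json_encode and the BytesIO stand-in for the opened PDF are hypothetical, chosen only for illustration.

```python
import io
import json


def try_json_encode(payload):
    """Return (True, encoded_text) on success or (False, error_message) on failure."""
    try:
        return True, json.dumps(payload)
    except TypeError as exc:
        return False, str(exc)


# Stand-in for the opened PDF; BytesIO avoids needing a real file on disk.
fake_pdf = io.BytesIO(b"%PDF-1.4 placeholder bytes")

# Plain string metadata encodes without trouble.
ok_meta, encoded = try_json_encode({"package_id": "test01", "name": "English - test01"})

# Putting a file object in the same dict is what makes a json= payload fail.
ok_file, error = try_json_encode({"package_id": "test01", "upload": fake_pdf})

print("metadata only:", ok_meta)
print("with file object:", ok_file)
print("error:", error)
```

The exact wording of the error differs between Python versions, but in each case the encoder refuses the file object, which is why the file has to travel in a multipart/form-data body instead of a JSON one.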