Seeking the simplest method of updating CKAN linked resources regularly
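I'm looking for the simplest way to refresh CKAN's DataStore when linked resources are updated. In this case, all resources are linked (none are uploaded). The resources are CSV files and are updated periodically. When a CSV file changes, the change does not seem to be picked up automagically by CKAN's DataStore. I tried using ckanapi, but the update_resource function seems to update only the metadata. I have not been able to get it to update the DataStore consistently (so the Data Explorer view shows stale information).
Unless there is a simpler approach, my preferred option is to find a way to programmatically trigger the 'Upload to DataStore' button found on the "DataStore" tab of a given resource. I have searched fairly extensively but have not found a way to do this. Any suggestions are appreciated.
The current version is CKAN 2.8.1, with the DataStore and DataPusher extensions enabled.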
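You should be able to do this with a script that uses the CKAN API, specifically datapusher_submit (see below).
Here is a sample Python script I have used in the past.
There is also an open PR that would help document this better, but it has not been merged yet.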
#!/usr/bin/env python3
import json
import urllib.parse
import urllib.request

# We'll use the resource_search function to get all of the resources.
# NOTE: there may be a limit on this in the future, in which case multiple
# calls would be needed to collect all of the resources. Datasets have a
# hard limit of 1000 but default to 10. So this works for now, but it may
# become an issue later.
resources_request = urllib.request.Request(
    'http://ckan-site.com/api/3/action/resource_search?query=name:')

# Make the HTTP request.
resources_response = urllib.request.urlopen(resources_request)

# Make sure it worked.
assert resources_response.status == 200

# Use the json module to load CKAN's response into a dictionary.
resources_response_dict = json.loads(resources_response.read())
assert resources_response_dict['success'] is True
results = resources_response_dict['result']['results']

# Loop over the resources and submit each one to the DataStore.
for result in results:
    try:
        request = urllib.request.Request(
            'http://ckan-site.com/api/3/action/datapusher_submit')
        data_dict = {
            "resource_id": result['id']
        }
        # urlopen() needs the POST body as bytes.
        data_string = urllib.parse.quote(json.dumps(data_dict)).encode('utf-8')
        request.add_header('Authorization', 'your-token-here')
        response = urllib.request.urlopen(request, data_string)
        assert json.loads(response.read())['success'] is True
    except Exception as e:
        # Catch and print any issues and keep going.
        print("resource_id: " + result['id'] + " failed: " + str(e))
        continue

print("Complete. Datastore is now up to date.")
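Since the question mentions ckanapi, the same idea can probably be expressed through its generic call_action method. This is only a rough sketch, not something from the script above: the helper names iter_resource_ids and resubmit_all and the page_size parameter are made up for illustration, and it assumes resource_search honours its documented limit and offset parameters so the results can be paged instead of fetched in one call. The ckan argument is anything with a call_action(action, data_dict) method, such as a ckanapi RemoteCKAN instance.

```python
def iter_resource_ids(ckan, page_size=100):
    """Yield every resource id, paging through resource_search.

    `ckan` is assumed to expose call_action(action, data_dict) and to
    return the "result" part of the API response, as ckanapi does.
    """
    offset = 0
    while True:
        page = ckan.call_action('resource_search', {
            'query': 'name:',      # same query as the script above
            'limit': page_size,
            'offset': offset,
        })
        results = page['results']
        if not results:
            break
        for resource in results:
            yield resource['id']
        offset += len(results)


def resubmit_all(ckan, page_size=100):
    """Submit every resource to the DataPusher; return the failures.

    Returns a list of (resource_id, reason) tuples for resources that
    raised an error or for which datapusher_submit returned False.
    """
    failed = []
    for resource_id in iter_resource_ids(ckan, page_size):
        try:
            ok = ckan.call_action('datapusher_submit',
                                  {'resource_id': resource_id})
            if not ok:
                failed.append((resource_id, 'datapusher_submit returned False'))
        except Exception as exc:
            # Record the problem and keep going, as in the script above.
            failed.append((resource_id, str(exc)))
    return failed
```

With ckanapi installed, something like resubmit_all(RemoteCKAN('http://ckan-site.com', apikey='your-token-here')) should work, using the same placeholder site and token as above. To keep the DataStore current, the script could then be run from whatever scheduler is already available (cron, for example) at roughly the interval at which the CSV files change.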