Seeking the simplest method of updating CKAN linked resources regularly

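I'm looking for the simplest way to update the CKAN DataStore when a linked resource changes. In this case, all of the resources are linked (none are uploaded). The resources are CSV files and are updated regularly. When a CSV file is updated, the changes don't seem to be picked up automagically by CKAN's DataStore. I tried using ckanapi, but the resource_update action seems to update only the metadata. I can't get it to update the DataStore consistently, so the Data Explorer view shows stale information.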

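Unless there is a simpler way, my preference would be to programmatically trigger the 'Upload to DataStore' button found on a resource's 'DataStore' tab. I've done some fairly extensive searching but haven't found a way to do this yet. Any suggestions are appreciated.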

The current version is CKAN 2.8.1, with the DataStore and DataPusher extensions enabled.

You should be able to do this with a script that uses the CKAN API, specifically the datapusher_submit action (see below).
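For a single resource, the call should come down to one authenticated POST to datapusher_submit, which is what the 'Upload to DataStore' button appears to trigger. This is a minimal sketch using only the Python 3 standard library, assuming an API key with edit rights on the resource. CKAN_URL and API_TOKEN are placeholders, and build_datapusher_submit / submit_to_datapusher are hypothetical helper names. Building the request separately from sending it lets you inspect it without a running CKAN site.

```python
import json
import urllib.request

# Placeholder values (assumptions): replace with your own site URL and API key.
CKAN_URL = 'http://ckan-site.com'
API_TOKEN = 'your-token-here'


def build_datapusher_submit(resource_id, ckan_url=CKAN_URL, api_token=API_TOKEN):
    """Build (but do not send) a POST request to the datapusher_submit action."""
    body = json.dumps({'resource_id': resource_id}).encode('utf-8')
    return urllib.request.Request(
        ckan_url.rstrip('/') + '/api/3/action/datapusher_submit',
        data=body,
        headers={'Content-Type': 'application/json',
                 'Authorization': api_token})


def submit_to_datapusher(resource_id):
    """Send the request and return CKAN's 'result' value.

    urlopen raises HTTPError on a 4xx/5xx status (e.g. a bad API key).
    """
    with urllib.request.urlopen(build_datapusher_submit(resource_id)) as response:
        response_dict = json.loads(response.read().decode('utf-8'))
    if not response_dict.get('success'):
        raise RuntimeError('datapusher_submit failed: %r' % response_dict.get('error'))
    return response_dict['result']
```

For example, submit_to_datapusher('<resource id>') should queue one resource for reloading. The DataPusher then fetches the linked CSV and loads it into the DataStore asynchronously.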

Here is a sample Python script I've used in the past.

There is also an open PR that would help document this better, but it hasn't been merged yet.

#!/usr/bin/env python3
import json
import urllib.parse
import urllib.request

CKAN_URL = 'http://ckan-site.com'
API_TOKEN = 'your-token-here'  # API key of a user allowed to edit the resources


# We'll use the resource_search action to get all of the resources.
# NOTE: resource_search also accepts offset and limit parameters. On a large
# site (or if a limit is applied in the future) you may have to make multiple
# calls and page through the results to collect all of the resources.
query_string = urllib.parse.urlencode({'query': 'name:'})
resources_request = urllib.request.Request(
    CKAN_URL + '/api/3/action/resource_search?' + query_string)

# Make the HTTP request. urlopen raises HTTPError on a 4xx/5xx status.
with urllib.request.urlopen(resources_request) as resources_response:
    # Use the json module to load CKAN's response into a dictionary.
    resources_response_dict = json.loads(resources_response.read().decode('utf-8'))
assert resources_response_dict['success'] is True

results = resources_response_dict['result']['results']

for result in results:
    # Loop over the resources and submit each one to the DataPusher, which
    # downloads the linked file and loads it into the DataStore.
    try:
        data_dict = {'resource_id': result['id']}
        request = urllib.request.Request(
            CKAN_URL + '/api/3/action/datapusher_submit',
            data=json.dumps(data_dict).encode('utf-8'),
            headers={'Content-Type': 'application/json',
                     'Authorization': API_TOKEN})
        with urllib.request.urlopen(request) as response:
            assert json.loads(response.read().decode('utf-8'))['success'] is True
    except Exception as error:
        # Catch and print any issues and keep going.
        print('resource_id: ' + result['id'] + ' failed: ' + repr(error))

# datapusher_submit only queues the jobs; the DataPusher processes them
# asynchronously, so the DataStore is updated once those jobs finish.
print('Complete. All resources have been submitted to the DataPusher.')
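The NOTE in the script warns that collecting every resource may eventually take several calls. Since resource_search accepts offset and limit, a minimal paging sketch could look like the following. The names fetch_page and iter_all_resources are hypothetical, and the page size of 100 is an arbitrary assumption. The loop stops on a short page instead of relying on the count field, and the HTTP call is passed in as a function so the paging logic can be tried without a CKAN site.

```python
import json
import urllib.request


def fetch_page(data_dict, ckan_url='http://ckan-site.com'):
    """POST data_dict to resource_search and return the response's 'result'."""
    request = urllib.request.Request(
        ckan_url + '/api/3/action/resource_search',
        data=json.dumps(data_dict).encode('utf-8'),
        headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode('utf-8'))['result']


def iter_all_resources(fetch=fetch_page, query='name:', page_size=100):
    """Yield every resource matched by resource_search, one page at a time."""
    offset = 0
    while True:
        page = fetch({'query': query, 'offset': offset, 'limit': page_size})
        results = page['results']
        yield from results
        # A short (or empty) page means there is nothing left to fetch.
        if len(results) < page_size:
            break
        offset += page_size
```

In the script, the single resource_search request could then be replaced by for result in iter_all_resources(): followed by the same datapusher_submit call.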