Inexplicable formatting magic on API Endpoint

Question

I am writing a wrapper for Deutsche Bahn's Fahrplan OpenData API.

However, I can't seem to produce the same result as a simple curl request. Here is what I'm doing:

>>> import requests
>>> header = {'Authorization': 'Bearer 36e39957ace6f405a82cfb09522d0a8d'}
>>> departure_data = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/departureBoard/8011160?date=2017-06-30', headers=header)

# Now, using a journey's details id, let's request some journey details from the endpoint
>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header)
<Response [404]>
>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header).request.url
'https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'

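Okay, so far, so bad. As you can see, I'm using the data exactly as it was given to me. Now, when I call the endpoint through the website, it tells me it runs this curl command: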

curl -X GET --header "Accept: application/json" --header "Authorization: Bearer 36e39957ace6f405a82cfb09522d0a8d" "https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/782334%252F275830%252F795514%252F136979%252F80%253fstation_evaId%253D8098160"

And magic happens:

The original journey ID

'782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'

becomes:

'782334%252F275830%252F795514%252F136979%252F80%253fstation_evaId%253D8098160'

and returns status 200.

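Some characters appear to have been added throughout the journey ID. I copied and pasted it into the given field, nothing more, so I know it's not me.

I believe some kind of encoding/decoding is happening, but I've never seen this before, and honestly I don't know what to do about it.

How do I handle this in my code? Apparently I need to do something beyond simply parsing the departures endpoint? Or, better yet, am I just missing something obvious?

I've already sent several emails to the DB (Deutsche Bahn) developers, but so far I haven't received a reply.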

Answer 1

What you are seeing is double URL encoding. The percent sign % is itself URL-encoded as the sequence %25:

/ -> %2F -> %252F
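The chain above can be reproduced with the standard library. This is a small sketch using Python 3's urllib.parse. Passing safe='' is a choice made here so that the slash itself gets escaped, since quote leaves '/' untouched by default.

```python
from urllib.parse import quote, unquote

once = quote('/', safe='')    # first pass escapes the slash
twice = quote(once, safe='')  # second pass escapes the '%' from the first pass

print(once)   # %2F
print(twice)  # %252F

# Each unquote call undoes exactly one layer of encoding.
print(unquote(twice))           # %2F
print(unquote(unquote(twice)))  # /
```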

Try to URL-decode departure_data.json()[0]['detailsId'] before doing the following:

>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header)

For example, like this:

requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + urllib.parse.unquote(urllib.parse.unquote(departure_data.json()[0]['detailsId'])), headers=header)

Answer 2

There are four endpoints defined in v1 of the API:

GET /location/{name}
GET /arrivalBoard/{id}
GET /departureBoard/{id}
GET /journeyDetails/{id}

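Each of them takes an {id} parameter. The value you pass for this parameter must be URL-encoded, and that is the step you overlooked.

/departureBoard/{id} gives you a list of Board items, which are defined as follows: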

Board {
    name (string): ,
    type (string): ,
    boardId (string): ,
    stopId (string): ,
    stopName (string): ,
    dateTime (string): ,
    origin (string): ,
    track (string): ,
    detailsId (string):
}

detailsId is what you use to hit the /journeyDetails/{id} endpoint. So minimal working code looks like this (note the call to urllib.parse.quote):

import requests
import urllib.parse  # "import urllib" alone does not guarantee the parse submodule is loaded

header = {'Authorization': 'Bearer 36e39957ace6f405a82cfb09522d0a8d'}
# Fetch the departure board for station 8011160 on the given date.
departure_data = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/departureBoard/8011160?date=2017-06-30', headers=header)

# URL-encode the detailsId before placing it in the path.
journey_id = departure_data.json()[0]['detailsId']
journey_details = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + urllib.parse.quote(journey_id), headers=header)
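The URL-building step can be isolated into a small function so that the encoding is never forgotten. The following is only a sketch: journey_details_url and BASE_URL are hypothetical names introduced here, and safe='' is an extra precaution (not in the code above) so that a literal '/' in an ID would also be escaped.

```python
from urllib.parse import quote

# Base URL as used in the question.
BASE_URL = 'https://api.deutschebahn.com/fahrplan-plus/v1'


def journey_details_url(details_id):
    """Build the journeyDetails URL for a detailsId exactly as the API returned it.

    Hypothetical helper for illustration, not part of requests or the DB API.
    """
    # safe='' escapes every reserved character, including '/' and '%',
    # so the ID always remains a single path segment.
    return BASE_URL + '/journeyDetails/' + quote(details_id, safe='')


url = journey_details_url(
    '782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'
)
print(url)
```

For the ID from the question, this produces the same URL that the curl command on the website uses.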

The value of journey_id is itself URL-encoded and decodes to something that looks like a URL fragment:

urllib.parse.unquote(journey_id)
# -> '564552/203236/867650/245641/80?station_evaId=8098160'

So it looks a bit as if you could simply use the raw value to make further requests, but that is a misconception.
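One way to see why the decoded value cannot simply be appended is to parse the resulting URLs. The sketch below uses urllib.parse.urlsplit purely on the client side. How the server actually routes the request is not shown by the source and is not modelled here.

```python
from urllib.parse import quote, unquote, urlsplit

base = 'https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/'
journey_id = '564552%2F203236%2F867650%2F245641%2F80%3fstation_evaId%3D8098160'

# Decoded ID appended to the URL: '/' and '?' regain their structural meaning,
# so the ID is split into extra path segments plus a query string.
decoded = urlsplit(base + unquote(journey_id))
print(decoded.path)   # /fahrplan-plus/v1/journeyDetails/564552/203236/867650/245641/80
print(decoded.query)  # station_evaId=8098160

# Quoted ID: the whole value stays inside one path segment and there is no query.
quoted = urlsplit(base + quote(journey_id, safe=''))
print(quoted.path)
print(repr(quoted.query))  # ''
```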

Treat the ID as an opaque plain-text value that needs to be encoded, just as you would encode any other arbitrary value before using it in a URL.

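When you quote the value, each percent sign is escaped as %25, which results in a longer value (first the ID as returned, then the quoted form):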

'564552%2F203236%2F867650%2F245641%2F80%3fstation_evaId%3D8098160'
'564552%252F203236%252F867650%252F245641%252F80%253fstation_evaId%253D8098160'
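The two strings above can be checked against each other with urllib.parse. This is a minimal sketch; the variable names are made up for illustration, and the final comparison only shows what a single decoding pass yields, on the assumption (not confirmed by the source) that the server decodes the path once.

```python
from urllib.parse import quote, unquote

returned_by_api = '564552%2F203236%2F867650%2F245641%2F80%3fstation_evaId%3D8098160'

# Quoting escapes every '%' as '%25'; all other characters in this ID are unreserved.
sent_in_url = quote(returned_by_api, safe='')
print(sent_in_url)
# 564552%252F203236%252F867650%252F245641%252F80%253fstation_evaId%253D8098160

# One decoding pass recovers the ID exactly as the API issued it.
print(unquote(sent_in_url) == returned_by_api)  # True
```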

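Since the Deutsche Bahn API is self-documenting through Swagger, it might be easiest to install a Swagger client and let it create an API wrapper for you (see their swagger.json). pyswagger looks usable, but there are others to try.

This way you can focus on making API requests and getting data, while low-level plumbing such as URL encoding and even authorization happens transparently in the background.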
