使用 FastAPI,如何根据 OpenAPI (Swagger) 文档上的请求 header 添加字符集到 content-type (media-type)?
With FastAPI, How to add charset to content-type (media-type) on request header on OpenAPI (Swagger) doc?
使用 FastAPI,如何根据 OpenAPI (Swagger) 文档上的请求 header 添加字符集到 content-type (media-type)?
@app.post("/")
def post_hello(username: str = Form(...)):
return {"Hello": username}
OpenAPI (http:///docs) 显示“application/x-www-form-urlencoded”。
我试着改成这样:
def post_hello(username: str = Form(..., media_type="application/x-www-form-urlencoded; charset=cp932")):
return {"Hello": "World!", "userName": username}
但不能添加charset=cp932
我想根据要求将“application/x-www-form-urlencoded; charset=cp932”设置为 Content-Type。
我想得到 username 由字符集解码。
使用 FastAPI,如何在自动生成的 OpenAPI (Swagger) 文档中将 charset
添加到 Content-Type 请求 header?
@app.post("/")
def post_hello(username: str = Form(...)):
return {"Hello": username}
通过上述路线,在 http://<root_path>/docs
处生成的 OpenAPI 文档显示 application/x-www-form-urlencoded
.
我试过这个:
@app.post("/")
def post_hello(username: str = Form(..., media_type="application/x-www-form-urlencoded; charset=cp932")):
return {"Hello": "World!", "userName": username}
但文档仍然只显示 application/x-www-form-urlencoded
。
我想设置application/x-www-form-urlencoded; charset=cp932
为值
Content-Type
在此 endpoint/path 函数的响应中。我想要
使用该编码方案接收要解码的表单数据。
简答
在一般情况下这似乎不是一个好主意;我不认为有
一种简单的、built-in方法;而且可能没有必要。
不是个好主意 (a.k.a。不是 standards-compliant)
这个 GitHub 问题讨论了为什么将 ;charset=UTF-8
附加到
application/json
不是一个好主意, 和相同点
在那里长大适用于这种情况。
The HTTP/1.1 specification says that the Content-Type
header lists the Media Type.
注:HTTP/2 shares these components with HTTP/1.1
The IANA manages the registry of all the commonly used Media Types (MIME)..
The entry for application/x-www-form-urlencoded
says:
Media type name: application
Media subtype name: x-www-form-urlencoded
Required parameters: No parameters
Optional parameters:
No parameters
Encoding considerations: 7bit
进行比较
MIME media type name : Text
MIME subtype name : Standards Tree - html
Required parameters : No required parameters
Optional parameters :
charset
The charset parameter may be provided to definitively specify the document's character encoding, overriding any character encoding declarations in the document. The parameter's value must be one of the labels of the character encoding used to serialize the file.
Encoding considerations : 8bit
application/x-www-form-urlencoded
的条目不允许添加 charset
。那么应该如何从字节解码呢? The URL spec states:
- Let nameString and valueString be the result of running UTF-8 decode
without BOM on the percent-decoding of name and
value, respectively.
听起来,无论编码是什么,都应该始终使用 UTF-8
解码。
当前的HTML/URL规范也有关于
application/x-www-form-urlencoded
:
The application/x-www-form-urlencoded
format is in many ways an aberrant
monstrosity, the result of many years of implementation accidents and
compromises leading to a set of requirements necessary for interoperability,
but in no way representing good design practices. In particular, readers are
cautioned to pay close attention to the twisted details involving repeated
(and in some cases nested) conversions between character encodings and byte
sequences. Unfortunately the format is in widespread use due to the
prevalence of HTML forms.
所以听起来做一些不同的事情并不是一个好主意。
没有built-in方式
注意:built-in 执行这些解决方案的方法是使用 a custom Request
class.
构建 /openapi.json
object 时,FastAPI 的当前版本
检查依赖项是否是 Form
的实例,然后使用一个空的
Form
构建架构的实例,即使实际依赖项是
Form
的子class。
Form.__init__
的media_type
参数默认值为
application/x-www-form-urlencoded
, 所以每
具有 Form()
依赖项的 endpoint/path 函数将显示相同的媒体类型
在文档中,即使 class __init__(
有一个 media_type
参数。
有几种方法可以更改 /openapi.json
中列出的内容,即
什么用于生成文档,FastAPI 文档列出了一个
官方方式.
对于问题中的示例,这将起作用:
from fastapi import FastAPI, Form
from fastapi.openapi.utils import get_openapi
app = FastAPI()
@app.post("/")
def post_hello(username: str = Form(...)):
return {"Hello": username}
def custom_openapi():
if app.openapi_schema:
return app.openapi_schema
app.openapi_schema = get_openapi(
title=app.title,
version=app.version,
openapi_version=app.openapi_version,
description=app.description,
terms_of_service=app.terms_of_service,
contact=app.contact,
license_info=app.license_info,
routes=app.routes,
tags=app.openapi_tags,
servers=app.servers,
)
requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
content = requestBody["content"]
new_content = {
"application/x-www-form-urlencoded;charset=cp932": content[
"application/x-www-form-urlencoded"
]
}
requestBody["content"] = new_content
return app.openapi_schema
app.openapi = custom_openapi
值得注意的是,通过此更改,文档用户界面更改了它呈现实验部分的方式:
与没有 charset
的 application/x-www-form-urlencoded
相比
指定显示:
以上更改只会更改文档中列出的媒体类型。任何形式
发送到 endpoint/path 函数的数据仍然是:
所以即使 starlette
被更改为使用不同的编码方案
解码表单数据,python-multipart
仍然遵循
为 &
和 ;
使用硬编码字节值的规范,对于
例如。
幸运的是,most* of the first 128 characters/codepoints are mapped to the same byte sequences between cp932 and UTF-8,所以&
,
;
和 =
结果都一样。
*除了0x5C
,有时是¥
将 starlette
更改为使用 cp932 编码的一种方法是使用 middleware:
import typing
from unittest.mock import patch
from urllib.parse import unquote_plus
import multipart
from fastapi import FastAPI, Form, Request, Response
from fastapi.openapi.utils import get_openapi
from multipart.multipart import parse_options_header
from starlette.datastructures import FormData, UploadFile
from starlette.formparsers import FormMessage, FormParser
app = FastAPI()
form_path = "/"
@app.post(form_path)
async def post_hello(username: str = Form(...)):
return {"Hello": username}
def custom_openapi():
if app.openapi_schema:
return app.openapi_schema
app.openapi_schema = get_openapi(
title=app.title,
version=app.version,
openapi_version=app.openapi_version,
description=app.description,
terms_of_service=app.terms_of_service,
contact=app.contact,
license_info=app.license_info,
routes=app.routes,
tags=app.openapi_tags,
servers=app.servers,
)
requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
content = requestBody["content"]
new_content = {
"application/x-www-form-urlencoded;charset=cp932": content[
"application/x-www-form-urlencoded"
]
}
requestBody["content"] = new_content
return app.openapi_schema
app.openapi = custom_openapi
class CP932FormParser(FormParser):
async def parse(self) -> FormData:
"""
copied from:
https://github.com/encode/starlette/blob/0.17.1/starlette/formparsers.py#L72-L110
"""
# Callbacks dictionary.
callbacks = {
"on_field_start": self.on_field_start,
"on_field_name": self.on_field_name,
"on_field_data": self.on_field_data,
"on_field_end": self.on_field_end,
"on_end": self.on_end,
}
# Create the parser.
parser = multipart.QuerystringParser(callbacks)
field_name = b""
field_value = b""
items: typing.List[typing.Tuple[str, typing.Union[str, UploadFile]]] = []
# Feed the parser with data from the request.
async for chunk in self.stream:
if chunk:
parser.write(chunk)
else:
parser.finalize()
messages = list(self.messages)
self.messages.clear()
for message_type, message_bytes in messages:
if message_type == FormMessage.FIELD_START:
field_name = b""
field_value = b""
elif message_type == FormMessage.FIELD_NAME:
field_name += message_bytes
elif message_type == FormMessage.FIELD_DATA:
field_value += message_bytes
elif message_type == FormMessage.FIELD_END:
name = unquote_plus(field_name.decode("cp932")) # changed line
value = unquote_plus(field_value.decode("cp932")) # changed line
items.append((name, value))
return FormData(items)
class CustomRequest(Request):
async def form(self) -> FormData:
"""
copied from
https://github.com/encode/starlette/blob/0.17.1/starlette/requests.py#L238-L253
"""
if not hasattr(self, "_form"):
assert (
parse_options_header is not None
), "The `python-multipart` library must be installed to use form parsing."
content_type_header = self.headers.get("Content-Type")
content_type, options = parse_options_header(content_type_header)
if content_type == b"multipart/form-data":
multipart_parser = MultiPartParser(self.headers, self.stream())
self._form = await multipart_parser.parse()
elif content_type == b"application/x-www-form-urlencoded":
form_parser = CP932FormParser(
self.headers, self.stream()
) # use the custom parser above
self._form = await form_parser.parse()
else:
self._form = FormData()
return self._form
@app.middleware("http")
async def custom_form_parser(request: Request, call_next) -> Response:
if request.scope["path"] == form_path:
# starlette creates a new Request object for each middleware/app
# invocation:
# https://github.com/encode/starlette/blob/0.17.1/starlette/routing.py#L59
# this temporarily patches the Request object starlette
# uses with our modified version
with patch("starlette.routing.Request", new=CustomRequest):
return await call_next(request)
然后,数据必须手动编码:
>>> import sys
>>> from urllib.parse import quote_plus
>>> name = quote_plus("username").encode("cp932")
>>> value = quote_plus("cp932文字コード").encode("cp932")
>>> with open("temp.txt", "wb") as file:
... file.write(name + b"=" + value)
...
59
并作为二进制数据发送:
$ curl -X 'POST' \
'http://localhost:8000/' \
-H 'accept: application/json' \
-H 'Content-Type: application/x-www-form-urlencoded;charset=cp932' \
--data-binary "@temp.txt" \
--silent \
| jq -C .
{
"Hello": "cp932文字コード"
}
可能没有必要
在手动编码步骤中,输出将如下所示:
username=cp932%E6%96%87%E5%AD%97%E3%82%B3%E3%83%BC%E3%83%89
percent-encoding 步骤的一部分替换了表示
高于 0x7E
的字符(ASCII 中的 ~
),字节在简化的 ASCII 中
范围。由于 cp932 和 UTF-8 都将这些字节映射到相同的代码点
(除了 0x5C
可能是 \
或 ¥
),字节序列将解码为
相同的字符串:
$ curl -X 'POST' \
'http://localhost:8000/' \
-H 'accept: application/json' \
-H 'Content-Type: application/x-www-form-urlencoded;charset=cp932' \
--data-urlencode "username=cp932文字コード" \
--silent \
| jq -C .
{
"Hello": "cp932文字コード"
}
这仅适用于 percent-encoded 数据。
没有percent-encoding发送的任何数据都将被处理和解释
不同于发件人希望它被解释的方式。例如,在
OpenAPI (Swagger) 文档,“尝试一下”实验部分给出了一个
例如 curl -d
(same as --data
), which doesn't urlencode the data:
$ curl -X 'POST' \
'http://localhost:8000/' \
-H 'accept: application/json' \
-H 'Content-Type: application/x-www-form-urlencoded' \
--data "username=cp932文字コード" \
--silent \
| jq -C .
{
"Hello": "cp932æ–‡å—コード"
}
仅使用 cp932 来处理来自以类似于服务器的方式配置的发件人的请求可能仍然是个好主意。
一种方法是将中间件函数修改为只处理
像这样的数据,如果发送方指定数据已经用
cp932:
import typing
from unittest.mock import patch
from urllib.parse import unquote_plus
import multipart
from fastapi import FastAPI, Form, Request, Response
from fastapi.openapi.utils import get_openapi
from multipart.multipart import parse_options_header
from starlette.datastructures import FormData, UploadFile
from starlette.formparsers import FormMessage, FormParser
app = FastAPI()
form_path = "/"
@app.post(form_path)
async def post_hello(username: str = Form(...)):
return {"Hello": username}
def custom_openapi():
if app.openapi_schema:
return app.openapi_schema
app.openapi_schema = get_openapi(
title=app.title,
version=app.version,
openapi_version=app.openapi_version,
description=app.description,
terms_of_service=app.terms_of_service,
contact=app.contact,
license_info=app.license_info,
routes=app.routes,
tags=app.openapi_tags,
servers=app.servers,
)
requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
content = requestBody["content"]
new_content = {
"application/x-www-form-urlencoded;charset=cp932": content[
"application/x-www-form-urlencoded"
]
}
requestBody["content"] = new_content
return app.openapi_schema
app.openapi = custom_openapi
class CP932FormParser(FormParser):
async def parse(self) -> FormData:
"""
copied from:
https://github.com/encode/starlette/blob/0.17.1/starlette/formparsers.py#L72-L110
"""
# Callbacks dictionary.
callbacks = {
"on_field_start": self.on_field_start,
"on_field_name": self.on_field_name,
"on_field_data": self.on_field_data,
"on_field_end": self.on_field_end,
"on_end": self.on_end,
}
# Create the parser.
parser = multipart.QuerystringParser(callbacks)
field_name = b""
field_value = b""
items: typing.List[typing.Tuple[str, typing.Union[str, UploadFile]]] = []
# Feed the parser with data from the request.
async for chunk in self.stream:
if chunk:
parser.write(chunk)
else:
parser.finalize()
messages = list(self.messages)
self.messages.clear()
for message_type, message_bytes in messages:
if message_type == FormMessage.FIELD_START:
field_name = b""
field_value = b""
elif message_type == FormMessage.FIELD_NAME:
field_name += message_bytes
elif message_type == FormMessage.FIELD_DATA:
field_value += message_bytes
elif message_type == FormMessage.FIELD_END:
name = unquote_plus(field_name.decode("cp932")) # changed line
value = unquote_plus(field_value.decode("cp932")) # changed line
items.append((name, value))
return FormData(items)
class CustomRequest(Request):
async def form(self) -> FormData:
"""
copied from
https://github.com/encode/starlette/blob/0.17.1/starlette/requests.py#L238-L253
"""
if not hasattr(self, "_form"):
assert (
parse_options_header is not None
), "The `python-multipart` library must be installed to use form parsing."
content_type_header = self.headers.get("Content-Type")
content_type, options = parse_options_header(content_type_header)
if content_type == b"multipart/form-data":
multipart_parser = MultiPartParser(self.headers, self.stream())
self._form = await multipart_parser.parse()
elif content_type == b"application/x-www-form-urlencoded":
form_parser = CP932FormParser(
self.headers, self.stream()
) # use the custom parser above
self._form = await form_parser.parse()
else:
self._form = FormData()
return self._form
@app.middleware("http")
async def custom_form_parser(request: Request, call_next) -> Response:
if request.scope["path"] != form_path:
return await call_next(request)
content_type_header = request.headers.get("content-type", None)
if not content_type_header:
return await call_next(request)
media_type, options = parse_options_header(content_type_header)
if b"charset" not in options or options[b"charset"] != b"cp932":
return await call_next(request)
# starlette creates a new Request object for each middleware/app
# invocation:
# https://github.com/encode/starlette/blob/0.17.1/starlette/routing.py#L59
# this temporarily patches the Request object starlette
# uses with our modified version
with patch("starlette.routing.Request", new=CustomRequest):
return await call_next(request)
安全
即使有这个修改,我认为规范中关于解析内容的说明
percent-decode 也应突出显示:
⚠ Warning! Using anything but UTF-8 decode without BOM when input contains bytes that
are not ASCII bytes might be insecure and is not recommended.
所以我对实施任何这些解决方案都持谨慎态度。
使用 FastAPI,如何根据 OpenAPI (Swagger) 文档上的请求 header 添加字符集到 content-type (media-type)?
@app.post("/")
def post_hello(username: str = Form(...)):
return {"Hello": username}
OpenAPI (http:///docs) 显示“application/x-www-form-urlencoded”。
我试着改成这样:
def post_hello(username: str = Form(..., media_type="application/x-www-form-urlencoded; charset=cp932")):
return {"Hello": "World!", "userName": username}
但不能添加charset=cp932
我想根据要求将“application/x-www-form-urlencoded; charset=cp932”设置为 Content-Type。 我想得到 username 由字符集解码。
使用 FastAPI,如何在自动生成的 OpenAPI (Swagger) 文档中将 charset
添加到 Content-Type 请求 header?
@app.post("/")
def post_hello(username: str = Form(...)):
return {"Hello": username}
通过上述路线,在 http://<root_path>/docs
处生成的 OpenAPI 文档显示 application/x-www-form-urlencoded
.
我试过这个:
@app.post("/")
def post_hello(username: str = Form(..., media_type="application/x-www-form-urlencoded; charset=cp932")):
return {"Hello": "World!", "userName": username}
但文档仍然只显示 application/x-www-form-urlencoded
。
我想设置application/x-www-form-urlencoded; charset=cp932
为值
Content-Type
在此 endpoint/path 函数的响应中。我想要
使用该编码方案接收要解码的表单数据。
简答
在一般情况下这似乎不是一个好主意;我不认为有
一种简单的、built-in方法;而且可能没有必要。
不是个好主意 (a.k.a。不是 standards-compliant)
这个 GitHub 问题讨论了为什么将 ;charset=UTF-8
附加到
application/json
不是一个好主意, 和相同点
在那里长大适用于这种情况。
The HTTP/1.1 specification says that the Content-Type
header lists the Media Type.
注:HTTP/2 shares these components with HTTP/1.1
The IANA manages the registry of all the commonly used Media Types (MIME)..
The entry for application/x-www-form-urlencoded
says:
Media type name: application
Media subtype name: x-www-form-urlencoded
Required parameters: No parameters
Optional parameters:
No parameters
Encoding considerations: 7bit
进行比较
MIME media type name : Text
MIME subtype name : Standards Tree - html
Required parameters : No required parameters
Optional parameters :
charset
The charset parameter may be provided to definitively specify the document's character encoding, overriding any character encoding declarations in the document. The parameter's value must be one of the labels of the character encoding used to serialize the file.
Encoding considerations : 8bit
application/x-www-form-urlencoded
的条目不允许添加 charset
。那么应该如何从字节解码呢? The URL spec states:
- Let nameString and valueString be the result of running UTF-8 decode without BOM on the percent-decoding of name and value, respectively.
听起来,无论编码是什么,都应该始终使用 UTF-8 解码。
当前的HTML/URL规范也有关于
application/x-www-form-urlencoded
:
The
application/x-www-form-urlencoded
format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.
所以听起来做一些不同的事情并不是一个好主意。
没有built-in方式
注意:built-in 执行这些解决方案的方法是使用 a custom Request
class.
构建 /openapi.json
object 时,FastAPI 的当前版本
检查依赖项是否是 Form
的实例,然后使用一个空的
Form
构建架构的实例,即使实际依赖项是
Form
的子class。
Form.__init__
的media_type
参数默认值为
application/x-www-form-urlencoded
, 所以每
具有 Form()
依赖项的 endpoint/path 函数将显示相同的媒体类型
在文档中,即使 class __init__(
有一个 media_type
参数。
有几种方法可以更改 /openapi.json
中列出的内容,即
什么用于生成文档,FastAPI 文档列出了一个
官方方式.
对于问题中的示例,这将起作用:
from fastapi import FastAPI, Form
from fastapi.openapi.utils import get_openapi
app = FastAPI()
@app.post("/")
def post_hello(username: str = Form(...)):
return {"Hello": username}
def custom_openapi():
if app.openapi_schema:
return app.openapi_schema
app.openapi_schema = get_openapi(
title=app.title,
version=app.version,
openapi_version=app.openapi_version,
description=app.description,
terms_of_service=app.terms_of_service,
contact=app.contact,
license_info=app.license_info,
routes=app.routes,
tags=app.openapi_tags,
servers=app.servers,
)
requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
content = requestBody["content"]
new_content = {
"application/x-www-form-urlencoded;charset=cp932": content[
"application/x-www-form-urlencoded"
]
}
requestBody["content"] = new_content
return app.openapi_schema
app.openapi = custom_openapi
值得注意的是,通过此更改,文档用户界面更改了它呈现实验部分的方式:
与没有 charset
的 application/x-www-form-urlencoded
相比
指定显示:
以上更改只会更改文档中列出的媒体类型。任何形式 发送到 endpoint/path 函数的数据仍然是:
所以即使 starlette
被更改为使用不同的编码方案
解码表单数据,python-multipart
仍然遵循
为 &
和 ;
使用硬编码字节值的规范,对于
例如。
幸运的是,most* of the first 128 characters/codepoints are mapped to the same byte sequences between cp932 and UTF-8,所以&
,
;
和 =
结果都一样。
*除了0x5C
,有时是¥
将 starlette
更改为使用 cp932 编码的一种方法是使用 middleware:
import typing
from unittest.mock import patch
from urllib.parse import unquote_plus
import multipart
from fastapi import FastAPI, Form, Request, Response
from fastapi.openapi.utils import get_openapi
from multipart.multipart import parse_options_header
from starlette.datastructures import FormData, UploadFile
from starlette.formparsers import FormMessage, FormParser
app = FastAPI()
form_path = "/"
@app.post(form_path)
async def post_hello(username: str = Form(...)):
return {"Hello": username}
def custom_openapi():
if app.openapi_schema:
return app.openapi_schema
app.openapi_schema = get_openapi(
title=app.title,
version=app.version,
openapi_version=app.openapi_version,
description=app.description,
terms_of_service=app.terms_of_service,
contact=app.contact,
license_info=app.license_info,
routes=app.routes,
tags=app.openapi_tags,
servers=app.servers,
)
requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
content = requestBody["content"]
new_content = {
"application/x-www-form-urlencoded;charset=cp932": content[
"application/x-www-form-urlencoded"
]
}
requestBody["content"] = new_content
return app.openapi_schema
app.openapi = custom_openapi
class CP932FormParser(FormParser):
async def parse(self) -> FormData:
"""
copied from:
https://github.com/encode/starlette/blob/0.17.1/starlette/formparsers.py#L72-L110
"""
# Callbacks dictionary.
callbacks = {
"on_field_start": self.on_field_start,
"on_field_name": self.on_field_name,
"on_field_data": self.on_field_data,
"on_field_end": self.on_field_end,
"on_end": self.on_end,
}
# Create the parser.
parser = multipart.QuerystringParser(callbacks)
field_name = b""
field_value = b""
items: typing.List[typing.Tuple[str, typing.Union[str, UploadFile]]] = []
# Feed the parser with data from the request.
async for chunk in self.stream:
if chunk:
parser.write(chunk)
else:
parser.finalize()
messages = list(self.messages)
self.messages.clear()
for message_type, message_bytes in messages:
if message_type == FormMessage.FIELD_START:
field_name = b""
field_value = b""
elif message_type == FormMessage.FIELD_NAME:
field_name += message_bytes
elif message_type == FormMessage.FIELD_DATA:
field_value += message_bytes
elif message_type == FormMessage.FIELD_END:
name = unquote_plus(field_name.decode("cp932")) # changed line
value = unquote_plus(field_value.decode("cp932")) # changed line
items.append((name, value))
return FormData(items)
class CustomRequest(Request):
async def form(self) -> FormData:
"""
copied from
https://github.com/encode/starlette/blob/0.17.1/starlette/requests.py#L238-L253
"""
if not hasattr(self, "_form"):
assert (
parse_options_header is not None
), "The `python-multipart` library must be installed to use form parsing."
content_type_header = self.headers.get("Content-Type")
content_type, options = parse_options_header(content_type_header)
if content_type == b"multipart/form-data":
multipart_parser = MultiPartParser(self.headers, self.stream())
self._form = await multipart_parser.parse()
elif content_type == b"application/x-www-form-urlencoded":
form_parser = CP932FormParser(
self.headers, self.stream()
) # use the custom parser above
self._form = await form_parser.parse()
else:
self._form = FormData()
return self._form
@app.middleware("http")
async def custom_form_parser(request: Request, call_next) -> Response:
if request.scope["path"] == form_path:
# starlette creates a new Request object for each middleware/app
# invocation:
# https://github.com/encode/starlette/blob/0.17.1/starlette/routing.py#L59
# this temporarily patches the Request object starlette
# uses with our modified version
with patch("starlette.routing.Request", new=CustomRequest):
return await call_next(request)
然后,数据必须手动编码:
>>> import sys
>>> from urllib.parse import quote_plus
>>> name = quote_plus("username").encode("cp932")
>>> value = quote_plus("cp932文字コード").encode("cp932")
>>> with open("temp.txt", "wb") as file:
... file.write(name + b"=" + value)
...
59
并作为二进制数据发送:
$ curl -X 'POST' \
'http://localhost:8000/' \
-H 'accept: application/json' \
-H 'Content-Type: application/x-www-form-urlencoded;charset=cp932' \
--data-binary "@temp.txt" \
--silent \
| jq -C .
{
"Hello": "cp932文字コード"
}
可能没有必要
在手动编码步骤中,输出将如下所示:
username=cp932%E6%96%87%E5%AD%97%E3%82%B3%E3%83%BC%E3%83%89
percent-encoding 步骤的一部分替换了表示
高于 0x7E
的字符(ASCII 中的 ~
),字节在简化的 ASCII 中
范围。由于 cp932 和 UTF-8 都将这些字节映射到相同的代码点
(除了 0x5C
可能是 \
或 ¥
),字节序列将解码为
相同的字符串:
$ curl -X 'POST' \
'http://localhost:8000/' \
-H 'accept: application/json' \
-H 'Content-Type: application/x-www-form-urlencoded;charset=cp932' \
--data-urlencode "username=cp932文字コード" \
--silent \
| jq -C .
{
"Hello": "cp932文字コード"
}
这仅适用于 percent-encoded 数据。
没有percent-encoding发送的任何数据都将被处理和解释
不同于发件人希望它被解释的方式。例如,在
OpenAPI (Swagger) 文档,“尝试一下”实验部分给出了一个
例如 curl -d
(same as --data
), which doesn't urlencode the data:
$ curl -X 'POST' \
'http://localhost:8000/' \
-H 'accept: application/json' \
-H 'Content-Type: application/x-www-form-urlencoded' \
--data "username=cp932文字コード" \
--silent \
| jq -C .
{
"Hello": "cp932æ–‡å—コード"
}
仅使用 cp932 来处理来自以类似于服务器的方式配置的发件人的请求可能仍然是个好主意。
一种方法是将中间件函数修改为只处理 像这样的数据,如果发送方指定数据已经用 cp932:
import typing
from unittest.mock import patch
from urllib.parse import unquote_plus
import multipart
from fastapi import FastAPI, Form, Request, Response
from fastapi.openapi.utils import get_openapi
from multipart.multipart import parse_options_header
from starlette.datastructures import FormData, UploadFile
from starlette.formparsers import FormMessage, FormParser
app = FastAPI()
form_path = "/"
@app.post(form_path)
async def post_hello(username: str = Form(...)):
return {"Hello": username}
def custom_openapi():
if app.openapi_schema:
return app.openapi_schema
app.openapi_schema = get_openapi(
title=app.title,
version=app.version,
openapi_version=app.openapi_version,
description=app.description,
terms_of_service=app.terms_of_service,
contact=app.contact,
license_info=app.license_info,
routes=app.routes,
tags=app.openapi_tags,
servers=app.servers,
)
requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
content = requestBody["content"]
new_content = {
"application/x-www-form-urlencoded;charset=cp932": content[
"application/x-www-form-urlencoded"
]
}
requestBody["content"] = new_content
return app.openapi_schema
app.openapi = custom_openapi
class CP932FormParser(FormParser):
async def parse(self) -> FormData:
"""
copied from:
https://github.com/encode/starlette/blob/0.17.1/starlette/formparsers.py#L72-L110
"""
# Callbacks dictionary.
callbacks = {
"on_field_start": self.on_field_start,
"on_field_name": self.on_field_name,
"on_field_data": self.on_field_data,
"on_field_end": self.on_field_end,
"on_end": self.on_end,
}
# Create the parser.
parser = multipart.QuerystringParser(callbacks)
field_name = b""
field_value = b""
items: typing.List[typing.Tuple[str, typing.Union[str, UploadFile]]] = []
# Feed the parser with data from the request.
async for chunk in self.stream:
if chunk:
parser.write(chunk)
else:
parser.finalize()
messages = list(self.messages)
self.messages.clear()
for message_type, message_bytes in messages:
if message_type == FormMessage.FIELD_START:
field_name = b""
field_value = b""
elif message_type == FormMessage.FIELD_NAME:
field_name += message_bytes
elif message_type == FormMessage.FIELD_DATA:
field_value += message_bytes
elif message_type == FormMessage.FIELD_END:
name = unquote_plus(field_name.decode("cp932")) # changed line
value = unquote_plus(field_value.decode("cp932")) # changed line
items.append((name, value))
return FormData(items)
class CustomRequest(Request):
async def form(self) -> FormData:
"""
copied from
https://github.com/encode/starlette/blob/0.17.1/starlette/requests.py#L238-L253
"""
if not hasattr(self, "_form"):
assert (
parse_options_header is not None
), "The `python-multipart` library must be installed to use form parsing."
content_type_header = self.headers.get("Content-Type")
content_type, options = parse_options_header(content_type_header)
if content_type == b"multipart/form-data":
multipart_parser = MultiPartParser(self.headers, self.stream())
self._form = await multipart_parser.parse()
elif content_type == b"application/x-www-form-urlencoded":
form_parser = CP932FormParser(
self.headers, self.stream()
) # use the custom parser above
self._form = await form_parser.parse()
else:
self._form = FormData()
return self._form
@app.middleware("http")
async def custom_form_parser(request: Request, call_next) -> Response:
if request.scope["path"] != form_path:
return await call_next(request)
content_type_header = request.headers.get("content-type", None)
if not content_type_header:
return await call_next(request)
media_type, options = parse_options_header(content_type_header)
if b"charset" not in options or options[b"charset"] != b"cp932":
return await call_next(request)
# starlette creates a new Request object for each middleware/app
# invocation:
# https://github.com/encode/starlette/blob/0.17.1/starlette/routing.py#L59
# this temporarily patches the Request object starlette
# uses with our modified version
with patch("starlette.routing.Request", new=CustomRequest):
return await call_next(request)
安全
即使有这个修改,我认为规范中关于解析内容的说明 percent-decode 也应突出显示:
⚠ Warning! Using anything but UTF-8 decode without BOM when input contains bytes that are not ASCII bytes might be insecure and is not recommended.
所以我对实施任何这些解决方案都持谨慎态度。