使用 FastAPI，如何根据 OpenAPI (Swagger) 文档上的请求 header 添加字符集到 content-type (media-type)？

Question

@app.post("/")
def post_hello(username: str = Form(...)):
   return {"Hello": username}

OpenAPI (http:///docs) 显示“application/x-www-form-urlencoded”。

我试着改成这样：

def post_hello(username: str = Form(..., media_type="application/x-www-form-urlencoded; charset=cp932")):
   return {"Hello": "World!", "userName": username}

但不能添加charset=cp932

我想根据要求将“application/x-www-form-urlencoded; charset=cp932”设置为 Content-Type。我想得到 username 由字符集解码。

Answer 1

使用 FastAPI，如何在自动生成的 OpenAPI (Swagger) 文档中将 charset 添加到 Content-Type 请求 header？

@app.post("/")
def post_hello(username: str = Form(...)):
   return {"Hello": username}

通过上述路线，在 http://<root_path>/docs 处生成的 OpenAPI 文档显示 application/x-www-form-urlencoded.

我试过这个：

@app.post("/")
def post_hello(username: str = Form(..., media_type="application/x-www-form-urlencoded; charset=cp932")):
   return {"Hello": "World!", "userName": username}

但文档仍然只显示 application/x-www-form-urlencoded。

我想设置application/x-www-form-urlencoded; charset=cp932为值 Content-Type 在此 endpoint/path 函数的响应中。我想要使用该编码方案接收要解码的表单数据。

简答

在一般情况下这似乎不是一个好主意；我不认为有一种简单的~~、built-in~~方法；而且可能没有必要。

不是个好主意 (a.k.a。不是 standards-compliant)

这个 GitHub 问题讨论了为什么将 ;charset=UTF-8 附加到 application/json 不是一个好主意，和相同点在那里长大适用于这种情况。

The HTTP/1.1 specification says that the Content-Type header lists the Media Type.

注：HTTP/2 shares these components with HTTP/1.1

The IANA manages the registry of all the commonly used Media Types (MIME)..

The entry for application/x-www-form-urlencoded says:

Media type name: application
Media subtype name: x-www-form-urlencoded

Required parameters: No parameters

Optional parameters:
No parameters

Encoding considerations: 7bit

将此与 the entry for text/html:

进行比较

MIME media type name : Text

MIME subtype name : Standards Tree - html

Required parameters : No required parameters

Optional parameters :
charset
The charset parameter may be provided to definitively specify the document's character encoding, overriding any character encoding declarations in the document. The parameter's value must be one of the labels of the character encoding used to serialize the file.

Encoding considerations : 8bit

application/x-www-form-urlencoded 的条目不允许添加 charset。那么应该如何从字节解码呢？ The URL spec states:

Let nameString and valueString be the result of running UTF-8 decode without BOM on the percent-decoding of name and value, respectively.

听起来，无论编码是什么，都应该始终使用 UTF-8 解码。

当前的HTML/URL规范也有关于 application/x-www-form-urlencoded:

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.

所以听起来做一些不同的事情并不是一个好主意。

没有built-in方式

注意：built-in 执行这些解决方案的方法是使用 a custom Request class.

构建 /openapi.json object 时，FastAPI 的当前版本检查依赖项是否是 Form 的实例，然后使用一个空的 Form 构建架构的实例，即使实际依赖项是 Form 的子class。

Form.__init__的media_type参数默认值为 application/x-www-form-urlencoded, 所以每具有 Form() 依赖项的 endpoint/path 函数将显示相同的媒体类型在文档中，即使 class __init__( 有一个 media_type 参数。

有几种方法可以更改 /openapi.json 中列出的内容，即什么用于生成文档，FastAPI 文档列出了一个官方方式.

对于问题中的示例，这将起作用：

from fastapi import FastAPI, Form
from fastapi.openapi.utils import get_openapi

app = FastAPI()


@app.post("/")
def post_hello(username: str = Form(...)):
    return {"Hello": username}


def custom_openapi():
    if app.openapi_schema:
        return app.openapi_schema

    app.openapi_schema = get_openapi(
        title=app.title,
        version=app.version,
        openapi_version=app.openapi_version,
        description=app.description,
        terms_of_service=app.terms_of_service,
        contact=app.contact,
        license_info=app.license_info,
        routes=app.routes,
        tags=app.openapi_tags,
        servers=app.servers,
    )

    requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
    content = requestBody["content"]
    new_content = {
        "application/x-www-form-urlencoded;charset=cp932": content[
            "application/x-www-form-urlencoded"
        ]
    }
    requestBody["content"] = new_content

    return app.openapi_schema


app.openapi = custom_openapi

值得注意的是，通过此更改，文档用户界面更改了它呈现实验部分的方式：

与没有 charset 的 application/x-www-form-urlencoded 相比指定显示：

以上更改只会更改文档中列出的媒体类型。任何形式发送到 endpoint/path 函数的数据仍然是：

parsed by python-multipart（大致遵循规范中描述的相同步骤)
decoded by starlette with Latin-1

所以即使 starlette 被更改为使用不同的编码方案解码表单数据，python-multipart 仍然遵循为 & 和 ; 使用硬编码字节值的规范，对于例如。

幸运的是，most* of the first 128 characters/codepoints are mapped to the same byte sequences between cp932 and UTF-8,所以&， ; 和 = 结果都一样。

*除了0x5C，有时是¥

将 starlette 更改为使用 cp932 编码的一种方法是使用 middleware:

import typing
from unittest.mock import patch
from urllib.parse import unquote_plus

import multipart
from fastapi import FastAPI, Form, Request, Response
from fastapi.openapi.utils import get_openapi
from multipart.multipart import parse_options_header
from starlette.datastructures import FormData, UploadFile
from starlette.formparsers import FormMessage, FormParser

app = FastAPI()

form_path = "/"


@app.post(form_path)
async def post_hello(username: str = Form(...)):
    return {"Hello": username}


def custom_openapi():
    if app.openapi_schema:
        return app.openapi_schema

    app.openapi_schema = get_openapi(
        title=app.title,
        version=app.version,
        openapi_version=app.openapi_version,
        description=app.description,
        terms_of_service=app.terms_of_service,
        contact=app.contact,
        license_info=app.license_info,
        routes=app.routes,
        tags=app.openapi_tags,
        servers=app.servers,
    )

    requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
    content = requestBody["content"]
    new_content = {
        "application/x-www-form-urlencoded;charset=cp932": content[
            "application/x-www-form-urlencoded"
        ]
    }
    requestBody["content"] = new_content

    return app.openapi_schema


app.openapi = custom_openapi


class CP932FormParser(FormParser):
    async def parse(self) -> FormData:
        """
        copied from:
        https://github.com/encode/starlette/blob/0.17.1/starlette/formparsers.py#L72-L110
        """
        # Callbacks dictionary.
        callbacks = {
            "on_field_start": self.on_field_start,
            "on_field_name": self.on_field_name,
            "on_field_data": self.on_field_data,
            "on_field_end": self.on_field_end,
            "on_end": self.on_end,
        }

        # Create the parser.
        parser = multipart.QuerystringParser(callbacks)
        field_name = b""
        field_value = b""

        items: typing.List[typing.Tuple[str, typing.Union[str, UploadFile]]] = []

        # Feed the parser with data from the request.
        async for chunk in self.stream:
            if chunk:
                parser.write(chunk)
            else:
                parser.finalize()
            messages = list(self.messages)
            self.messages.clear()
            for message_type, message_bytes in messages:
                if message_type == FormMessage.FIELD_START:
                    field_name = b""
                    field_value = b""
                elif message_type == FormMessage.FIELD_NAME:
                    field_name += message_bytes
                elif message_type == FormMessage.FIELD_DATA:
                    field_value += message_bytes
                elif message_type == FormMessage.FIELD_END:
                    name = unquote_plus(field_name.decode("cp932"))  # changed line
                    value = unquote_plus(field_value.decode("cp932"))  # changed line
                    items.append((name, value))

        return FormData(items)


class CustomRequest(Request):
    async def form(self) -> FormData:
        """
        copied from
        https://github.com/encode/starlette/blob/0.17.1/starlette/requests.py#L238-L253
        """
        if not hasattr(self, "_form"):
            assert (
                parse_options_header is not None
            ), "The `python-multipart` library must be installed to use form parsing."
            content_type_header = self.headers.get("Content-Type")
            content_type, options = parse_options_header(content_type_header)
            if content_type == b"multipart/form-data":
                multipart_parser = MultiPartParser(self.headers, self.stream())
                self._form = await multipart_parser.parse()
            elif content_type == b"application/x-www-form-urlencoded":
                form_parser = CP932FormParser(
                    self.headers, self.stream()
                )  # use the custom parser above
                self._form = await form_parser.parse()
            else:
                self._form = FormData()
        return self._form


@app.middleware("http")
async def custom_form_parser(request: Request, call_next) -> Response:
    if request.scope["path"] == form_path:
        # starlette creates a new Request object for each middleware/app
        # invocation:
        # https://github.com/encode/starlette/blob/0.17.1/starlette/routing.py#L59
        # this temporarily patches the Request object starlette
        # uses with our modified version
        with patch("starlette.routing.Request", new=CustomRequest):
            return await call_next(request)

然后，数据必须手动编码：

>>> import sys
>>> from urllib.parse import quote_plus
>>> name = quote_plus("username").encode("cp932")
>>> value = quote_plus("cp932文字コード").encode("cp932")
>>> with open("temp.txt", "wb") as file:
...     file.write(name + b"=" + value)
...
59

并作为二进制数据发送：

$ curl -X 'POST' \
  'http://localhost:8000/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/x-www-form-urlencoded;charset=cp932' \
  --data-binary "@temp.txt" \
  --silent \
| jq -C .

{
  "Hello": "cp932文字コード"
}

可能没有必要

在手动编码步骤中，输出将如下所示：

username=cp932%E6%96%87%E5%AD%97%E3%82%B3%E3%83%BC%E3%83%89

percent-encoding 步骤的一部分替换了表示高于 0x7E 的字符（ASCII 中的 ~），字节在简化的 ASCII 中范围。由于 cp932 和 UTF-8 都将这些字节映射到相同的代码点（除了 0x5C 可能是 \ 或 ¥），字节序列将解码为相同的字符串：

$ curl -X 'POST' \
  'http://localhost:8000/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/x-www-form-urlencoded;charset=cp932' \
  --data-urlencode "username=cp932文字コード" \
  --silent \
| jq -C .

{
  "Hello": "cp932文字コード"
}

这仅适用于 percent-encoded 数据。

没有percent-encoding发送的任何数据都将被处理和解释不同于发件人希望它被解释的方式。例如，在 OpenAPI (Swagger) 文档，“尝试一下”实验部分给出了一个例如 curl -d (same as --data), which doesn't urlencode the data:

$ curl -X 'POST' \
  'http://localhost:8000/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data "username=cp932文字コード" \
  --silent \
| jq -C .
{
  "Hello": "cp932æ–‡å—ã‚³ãƒ¼ãƒ‰"
}

仅使用 cp932 来处理来自以类似于服务器的方式配置的发件人的请求可能仍然是个好主意。

一种方法是将中间件函数修改为只处理像这样的数据，如果发送方指定数据已经用 cp932:

import typing
from unittest.mock import patch
from urllib.parse import unquote_plus

import multipart
from fastapi import FastAPI, Form, Request, Response
from fastapi.openapi.utils import get_openapi
from multipart.multipart import parse_options_header
from starlette.datastructures import FormData, UploadFile
from starlette.formparsers import FormMessage, FormParser

app = FastAPI()

form_path = "/"


@app.post(form_path)
async def post_hello(username: str = Form(...)):
    return {"Hello": username}


def custom_openapi():
    if app.openapi_schema:
        return app.openapi_schema

    app.openapi_schema = get_openapi(
        title=app.title,
        version=app.version,
        openapi_version=app.openapi_version,
        description=app.description,
        terms_of_service=app.terms_of_service,
        contact=app.contact,
        license_info=app.license_info,
        routes=app.routes,
        tags=app.openapi_tags,
        servers=app.servers,
    )

    requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
    content = requestBody["content"]
    new_content = {
        "application/x-www-form-urlencoded;charset=cp932": content[
            "application/x-www-form-urlencoded"
        ]
    }
    requestBody["content"] = new_content

    return app.openapi_schema


app.openapi = custom_openapi


class CP932FormParser(FormParser):
    async def parse(self) -> FormData:
        """
        copied from:
        https://github.com/encode/starlette/blob/0.17.1/starlette/formparsers.py#L72-L110
        """
        # Callbacks dictionary.
        callbacks = {
            "on_field_start": self.on_field_start,
            "on_field_name": self.on_field_name,
            "on_field_data": self.on_field_data,
            "on_field_end": self.on_field_end,
            "on_end": self.on_end,
        }

        # Create the parser.
        parser = multipart.QuerystringParser(callbacks)
        field_name = b""
        field_value = b""

        items: typing.List[typing.Tuple[str, typing.Union[str, UploadFile]]] = []

        # Feed the parser with data from the request.
        async for chunk in self.stream:
            if chunk:
                parser.write(chunk)
            else:
                parser.finalize()
            messages = list(self.messages)
            self.messages.clear()
            for message_type, message_bytes in messages:
                if message_type == FormMessage.FIELD_START:
                    field_name = b""
                    field_value = b""
                elif message_type == FormMessage.FIELD_NAME:
                    field_name += message_bytes
                elif message_type == FormMessage.FIELD_DATA:
                    field_value += message_bytes
                elif message_type == FormMessage.FIELD_END:
                    name = unquote_plus(field_name.decode("cp932"))  # changed line
                    value = unquote_plus(field_value.decode("cp932"))  # changed line
                    items.append((name, value))

        return FormData(items)


class CustomRequest(Request):
    async def form(self) -> FormData:
        """
        copied from
        https://github.com/encode/starlette/blob/0.17.1/starlette/requests.py#L238-L253
        """
        if not hasattr(self, "_form"):
            assert (
                parse_options_header is not None
            ), "The `python-multipart` library must be installed to use form parsing."
            content_type_header = self.headers.get("Content-Type")
            content_type, options = parse_options_header(content_type_header)
            if content_type == b"multipart/form-data":
                multipart_parser = MultiPartParser(self.headers, self.stream())
                self._form = await multipart_parser.parse()
            elif content_type == b"application/x-www-form-urlencoded":
                form_parser = CP932FormParser(
                    self.headers, self.stream()
                )  # use the custom parser above
                self._form = await form_parser.parse()
            else:
                self._form = FormData()
        return self._form


@app.middleware("http")
async def custom_form_parser(request: Request, call_next) -> Response:
    if request.scope["path"] != form_path:
        return await call_next(request)

    content_type_header = request.headers.get("content-type", None)
    if not content_type_header:
        return await call_next(request)

    media_type, options = parse_options_header(content_type_header)
    if b"charset" not in options or options[b"charset"] != b"cp932":
        return await call_next(request)

    # starlette creates a new Request object for each middleware/app
    # invocation:
    # https://github.com/encode/starlette/blob/0.17.1/starlette/routing.py#L59
    # this temporarily patches the Request object starlette
    # uses with our modified version
    with patch("starlette.routing.Request", new=CustomRequest):
        return await call_next(request)

安全

即使有这个修改，我认为规范中关于解析内容的说明 percent-decode 也应突出显示：

⚠ Warning! Using anything but UTF-8 decode without BOM when input contains bytes that are not ASCII bytes might be insecure and is not recommended.

所以我对实施任何这些解决方案都持谨慎态度。

使用 FastAPI，如何根据 OpenAPI (Swagger) 文档上的请求 header 添加字符集到 content-type (media-type)？

With FastAPI, How to add charset to content-type (media-type) on request header on OpenAPI (Swagger) doc?

python-3.x

swagger

openapi

fastapi

简答

不是个好主意 (a.k.a。不是 standards-compliant)

没有built-in方式

可能没有必要

安全