Pass a file from a PHP server to a Python server (HTTP request)

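I have a web application running on a Laravel PHP server. For some needs (Word document processing), I implemented a Python server that does the data extraction. I would like to know how to call my Python server from PHP while passing it a file. Currently, I save the docx file on the PHP server, where it is accessible via a URL. I then send an HTTP POST request containing that URL from the PHP server to the Python server, so that Python can download the document. The problem is that I run into a deadlock: the PHP server is waiting for the Python server's response, while the Python server is waiting for the PHP server to serve the document. Any suggestions on how to get around this?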

Here is the PHP code:

// Send POST REQUEST
  $context_options = array(
    'http' => array(
      'method' => 'POST',
      'header' => "Content-type: application/x-www-form-urlencoded\r\n"
        . "Content-Length: " . strlen($data) . "\r\n",
      'content' => $data,
      'timeout' => 10,
    )
  );

  $context = stream_context_create($context_options);
  $result = fopen('http://localhost:5000/api/extraction','r', false, $context);

Here is the Python code:

@app.route('/api/extraction', methods=['post'])
def extraction():
    data = request.form.to_dict()
    url = data['file']  # get url
    filename = secure_filename(url.rsplit('/', 1)[-1])
    path = os.path.join(app.config['UPLOAD_FILE_FOLDER'], filename) 
    urllib.request.urlretrieve(url, path)

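You should send the file through a proper POST (multipart/form-data) request instead of having Python fetch the data. Your current two-round-trip approach is much harder to debug and maintain.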

Approach 1: Normal form request

<?php

/**
 * A generator that yields multipart form-data fragments (without the ending EOL).
 * File contents are sent as raw bytes with Content-Type application/octet-stream.
 *
 * @param iterable $vars
 *    Key-value iterable (e.g. assoc array) of string or integer.
 *    Keys represents the field name.
 * @param iterable $files
 *    Key-value iterable (e.g. assoc array) of file path string.
 *    Keys represents the field name of file upload.
 *
 * @return \Generator
 *    Generator of multipart form-data fragments (without the ending EOL) in array format,
 *    always contains 2 values:
 *      0 - An array of header for a key-value pair.
 *      1 - A value string (can contain binary content) of the key-value pair.
 */
function generate_multipart_data_parts(iterable $vars, iterable $files=[]): Generator {
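    // handle normal variables
    foreach ($vars as $name => $value) {
        // multipart fields are sent verbatim (not URL-encoded);
        // only double quotes in the field name need escaping
        $name = str_replace('"', '%22', (string) $name);
        yield [
            // header
            ["Content-Disposition: form-data; name=\"{$name}\""],
            // value
            (string) $value,
        ];
    }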

    // handle file contents
    foreach ($files as $file_fieldname => $file_path) {
        $file_fieldname = urlencode($file_fieldname);
        $file_data = file_get_contents($file_path);
        yield [
            // header
            [
                "Content-Disposition: form-data; name=\"{$file_fieldname}\"; filename=\"".basename($file_path)."\"",
                "Content-Type: application/octet-stream", // for binary safety
            ],
            // value
            $file_data
        ];
    }
}

/**
 * Converts output of generate_multipart_data_parts() into form data.
 *
 * @param iterable $parts
 *    An iterator of form fragment arrays. See return data of
 *    generate_multipart_data_parts().
 * @param string|null $boundary
 *    An optional pre-generated boundary string to use for wrapping data.
 *    Please reference section 7.2 "The Multipart Content-Type" in RFC1341.
 *
 * @return array
 *    An array with 2 items:
 *    0 - string boundary
 *    1 - string (can contain binary data) data
 */
function wrap_multipart_data(iterable $parts, ?string $boundary = null): array {
    if (empty($boundary)) {
        $boundary = '-----------------------------------------boundary' . time();
    }
    $data = '';
    foreach ($parts as $part) {
        list($header, $content) = $part;
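        // Check content for boundary. On the wire the delimiter is
        // "\r\n--" followed by the boundary, so any occurrence of the
        // boundary string itself is treated as a conflict.
        // Note: Won't check header and expect the program makes sense there.
        if (strpos($content, $boundary) !== false) {
            throw new \Exception('Error: data contains the multipart boundary');
        }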
        $data .= "--{$boundary}\r\n";
        $data .= implode("\r\n", $header) . "\r\n\r\n" . $content . "\r\n";
    }
    // signal end of request (note the trailing "--")
    $data .= "--{$boundary}--\r\n";
    return [$boundary, $data];
}

// build data for a multipart/form-data request
list($boundary, $data) = wrap_multipart_data(generate_multipart_data_parts(
    // normal form variables
    [
        'hello' => 'world',
        'foo' => 'bar',
    ],
    // files
    [
        'upload_file' => 'path/to/your/file.xlsx',
    ]
));

// Send POST REQUEST
$context_options = array(
    'http' => array(
        'method' => 'POST',
        'header' => "Content-type: multipart/form-data; boundary={$boundary}\r\n"
            . "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data,
        'timeout' => 10,
    )
);

$context = stream_context_create($context_options);
$result = fopen('http://localhost:5000/api/extraction','r', false, $context);

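Your Python script should receive the file as a normal HTTP form file upload (in the file field named "upload_file"). Use the method your framework provides to get the file from the request.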
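On the Python side, a handler for this upload might look roughly like the sketch below. It assumes a Flask app like the one in the question. The temporary upload folder, the error handling, and the JSON response shape are illustrative assumptions, not part of the original code.

```python
import os
import tempfile

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
# Assumption: a throwaway folder for the sketch; point this at your real upload folder.
app.config["UPLOAD_FILE_FOLDER"] = tempfile.mkdtemp()


@app.route("/api/extraction", methods=["POST"])
def extraction():
    # The file arrives in the multipart field named "upload_file".
    uploaded = request.files.get("upload_file")
    if uploaded is None or uploaded.filename == "":
        return jsonify(error="missing upload_file"), 400

    # Never trust the client-supplied filename as a path.
    filename = secure_filename(uploaded.filename)
    path = os.path.join(app.config["UPLOAD_FILE_FOLDER"], filename)
    uploaded.save(path)

    # Ordinary form variables sent alongside the file are in request.form.
    hello = request.form.get("hello")

    # Placeholder response; the real data extraction would run on `path` here.
    return jsonify(filename=filename, size=os.path.getsize(path), hello=hello)
```

Because the file is part of the request body, the Python server no longer has to call back into the PHP server, so the circular wait from the question cannot occur.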

Approach 2: A really long x-www-form-urlencoded value

If you are concerned about binary safety, or if the first approach somehow fails, another option is to submit the file as a base64-encoded string:

<?php

// read the file and send its contents (not its path) as base64
$file_data = file_get_contents('path/to/your/file.xlsx');
$data = http_build_query([
  'upload_file' => base64_encode($file_data),
]);

// Send POST REQUEST
$context_options = array(
    'http' => array(
        'method' => 'POST',
        'header' => "Content-type: application/x-www-form-urlencoded\r\n"
            . "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data,
        'timeout' => 10,
    )
);

$context = stream_context_create($context_options);
$result = fopen('http://localhost:5000/api/extraction','r', false, $context);

On your Python server, you will receive the file data as a base64-encoded string in the field named "upload_file". You need to decode it to get the original binary content.
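The decoding step could be sketched as follows, using only the standard library. The function name `decode_uploaded_file` is hypothetical. In a Flask handler the encoded string would normally come from `request.form["upload_file"]` rather than from parsing the raw body by hand, but the base64 step is the same.

```python
import base64
from urllib.parse import parse_qs, urlencode


def decode_uploaded_file(form_body: str, field: str = "upload_file") -> bytes:
    """Extract a base64-encoded field from a urlencoded form body and decode it."""
    form = parse_qs(form_body, strict_parsing=True)
    encoded = form[field][0]
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(encoded, validate=True)


# Round trip: simulate what the PHP side sends, then decode it.
original = b"PK\x03\x04 fake docx bytes \x00\xff"
body = urlencode({"upload_file": base64.b64encode(original).decode("ascii")})
restored = decode_uploaded_file(body)
```

Keep in mind that base64 inflates the payload by roughly a third, which is the price of this approach compared with a multipart upload.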

Approach 3: If you insist...

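If you insist on your current two-round-trip approach, the simple solution is to have 2 different endpoints:

  • One for sending the POST request to your Python application.
  • One for serving the docx file, with no dependency on the Python application.

From your description, your deadlock arises because you are using the same script for both purposes. I don't see any reason why they can't be 2 separate scripts / route controllers.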