Pass a file from a PHP server to a Python server (HTTP request)
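I have a web application running on a Laravel PHP server. For some needs (Word document processing), I implemented a Python server that does the data extraction. I would like to know how to call my Python server from PHP and pass it a file.
Currently, I save the docx file on the PHP server, where it is accessible via a URL. I make an HTTP POST request from the PHP server to the Python server containing that URL, and the Python server uses it to download the document. The problem is that I run into a deadlock: the PHP server is waiting for the Python server's response, while the Python server is waiting for the PHP server to serve the document. Any suggestions on how to get around this?
Here is the PHP code: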
// Send POST REQUEST
$context_options = array(
    'http' => array(
        'method' => 'POST',
        'header' => "Content-type: application/x-www-form-urlencoded\r\n"
            . "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data,
        'timeout' => 10,
    )
);
$context = stream_context_create($context_options);
$result = fopen('http://localhost:5000/api/extraction','r', false, $context);
Here is the Python code:
@app.route('/api/extraction', methods=['post'])
def extraction():
    data = request.form.to_dict()
    url = data['file']  # get url
    filename = secure_filename(url.rsplit('/', 1)[-1])
    path = os.path.join(app.config['UPLOAD_FILE_FOLDER'], filename)
    urllib.request.urlretrieve(url, path)
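You should send the file in a proper POST (multipart/form-data) request instead of having Python fetch the data. Your current two-round-trip approach is much harder to debug and maintain.
Approach 1: Normal form request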
<?php
/**
 * A generator that yields multipart form-data fragments (without the ending EOL).
 * File contents are sent as-is (application/octet-stream), not base64-encoded.
 *
 * @param iterable $vars
 *   Key-value iterable (e.g. assoc array) of string or integer.
 *   Keys represent the field name.
 * @param iterable $files
 *   Key-value iterable (e.g. assoc array) of file path string.
 *   Keys represent the field name of the file upload.
 *
 * @return \Generator
 *   Generator of multipart form-data fragments (without the ending EOL) in array format,
 *   always contains 2 values:
 *     0 - An array of headers for a key-value pair.
 *     1 - A value string (can contain binary content) of the key-value pair.
 */
function generate_multipart_data_parts(iterable $vars, iterable $files=[]): Generator {
    // handle normal variables
    foreach ($vars as $name => $value) {
        $name = urlencode($name);
        // Multipart values are sent raw; the receiver does not URL-decode them.
        $value = (string) $value;
        yield [
            // header
            ["Content-Disposition: form-data; name=\"{$name}\""],
            // value
            $value,
        ];
    }
    // handle file contents
    foreach ($files as $file_fieldname => $file_path) {
        $file_fieldname = urlencode($file_fieldname);
        $file_data = file_get_contents($file_path);
        yield [
            // header
            [
                "Content-Disposition: form-data; name=\"{$file_fieldname}\"; filename=\"".basename($file_path)."\"",
                "Content-Type: application/octet-stream", // for binary safety
            ],
            // value
            $file_data
        ];
    }
}
/**
 * Converts output of generate_multipart_data_parts() into form data.
 *
 * @param iterable $parts
 *   An iterator of form fragment arrays. See return data of
 *   generate_multipart_data_parts().
 * @param string|null $boundary
 *   An optional pre-generated boundary string to use for wrapping data.
 *   Please reference section 7.2 "The Multipart Content-Type" in RFC1341.
 *
 * @return array
 *   An array with 2 items:
 *     0 - string boundary
 *     1 - string (can contain binary data) data
 */
function wrap_multipart_data(iterable $parts, ?string $boundary = null): array {
    if (empty($boundary)) {
        $boundary = '-----------------------------------------boundary' . time();
    }
    $data = '';
    foreach ($parts as $part) {
        list($header, $content) = $part;
        // Check content for the delimiter ("--" followed by the boundary).
        // Note: Won't check header and expect the program makes sense there.
        if (strpos($content, "--{$boundary}") !== false) {
            throw new \Exception('Error: data contains the multipart boundary');
        }
        $data .= "--{$boundary}\r\n";
        $data .= implode("\r\n", $header) . "\r\n\r\n" . $content . "\r\n";
    }
    // signal end of request (note the trailing "--")
    $data .= "--{$boundary}--\r\n";
    return [$boundary, $data];
}
// build data for a multipart/form-data request
list($boundary, $data) = wrap_multipart_data(generate_multipart_data_parts(
    // normal form variables
    [
        'hello' => 'world',
        'foo' => 'bar',
    ],
    // files
    [
        'upload_file' => 'path/to/your/file.xlsx',
    ]
));
// Send POST REQUEST
$context_options = array(
    'http' => array(
        'method' => 'POST',
        'header' => "Content-type: multipart/form-data; boundary={$boundary}\r\n"
            . "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data,
        'timeout' => 10,
    )
);
$context = stream_context_create($context_options);
// $result is a stream handle; read the response body with stream_get_contents($result).
$result = fopen('http://localhost:5000/api/extraction','r', false, $context);
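Your Python script should then receive the file as a normal HTTP form file upload (in a file field named "upload_file"). Use whatever method your framework supports to get the file from the request.
Approach 2: A really long x-www-form-urlencoded value
If you are concerned about binary safety, or if the first approach somehow fails, another option is to submit the file as a base64-encoded string: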
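Since the question's server code looks like Flask, the receiving side might be sketched as below. This is a minimal sketch under assumptions, not a definitive implementation: the temporary upload folder, the JSON response shape, and the 400 error handling are illustrative choices that do not come from the original post.

```python
import os
import tempfile

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
# Assumption: uploads land in a temporary directory; point this at your own folder.
app.config["UPLOAD_FILE_FOLDER"] = tempfile.mkdtemp()


@app.route("/api/extraction", methods=["POST"])
def extraction():
    # The PHP side sends the file under the field name "upload_file".
    uploaded = request.files.get("upload_file")
    if uploaded is None or not uploaded.filename:
        return jsonify(error="missing upload_file"), 400

    # secure_filename() strips path components and unsafe characters.
    filename = secure_filename(uploaded.filename)
    if not filename:
        return jsonify(error="unusable file name"), 400

    path = os.path.join(app.config["UPLOAD_FILE_FOLDER"], filename)
    uploaded.save(path)

    # Ordinary form variables (e.g. "hello", "foo") arrive in request.form.
    return jsonify(
        saved_as=filename,
        size=os.path.getsize(path),
        fields=request.form.to_dict(),
    )
```

Because the file arrives inside the same request, the Python server never has to call back into the PHP server, so the circular wait described in the question cannot occur.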
<?php
// Read the file and send its base64-encoded contents as one form field.
$file_data = file_get_contents('path/to/your/file.xlsx');
$data = http_build_query([
    'upload_file' => base64_encode($file_data),
]);
// Send POST REQUEST
$context_options = array(
    'http' => array(
        'method' => 'POST',
        'header' => "Content-type: application/x-www-form-urlencoded\r\n"
            . "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data,
        'timeout' => 10,
    )
);
$context = stream_context_create($context_options);
$result = fopen('http://localhost:5000/api/extraction','r', false, $context);
On your Python server, you will get the file data as a base64-encoded string in the field named "upload_file". You need to decode it to get the original binary content.
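The decoding step can be illustrated with the standard library alone. The helper name `decode_uploaded_file` is hypothetical and the sketch works on the raw request body; inside a Flask handler, something like `base64.b64decode(request.form["upload_file"])` would presumably be enough.

```python
import base64
from urllib.parse import parse_qs


def decode_uploaded_file(body: bytes, field: str = "upload_file") -> bytes:
    """Return the raw file bytes carried base64-encoded in a urlencoded body."""
    # A urlencoded body is plain ASCII; parse_qs undoes the percent-encoding.
    fields = parse_qs(body.decode("ascii"), strict_parsing=True)
    if field not in fields:
        raise KeyError(f"missing form field: {field}")
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    return base64.b64decode(fields[field][0], validate=True)
```

Keep in mind that base64 inflates the payload by roughly a third, and URL-encoding the `+`, `/` and `=` characters adds a little more, so this approach suits smaller documents better.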
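Approach 3: If you insist...
If you insist on your current two-round-trip approach, the simple solution is to have 2 different endpoints:
- One for sending the POST request to your Python application.
- One for serving the xlsx file, with no dependency on the Python application.
From your description, the deadlock arises because you use the same script for both purposes. I see no reason why they cannot be 2 separate scripts / route controllers.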