Pulling data from API, memory growth
I'm working on a project where I pull data (JSON) from an API. The problem I'm having is that the memory slowly grows until I get the dreaded fatal error:
Fatal error: Allowed memory size of * bytes exhausted (tried to allocate * bytes) in C:... on line *
I don't think there should be any memory growth. I tried unsetting everything at the end of the loop, but it made no difference. So my question is: am I doing something wrong? Is it normal? What can I do to fix this problem?
<?php
$start = microtime(true);
$time = microtime(true) - $start;
echo "Start: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br/>";

include('start.php');
include('connect.php');

set_time_limit(0);

$api_key = 'API-KEY';
$tier = 'Platinum';
$threads = 10; // number of URLs called simultaneously

function multiRequest($urls, $start) {
    $time = microtime(true) - $start;
    echo " start function: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

    $nbrURLS = count($urls); // number of URLs in array $urls
    $ch = array();           // array of curl handles
    $result = array();       // data to be returned

    $mh = curl_multi_init(); // create a multi handle
    $time = microtime(true) - $start;
    echo " Creation multi handle: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

    // set URL and other appropriate options
    for ($i = 0; $i < $nbrURLS; $i++) {
        $ch[$i] = curl_init();
        curl_setopt($ch[$i], CURLOPT_URL, $urls[$i]);
        curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, 1); // return data as string
        curl_setopt($ch[$i], CURLOPT_SSL_VERIFYPEER, 0); // don't verify the certificate
        curl_multi_add_handle($mh, $ch[$i]); // add a normal cURL handle to the multi handle
    }
    $time = microtime(true) - $start;
    echo " For loop options: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

    // execute the handles
    do {
        $mrc = curl_multi_exec($mh, $active);
        curl_multi_select($mh, 0.1); // without this, we busy-loop here and use 100% CPU
    } while ($active);
    $time = microtime(true) - $start;
    echo " Execution: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

    echo ' For loop2<br>';

    // get content and remove handles
    for ($i = 0; $i < $nbrURLS; $i++) {
        $error = curl_getinfo($ch[$i], CURLINFO_HTTP_CODE); // last received HTTP code
        echo " error: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

        // error handling if not a 200 OK code
        if ($error != 200) {
            if ($error == 429 || $error == 500 || $error == 503 || $error == 504) {
                echo "Again error: $error<br>";
                $result['again'][] = $urls[$i];
            } else {
                echo "Error error: $error<br>";
                $result['errors'][] = array("Url" => $urls[$i], "errornbr" => $error);
            }
        } else {
            $result['json'][] = curl_multi_getcontent($ch[$i]);
            echo " Content: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";
        }

        curl_multi_remove_handle($mh, $ch[$i]);
        curl_close($ch[$i]);
    }
    $time = microtime(true) - $start;
    echo " after loop2: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

    curl_multi_close($mh);
    return $result;
}

$gamesId = mysqli_query($connect, "SELECT gameId FROM `games` WHERE `region` = 'EUW1' AND `tier` = '$tier' LIMIT 20");

$urls = array();
while ($result = mysqli_fetch_array($gamesId)) {
    $urls[] = 'https://euw.api.pvp.net/api/lol/euw/v2.2/match/' . $result['gameId'] . '?includeTimeline=true&api_key=' . $api_key;
}
$time = microtime(true) - $start;
echo "After URL array: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br/>";

$x = 1; // number of loops
while ($urls) {
    $chunk = array_splice($urls, 0, $threads); // take the first chunk ($threads) of all URLs
    $time = microtime(true) - $start;
    echo "<br>After chunk: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br/>";

    $result = multiRequest($chunk, $start); // get JSON
    unset($chunk);

    $nbrComplete = count($result['json']); // number of returned JSON strings
    echo 'For loop: <br/>';
    for ($y = 0; $y < $nbrComplete; $y++) {
        // parse the JSON
        $decoded = json_decode($result['json'][$y], true);
        $time = microtime(true) - $start;
        echo " Decode: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br/>";
    }
    unset($nbrComplete);
    unset($decoded);

    $time = microtime(true) - $start;
    echo $x . ": " . memory_get_peak_usage(true) . " | " . $time . "<br>";

    // reuse URLs that returned a retryable error
    if (isset($result['again'])) {
        $urls = array_merge($urls, $result['again']);
        unset($result['again']);
    }
    unset($result);
    unset($time);

    sleep(15); // limit the request rate
    $x++;
}

include('end.php');
?>
PHP version 5.3.9 - 100 loops:
loop: memory | time (sec)
1: 5505024 | 0.98330211639404
3: 6291456 | 33.190237045288
65: 6553600 | 1032.1401019096
73: 6815744 | 1160.4345710278
75: 7077888 | 1192.6274609566
100: 7077888 | 1595.2397520542
EDIT:
After trying with PHP 5.6.14 (XAMPP) on Windows:
loop: memory | time (sec)
1: 5505024 | 1.0365679264069
3: 6291456 | 33.604479074478
60: 6553600 | 945.90159296989
62: 6815744 | 977.82566595078
93: 7077888 | 1474.5941500664
94: 7340032 | 1490.6698410511
100: 7340032 | 1587.2434458733
EDIT2: I only see the memory increase after json_decode:
Start: 262144 | 135448
After URL array: 262144 | 151984
After chunk: 262144 | 152272
start function: 262144 | 152464
Creation multi handle: 262144 | 152816
For loop options: 262144 | 161424
Execution: 3145728 | 1943472
For loop2
error: 3145728 | 1943520
Content: 3145728 | 2095056
error: 3145728 | 1938952
Content: 3145728 | 2131992
error: 3145728 | 1938072
Content: 3145728 | 2135424
error: 3145728 | 1933288
Content: 3145728 | 2062312
error: 3145728 | 1928504
Content: 3145728 | 2124360
error: 3145728 | 1923720
Content: 3145728 | 2089768
error: 3145728 | 1918936
Content: 3145728 | 2100768
error: 3145728 | 1914152
Content: 3145728 | 2089272
error: 3145728 | 1909368
Content: 3145728 | 2067184
error: 3145728 | 1904616
Content: 3145728 | 2102976
after loop2: 3145728 | 1899824
For loop:
Decode: 3670016 | 2962208
Decode: 4980736 | 3241232
Decode: 5242880 | 3273808
Decode: 5242880 | 2802024
Decode: 5242880 | 3258152
Decode: 5242880 | 3057816
Decode: 5242880 | 3169160
Decode: 5242880 | 3122360
Decode: 5242880 | 3004216
Decode: 5242880 | 3277304
So my question is: am I doing something wrong? Is it normal? What can I do to fix this problem?
There is nothing wrong with your code: this is normal behavior. You are requesting data from an external source, which in turn is loaded into memory.

Of course, the solution to your problem could be as simple as:
ini_set('memory_limit', -1);
which allows all the memory that is needed to be used.

When I use dummy content, the memory usage stays the same between requests. This is with PHP 5.5.19 in XAMPP on Windows.

There is a cURL memory-leak related bug that was fixed in version 5.5.4.
Your method is quite long, so I suspect garbage collection won't get triggered until the very end of the function, which means your unused variables can pile up. If they are no longer going to be used, garbage collection would take care of this for you.

You might look into refactoring this code into smaller methods to take advantage of this, along with all the other good stuff that comes with smaller methods. In the meantime, you could try putting gc_collect_cycles(); at the very end of your loop to see if that frees some memory:
if (isset($result['again'])) {
    $urls = array_merge($urls, $result['again']);
    unset($result['again']);
}
unset($result);
unset($time);
gc_collect_cycles(); // add this line here
sleep(15); // limit the request rate
EDIT: the segment I updated above doesn't actually belong to the big function, but I suspect the size of $result might be messing things up, and it possibly won't be cleaned up until the loop terminates. It is worth a try, though.
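To illustrate the "smaller methods" point, here is a minimal sketch (with a hypothetical helper name, processChunk, and a made-up matchId field standing in for whatever you actually extract): when the decode work happens inside a short function, $decoded and the raw strings go out of scope on every return, so the engine can reclaim them without waiting for the end of one long method.

```php
<?php
// Hypothetical helper: decode a chunk of JSON strings and keep only a small
// summary. Everything local to the function is released when it returns.
function processChunk(array $jsonStrings) {
    $ids = array();
    foreach ($jsonStrings as $json) {
        $decoded = json_decode($json, true); // freed when the function returns
        if (is_array($decoded) && isset($decoded['matchId'])) {
            $ids[] = $decoded['matchId'];
        }
    }
    return $ids; // only the small summary survives the call
}

// Dummy payloads standing in for the API responses:
$chunk = array('{"matchId": 1}', '{"matchId": 2}');
print_r(processChunk($chunk)); // the two matchId values, 1 and 2
```

The main loop then only ever holds the summary of each chunk, not the full decoded structures.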
So my question is: am I doing something wrong? Is it normal? What can I do to fix this problem?
Yes, it is normal to run out of memory when you use up all of the memory. You are making 10 simultaneous HTTP requests and deserializing the JSON responses into PHP memory. Without limiting the size of the responses, you will always be in danger of running out of memory.
What else can you do?

- Don't run multiple HTTP connections simultaneously. Turn $threads down to 1 to test this. If there is a memory leak in a C extension, calling gc_collect_cycles() will not free any memory; it only affects memory allocated in the Zend Engine that is no longer reachable.
- Save the results to a folder and process them in another script. You can move processed files into a subdirectory to mark that you have successfully handled a JSON file.
- Investigate forking or a message queue to have multiple processes work on part of the problem at the same time: several PHP processes listening on a queue bucket, or forked child processes with their own process memory.
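The "save results to a folder" idea can be sketched as follows: stream each response straight to disk with CURLOPT_FILE instead of buffering it in memory with CURLOPT_RETURNTRANSFER. A file:// URL stands in for the real API endpoint here so the sketch runs without network access; swap in your actual URLs and directory.

```php
<?php
// Fake "API response" on disk so the sketch needs no network:
$src = tempnam(sys_get_temp_dir(), 'api');
file_put_contents($src, '{"ok":true}');

// Destination file for the streamed response body:
$dest = tempnam(sys_get_temp_dir(), 'json');
$fp = fopen($dest, 'w');

$ch = curl_init('file://' . $src);
curl_setopt($ch, CURLOPT_FILE, $fp); // write the body to $fp, not to memory
curl_exec($ch);
curl_close($ch);
fclose($fp);

echo file_get_contents($dest);
```

A second script can then pick up the files one at a time, so no single process ever holds more than one response.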
I tested your script with 10 URLs. I removed all of your comments except the one at the end of the script and the one in the problematic loop where json_decode is used. I also opened one of the pages you decode from the API: the array looks very big, and I think you're right that the problem is in json_decode.
Results and fixes.

Result with no changes:
Code:
for ($y = 0; $y < $nbrComplete; $y++) {
    $decoded = json_decode($result['json'][$y], true);
    $time = microtime(true) - $start;
    echo "Decode: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "\n";
}
Result:
Decode: 3407872 | 2947584
Decode: 3932160 | 2183872
Decode: 3932160 | 2491440
Decode: 4980736 | 3291288
Decode: 6291456 | 3835848
Decode: 6291456 | 2676760
Decode: 6291456 | 4249376
Decode: 6291456 | 2832080
Decode: 6291456 | 4081888
Decode: 6291456 | 3214112
Decode: 6291456 | 244400
Result with unset($decoded):
Code:
for ($y = 0; $y < $nbrComplete; $y++) {
    $decoded = json_decode($result['json'][$y], true);
    unset($decoded);
    $time = microtime(true) - $start;
    echo "Decode: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "\n";
}
Result:
Decode: 3407872 | 1573296
Decode: 3407872 | 1573296
Decode: 3407872 | 1573296
Decode: 3932160 | 1573296
Decode: 4456448 | 1573296
Decode: 4456448 | 1573296
Decode: 4980736 | 1573296
Decode: 4980736 | 1573296
Decode: 4980736 | 1573296
Decode: 4980736 | 1573296
Decode: 4980736 | 244448
You can also add gc_collect_cycles:
Code:
for ($y = 0; $y < $nbrComplete; $y++) {
    $decoded = json_decode($result['json'][$y], true);
    unset($decoded);
    gc_collect_cycles();
    $time = microtime(true) - $start;
    echo "Decode: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "\n";
}
It can help in some cases, but it may cost you performance.

You can try re-running the script with unset and with unset+gc, and write back if you still have the same problem after these changes.

Also, I don't see where you use the $decoded variable; if there is a mistake in the code, you could just remove json_decode :)
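One more way to see why the Decode lines dominate the memory trace: a decoded JSON structure can be many times larger than the raw string, so extracting only the fields you need and unsetting the full decode right away keeps the per-iteration footprint small. A minimal sketch (the matchId and timeline fields are made up for illustration):

```php
<?php
// Build a raw JSON string with one small field and one big one,
// standing in for an API match response with a timeline.
$raw = json_encode(array('matchId' => 42, 'timeline' => array_fill(0, 1000, 'frame')));

$decoded = json_decode($raw, true);
$needed = $decoded['matchId']; // keep just the scalar you care about
unset($decoded);               // release the big structure immediately

echo $needed; // 42
```

Doing this inside the loop means only $needed accumulates across iterations, not the full decoded arrays.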