Prevent loading from remote source if file is larger than a given size

Suppose I want to load XML files of at most 10MB from a remote server.

Something like:

$xml_file = "http://example.com/largeXML.xml";// size= 500MB

//PRACTICAL EXAMPLE: $xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";// size= 683MB

 /*GOAL: Do whatever can be done to keep this large file from being loaded by the DOMDocument, without having to load the file and check*/

$dom =  new DOMDocument();

$dom->load($xml_file /*LOAD only IF the file_size is <= 10MB....else...echo 'File is too large'*/);

How might this be achieved? Any ideas, alternatives, or a best approach for this would be highly appreciated.

I checked PHP: Remote file size without downloading file, but when I try

var_dump(
    curl_get_file_size(
        "http://www.dailymotion.com/rss/user/dialhainaut/"
    )
);

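I get string 'unknown' (length=7)

When I try get_headers as suggested below, Content-Length is missing from the headers, so that doesn't work reliably either.

Please advise how to determine the length and, if it exceeds 10MB, avoid sending the file to DOMDocument.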

Edit: new answer, a bit of a workaround:
You can't check the DOM element length, but you can issue a HEAD request and get the file size from the URL:

<?php

function i_hope_this_works( $XmlUrl ) {
    // Assume failure: -1 means the size could not be determined
    $result = -1;

    $request = curl_init( $XmlUrl );

    // Send a HEAD request, so a 1 GB file costs the same as a 1 KB file
    curl_setopt( $request, CURLOPT_NOBODY, true );
    curl_setopt( $request, CURLOPT_HEADER, true );
    curl_setopt( $request, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $request, CURLOPT_FOLLOWLOCATION, true );
    // Placeholder user agent string; substitute your own
    curl_setopt( $request, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; size-check)' );

    $requesteddata = curl_exec( $request );
    // Status code of the final response, after any redirects were followed
    $status = (int) curl_getinfo( $request, CURLINFO_HTTP_CODE );
    curl_close( $request );

    // 200 means OK; anything else is treated as "size unknown"
    if( $requesteddata && $status == 200 ) {
        // Header names are case-insensitive, and each redirect adds its own
        // header block, so take the last Content-Length that appears
        if( preg_match_all( '/^Content-Length:\s*(\d+)/mi', $requesteddata, $matches ) ) {
            $result = (int) end( $matches[1] );
        }
    }

    return $result;
}
?>

You should now be able to get the size of any file you want by URL, simply with

$file_size = i_hope_this_works('yourURLasString');
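
One thing to watch with this approach: with CURLOPT_FOLLOWLOCATION and CURLOPT_HEADER both set, as in the function above, cURL returns the headers of every response in the redirect chain, and header names are case-insensitive. Below is a minimal sketch of pulling the length of the final response out of such a header dump. `last_content_length` is a hypothetical helper name, not part of cURL, and returning null for "no usable header" is just one possible convention.

```php
<?php
// Hypothetical helper: extract the Content-Length of the last response
// from raw header text, as returned by cURL with CURLOPT_HEADER enabled.
function last_content_length(string $rawHeaders): ?int
{
    // Header blocks are separated by a blank line; keep only the last one.
    $blocks = preg_split('/\r?\n\r?\n/', trim($rawHeaders));
    $last = end($blocks);

    // "i" because header names are case-insensitive, "m" so ^ and $ work per line.
    if (preg_match('/^Content-Length:\s*(\d+)\s*$/mi', $last, $m)) {
        return (int) $m[1];
    }
    return null; // header missing, e.g. with chunked transfer encoding
}
```

A null result is the case the question ran into: the server simply doesn't announce a length, so a header check alone can't decide whether the file is under the limit.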

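10MB equals 10485760 bytes. If the Content-Length is not specified, this falls back to cURL, which has been available since PHP 5. I got this source from somewhere on SO, but can't remember where: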

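function get_filesize($url) {
    // With the second argument set, get_headers() returns an associative array
    $headers = get_headers($url, 1);
    if ($headers !== false) {
        // Header names are case-insensitive, so normalize the keys
        $headers = array_change_key_case($headers, CASE_LOWER);
        if (isset($headers['content-length'])) {
            // After redirects the value is an array; the last entry is the final response
            $length = $headers['content-length'];
            return (int) (is_array($length) ? end($length) : $length);
        }
    }
    // No Content-Length: download the body with cURL and measure it.
    // Note that this transfers the whole file.
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // The header value must be a single line
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    curl_exec($c);
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    curl_close($c);
    return (int) $size;
}

$filesize = get_filesize("http://www.dailymotion.com/rss/user/dialhainaut/");
if ($filesize <= 10485760) {
    echo 'Fine';
} else {
    echo $filesize . ' bytes: File is too big';
}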

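
The cURL fallback in get_filesize has to download the whole body before it can report a size, which is exactly what the question wants to avoid. One possible way around this, sketched below on the assumption that aborting mid-transfer is acceptable, is a CURLOPT_WRITEFUNCTION callback that stops the transfer once a byte limit is exceeded: cURL aborts when the callback returns a number different from the length of the chunk it was handed. `make_capped_writer` is a hypothetical name, and the URL in the commented usage is the placeholder from the question.

```php
<?php
// Hypothetical factory: builds a CURLOPT_WRITEFUNCTION callback that collects
// the body into $buffer and aborts once more than $maxBytes would be stored.
function make_capped_writer(string &$buffer, int $maxBytes): callable
{
    return function ($handle, string $chunk) use (&$buffer, $maxBytes): int {
        if (strlen($buffer) + strlen($chunk) > $maxBytes) {
            // Any return value other than strlen($chunk) makes cURL abort.
            return -1;
        }
        $buffer .= $chunk;
        return strlen($chunk);
    };
}

// Usage sketch (performs a network request when uncommented):
// $body = '';
// $ch = curl_init('http://example.com/largeXML.xml');
// curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_capped_writer($body, 10485760));
// $ok = curl_exec($ch); // false if the limit was hit or the transfer failed
// curl_close($ch);
// if ($ok) { $dom = new DOMDocument(); $dom->loadXML($body); }
```

With this, an oversized file costs at most roughly the limit in transferred bytes, and it doesn't matter whether the server sends a Content-Length.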

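OK, finally got it working. The headers solution obviously isn't going to work across the board. In this solution, we open a file handle and read the XML line by line until it hits the threshold of $max_B. If the file is too big, we still have the overhead of reading it up to the 10MB mark, but it works as expected. If the file is smaller than $max_B, we proceed...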

$xml_file = "http://www.dailymotion.com/rss/user/dialhainaut/";
//$xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";

$fh = fopen($xml_file, "r");  

if($fh){
    $file_string = '';
    $total_B = 0;
    $max_B = 10485760;
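    //run through the file, concatenating the pieces into a string
    //the length argument caps each read, so a file without line breaks is not pulled in by a single fgets call
    while (($line = fgets($fh, 8192)) !== false){
        $total_B += strlen($line);
        if($total_B < $max_B){
            $file_string .= $line;
        } else {
            break;
        }
    }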

    if($total_B < $max_B){
        echo 'File ok. Total size = '.$total_B.' bytes. Proceeding...';
        //proceed
        $dom = new DOMDocument();
        $dom->loadXML($file_string); //NOTE the method change because we're loading from a string   

    } else {
        //reject
        echo 'File too big! Max size = '.$max_B.' bytes.';  
    }

    fclose($fh);

} else {
    echo 'Could not open the file!';
}
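
The reading loop above can also be wrapped in a reusable function. The sketch below reads fixed-size chunks with fread instead of going line by line, so it doesn't depend on where line breaks fall, and it takes an already opened stream so it can be tried without a network connection. `read_capped` and the 8192-byte default chunk size are illustrative choices, not something from the original code.

```php
<?php
// Hypothetical helper: read a stream into a string, giving up as soon as
// the content exceeds $maxBytes. Returns null when the limit is exceeded.
function read_capped($stream, int $maxBytes, int $chunkSize = 8192): ?string
{
    $data = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            return null; // read error; treated like a rejection in this sketch
        }
        $data .= $chunk;
        if (strlen($data) > $maxBytes) {
            return null; // too big, stop reading here
        }
    }
    return $data;
}

// Try it on an in-memory stream standing in for fopen($xml_file, "r").
$stream = fopen('php://memory', 'r+');
fwrite($stream, '<root><item>1</item></root>');
rewind($stream);

$xml = read_capped($stream, 10485760);
fclose($stream);

echo $xml === null ? 'File too big!' : 'File ok. Proceeding...';
```

When the result is not null, it can be handed to `$dom->loadXML($xml)` just like `$file_string` in the code above.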