Prevent loading from remote source if file is larger than a given size
Let's say I want to load XML files from a remote server, with a maximum size of 10MB.
Something like:
$xml_file = "http://example.com/largeXML.xml";// size= 500MB
//PRACTICAL EXAMPLE: $xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";// size= 683MB
/*GOAL: Do anything that can be done to hinder this large file from being loaded by the DOMDocument without having to load the File n check*/
$dom = new DOMDocument();
$dom->load($xml_file /*LOAD only IF the file_size is <= 10MB....else...echo 'File is too large'*/);
How can this be achieved? Any ideas, alternatives, or a best approach would be appreciated.
I checked PHP: Remote file size without downloading file, but when I try
var_dump(
curl_get_file_size(
"http://www.dailymotion.com/rss/user/dialhainaut/"
)
);
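I get string 'unknown' (length=7).
When I try get_headers as suggested below, Content-Length is missing from the headers, so that doesn't work reliably either.
Please advise how to determine the length and, if it exceeds 10MB, avoid sending the file to DOMDocument.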
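EDIT: the new answer is a bit of a workaround:
You can't check the length of a DOM element, but you can issue a HEAD request and get the file size from the URL: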
<?php
function i_hope_this_works( $XmlUrl ) {
    // Assume failure until proven otherwise: -1 means "size unknown"
    $result = -1;
    $request = curl_init( $XmlUrl );
    // Send a HEAD request, so a 1 GB file costs about the same as a 1 KB one
    curl_setopt( $request, CURLOPT_NOBODY, true );
    curl_setopt( $request, CURLOPT_HEADER, true );
    curl_setopt( $request, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $request, CURLOPT_FOLLOWLOCATION, true );
    // The original called an undefined get_user_agent_string(); a literal string is used instead
    curl_setopt( $request, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; size-check)' );
    $requesteddata = curl_exec( $request );
    curl_close( $request );
    if( $requesteddata ) {
        $content_length = -1;
        $status = 0;
        // Also accept HTTP/2 status lines such as "HTTP/2 200"
        if( preg_match( "/^HTTP\/[\d.]+ (\d\d\d)/", $requesteddata, $matches ) ) {
            $status = (int)$matches[1];
        }
        // Header names are case-insensitive, hence the i flag
        if( preg_match( "/Content-Length: (\d+)/i", $requesteddata, $matches ) ) {
            $content_length = (int)$matches[1];
        }
        // 200 is OK; 301 to 308 are redirects
        if( $status == 200 || ($status > 300 && $status <= 308) ) {
            $result = $content_length;
        }
    }
    // -1 if the request failed or the server sent no Content-Length
    return $result;
}
?>
You should now be able to get the size of any file you want by URL, simply with
$file_size = i_hope_this_works('yourURLasString');
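One thing to watch with the function above: when CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION are both set, curl returns the headers of every response in the redirect chain, so the first Content-Length found may belong to a redirect page rather than the XML file. A minimal sketch of picking the value from the last response only follows; `final_content_length` is a hypothetical helper name, not part of the original answer.

```php
<?php
/**
 * Hypothetical helper (an assumption, not from the original answer):
 * returns the Content-Length of the LAST response in a raw header dump,
 * or null if that response did not send one.
 */
function final_content_length(string $rawHeaders): ?int
{
    // Each response's header block ends with a blank line.
    $blocks = preg_split('/\r?\n\r?\n/', trim($rawHeaders));
    $last = end($blocks);

    // i: header names are case-insensitive; m: ^ and $ match per line.
    if (preg_match('/^Content-Length:\s*(\d+)\s*$/mi', $last, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

It could be called as `final_content_length($requesteddata)` in place of the single preg_match; a null result still means the server did not report a size, so the size check cannot rely on it alone.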
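10MB equals 10485760 bytes. If the Content-Length header is not specified, this falls back to curl, which has been available since PHP 5. I got this code from somewhere on SO, but I don't remember where: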
function get_filesize($url) {
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length'])) return $headers['Content-Length'];
    if (isset($headers['Content-length'])) return $headers['Content-length'];
    // No Content-Length header: fall back to curl.
    // Note that this downloads the whole body in order to measure it.
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        // The header value must be on a single line
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    curl_exec($c);
    return curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
}
$filesize = get_filesize("http://www.dailymotion.com/rss/user/dialhainaut/");
if ($filesize <= 10485760) {
    echo 'Fine';
} else {
    echo $filesize . ' File is too big';
}
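The two isset checks above only cover two spellings of the header name, and after a redirect get_headers($url, 1) returns an array of values for a repeated header instead of a string. A small sketch of a more forgiving lookup follows; `content_length_from_headers` is a hypothetical helper, and it assumes the input is whatever get_headers($url, 1) returned.

```php
<?php
/**
 * Hypothetical helper (an assumption, not from the original answer):
 * extracts Content-Length from the result of get_headers($url, 1),
 * regardless of the header's letter case. Returns null when absent.
 */
function content_length_from_headers($headers): ?int
{
    if (!is_array($headers)) {
        return null; // get_headers() returns false on failure
    }
    foreach ($headers as $name => $value) {
        // Status lines are stored under integer keys; skip them.
        if (is_string($name) && strcasecmp($name, 'Content-Length') === 0) {
            // A repeated header arrives as an array; the last entry comes
            // from the most recent response that sent the header.
            if (is_array($value)) {
                $value = end($value);
            }
            return preg_match('/^\d+$/', (string) $value) ? (int) $value : null;
        }
    }
    return null;
}
```

A null result means the size is unknown, which is exactly the case where the curl fallback ends up downloading the whole file.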
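OK, finally got it working. The headers solution clearly isn't going to be widely applicable. In this solution, we open a file handle and read the XML line by line until it reaches the $max_B threshold. If the file is too big, we still have the overhead of reading it up to the 10MB mark, but it works as expected. If the file is smaller than $max_B, we proceed...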
$xml_file = "http://www.dailymotion.com/rss/user/dialhainaut/";
//$xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";
$fh = fopen($xml_file, "r");
if ($fh) {
    $file_string = '';
    $total_B = 0;
    $max_B = 10485760;
    // run through the lines of the file, concatenating them into a string
    while (!feof($fh)) {
        // compare against false so that a final line containing only "0" is not skipped
        if (($line = fgets($fh)) !== false) {
            $total_B += strlen($line);
            if ($total_B < $max_B) {
                $file_string .= $line;
            } else {
                break;
            }
        }
    }
    if ($total_B < $max_B) {
        echo 'File ok. Total size = '.$total_B.' bytes. Proceeding...';
        // proceed
        $dom = new DOMDocument();
        $dom->loadXML($file_string); // NOTE the method change because we're loading from a string
    } else {
        // reject
        echo 'File too big! Max size = '.$max_B.' bytes.';
    }
    fclose($fh);
} else {
    echo '404 file not found!';
}
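One caveat with reading line by line: fgets without a length argument reads until the next newline, so an XML file written as one very long line would be pulled into memory in a single call before the size check runs. A possible variant reads fixed-size chunks with fread instead. This is only a sketch; `read_capped` and its `$chunkSize` parameter are hypothetical names, and it works on any stream already opened with fopen.

```php
<?php
/**
 * Hypothetical helper (an assumption, not from the original answer):
 * reads at most $maxBytes from an open stream in fixed-size chunks.
 * Returns the data, or null as soon as the stream proves to be larger.
 */
function read_capped($stream, int $maxBytes, int $chunkSize = 8192): ?string
{
    $data = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            return null; // read error
        }
        $data .= $chunk;
        if (strlen($data) > $maxBytes) {
            return null; // over the cap: stop reading immediately
        }
    }
    return $data;
}
```

With this, the body of the `if ($fh)` branch could become `$xml = read_capped($fh, 10485760);` followed by `$dom->loadXML($xml)` when the result is not null. At most one chunk beyond the limit is ever held in memory, whatever the line lengths in the file.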