Load HTML Source to String in PHP
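I'm trying to load the HTML source of a remote page into a string in PHP, using this awesome Galantis music video https://www.youtube.com/watch?v=5XR7naZ_zZA as an example.
I then want to search the source for a specific div id, "action-panel-details", and confirm when it is found. With the code below, the entire remote page simply loads into the page running on my server.
Is this even possible with file_get_contents()? Here is the code that loads the page, video and all: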
<?php
$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA');
if(preg_match("~action-panel-details~", $str)){
echo "it's there";
}
?>
I also tried using simplexml_load_file() and ended up with this error:
Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5
Warning: simplexml_load_string(): ndow, document);</script><script>var ytcfg = {d: function() {return (window.yt & in /page.php on line 5
Warning: simplexml_load_string(): ^ in /page.php on line 5
Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5
This is the code that produced it:
<?php
$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA');
$str = simplexml_load_string($str);
if(preg_match("~watch-time-text~", $str)){
echo "it's there";
}
?>
Any help is greatly appreciated.
You could use cURL:
//$url = 'https://www.youtube.com/';
$url = "https://www.youtube.com/watch?v=5XR7naZ_zZA";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Note: this turns off TLS certificate verification; fine for a quick test only.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$content = curl_exec($ch);
curl_close($ch);
if(preg_match("~watch-time-text~", $content)){
echo "it's there";
}else{
echo 'is another page';
}
To print the document source:
echo "<pre>".htmlentities($content)."</pre>";
The match, with the HTML surrounding 'watch-time-text':
<div id="action-panel-details" class="action-panel-content yt-uix-expander
yt-uix-expander-collapsed yt-card yt-card-has-padding">
<div id="watch-description" class="yt-uix-button-panel">
<div id="watch-description-content">
<div id="watch-description-clip"><span id="watch-description-badges"></span>
<div id="watch-uploader-info"><strong class="watch-time-text">
Yes, you're very close. The page source is HTML, not XML, so just drop the part where you try to load it as XML:
$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA');
if(preg_match("~watch-time-text~", $str)){
print "Match was found!";
}
else {
print "No match was found. :(";
}
This will display:
Match was found!
Unfortunately I can't show you a demo, because ideone.com and codepad.org don't allow me to use file_get_contents, but this works on my own server.
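If the goal is to confirm that an element with a given id actually exists, rather than that the text appears somewhere in the source, PHP's DOM extension can parse real-world HTML where SimpleXML gives up. Here is a minimal sketch, assuming the DOM extension is enabled; `htmlHasElementWithId` and the sample markup are made up for illustration and are not from the original post.

```php
<?php
// Hypothetical helper (not from the original post): returns true when the
// HTML string contains an element whose id attribute equals $id.
// Assumption: $id contains no double quotes, since it is placed directly
// into the XPath expression.
function htmlHasElementWithId(string $html, string $id): bool
{
    if ($html === '') {
        return false; // nothing to parse
    }

    // Real pages are rarely well-formed (bare "&", unclosed tags, ...).
    // Collect libxml's complaints internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');

    return $nodes !== false && $nodes->length > 0;
}

// Stand-in markup for the fetched page; the bare "&" is the kind of thing
// that makes an XML parser fail but an HTML parser tolerate.
$sample = '<html><body><div id="action-panel-details">'
        . '<strong class="watch-time-text">Tom & Jerry</strong>'
        . '</div></body></html>';

echo htmlHasElementWithId($sample, 'action-panel-details')
    ? "it's there"
    : "not found";
```

In real use you would pass the string returned by file_get_contents() or curl_exec() instead of `$sample`. Unlike a substring match, this only counts actual elements, so an id that appears only inside a script or comment is not reported. Either way, markup that the page adds with JavaScript after loading is not in the fetched source.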
If you run into a situation like mine where file_get_contents isn't allowed, you can do what miglio suggested and use cURL to fetch the remote source. The rest stays the same:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.youtube.com/watch?v=5XR7naZ_zZA');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$str = curl_exec($ch);
curl_close($ch);
if(preg_match("~watch-time-text~", $str)){
print "Match was found!";
}
else {
print "No match was found. :(";
}
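Both fetch variants above assume the request succeeds. file_get_contents() and curl_exec() each return false on failure, and passing that straight to preg_match() just reports "no match" without saying why. A minimal sketch of a guarded fetch follows; `fetchSource`, `sourceContains`, the timeout, and the user agent string are illustrative assumptions, and the demo reads a data: URI in place of the remote page so it runs without network access.

```php
<?php
// Hypothetical helper: returns the body of $url, or null if it could not be read.
function fetchSource(string $url, int $timeoutSeconds = 10): ?string
{
    // These options only apply to http(s) URLs; other wrappers ignore them.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)', // made-up value
        ],
    ]);

    // "@" hides the warning; failure is detected through the false return value.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}

// Hypothetical helper: plain substring check that treats a failed fetch as "not found".
function sourceContains(?string $source, string $needle): bool
{
    return $source !== null && strpos($source, $needle) !== false;
}

// Stand-in for the remote page so the demo needs no network access.
$demoUrl = 'data://text/plain;base64,'
         . base64_encode('<strong class="watch-time-text">Published</strong>');

$source = fetchSource($demoUrl);

if ($source === null) {
    echo "Could not fetch the page.";
} elseif (sourceContains($source, 'watch-time-text')) {
    echo "Match was found!";
} else {
    echo "No match was found. :(";
}
```

For the real page, pass the YouTube URL to `fetchSource` instead of `$demoUrl`. strpos() is used here because the search term is a fixed string, so a regular expression is not needed.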