Load HTML Source to String in PHP

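I'm trying to load the HTML source of a remote page into a string in PHP, using this great Galantis music video as an example: https://www.youtube.com/watch?v=5XR7naZ_zZA

I then want to search the source for a specific div id, "action-panel-details", and confirm when it is found. With the code below, the entire page simply loads on the page I'm running on my server.

Is this even possible with file_get_contents()? This is the code that loads the page, video and all: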

<?php

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA');

if(preg_match("~action-panel-details~", $str)){
echo "it's there";
}

?>

I've also tried using simplexml_load_string() and ended up with this error:

Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5

Warning: simplexml_load_string(): ndow, document);</script><script>var ytcfg = {d: function() {return (window.yt & in /page.php on line 5

Warning: simplexml_load_string(): ^ in /page.php on line 5

Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5

This is the code that produced it:

<?php

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA');

$str = simplexml_load_string($str);

if(preg_match("~watch-time-text~", $str)){
echo "it's there";
}

?>

Any help is greatly appreciated.

Maybe try using cURL:

//$url = 'https://www.youtube.com/';
$url = "https://www.youtube.com/watch?v=5XR7naZ_zZA";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$content = curl_exec($ch);
curl_close($ch);

if(preg_match("~watch-time-text~", $content)){
    echo "it's there";
}else{
    echo 'is another page';
}
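
Note that setting CURLOPT_SSL_VERIFYPEER to false turns off certificate verification, and that curl_exec() returns false when the request fails. A minimal sketch of a stricter variant, assuming the cURL extension is loaded; the function name fetchWithCurl and the timeout values are my own choices, not something from the code above:

```php
<?php
// Hypothetical helper (name and timeouts are assumptions): fetch a URL with
// cURL and throw an exception instead of silently returning false.
function fetchWithCurl(string $url, int $timeoutSeconds = 15): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        // CURLOPT_SSL_VERIFYPEER is deliberately left at its default (enabled).
    ]);

    $content = curl_exec($ch);
    if ($content === false) {
        // Transport-level failure: DNS, TLS, timeout, unsupported protocol, ...
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("Unexpected HTTP status $status for $url");
    }

    // The handle is released when $ch goes out of scope.
    return $content;
}
```

Calling fetchWithCurl() on the video URL inside a try/catch would give you the string to pass to preg_match(), and a failed request shows up as an exception rather than as "is another page".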

To print the document's source:

echo "<pre>".htmlentities($content)."</pre>";

The pattern matches this part of the HTML, which contains 'watch-time-text':
<div id="action-panel-details" class="action-panel-content yt-uix-expander 
yt-uix-expander-collapsed yt-card yt-card-has-padding">
<div id="watch-description" class="yt-uix-button-panel">
<div id="watch-description-content">
<div id="watch-description-clip"><span id="watch-description-badges"></span>
<div id="watch-uploader-info"><strong class="watch-time-text">

Yes, you were very close. Since the page's code is HTML rather than XML, just scrap the part where you try to load it as XML:
$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA');

if(preg_match("~watch-time-text~", $str)){
    print "Match was found!";
}
else {
    print "No match was found. :(";
}

This will display:

Match was found!
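
If you want to confirm that an element really has a given id, rather than that the text appears somewhere in the source, an HTML parser is one option. This is only a sketch, assuming the DOM extension is available; htmlHasElementWithId is a hypothetical helper name, and the sample markup is made up for illustration:

```php
<?php
// Hypothetical helper: true if $html contains an element whose id equals $id.
function htmlHasElementWithId(string $html, string $id): bool
{
    if ($html === '') {
        return false; // nothing to parse
    }

    // Real-world HTML is rarely valid XML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Compare id attributes in PHP so $id needs no XPath escaping.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@id]') as $element) {
        if ($element->getAttribute('id') === $id) {
            return true;
        }
    }
    return false;
}

// Made-up sample markup, not the real page.
$sample = '<div id="action-panel-details"><strong class="watch-time-text">x</strong></div>';
var_dump(htmlHasElementWithId($sample, 'action-panel-details')); // bool(true)
var_dump(htmlHasElementWithId($sample, 'watch-time-text'));      // bool(false): that is a class, not an id
```

The second call shows the difference from a plain text match: "watch-time-text" is present in the string, but not as an id.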

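Unfortunately, I can't show you a demo because ideone.com and codepad.org don't allow me to use file_get_contents, but this works on my own server.

If you run into a situation like mine where file_get_contents isn't allowed, you can do what miglio suggested and use cURL to fetch the remote source. The rest stays the same: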

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.youtube.com/watch?v=5XR7naZ_zZA');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$str = curl_exec($ch);
curl_close($ch);


if(preg_match("~watch-time-text~", $str)){
    print "Match was found!";
}
else {
    print "No match was found. :(";
}
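
Whichever way you fetch the source, both file_get_contents() and curl_exec() return false on failure, so it may be worth checking for that before matching. Since the search term here is a fixed string, strpos() is enough and no regular expression is needed. A small sketch; sourceContains is a hypothetical helper name:

```php
<?php
/**
 * Hypothetical helper: reports whether $needle occurs in the fetched source.
 *
 * @param string|false $source Result of file_get_contents() or curl_exec()
 */
function sourceContains($source, string $needle): bool
{
    if (!is_string($source)) {
        return false; // the fetch failed, so there is nothing to search
    }
    return strpos($source, $needle) !== false;
}

var_dump(sourceContains('<strong class="watch-time-text">', 'watch-time-text')); // bool(true)
var_dump(sourceContains(false, 'watch-time-text'));                              // bool(false)
```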