Load HTML Source to String in PHP

Question

I'm trying to load the HTML source of a remote page into a string in PHP, using this awesome Galantis music video https://www.youtube.com/watch?v=5XR7naZ_zZA as an example.

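Then I want to search the source for a specific div id, "action-panel-details", and confirm when it's found. With the code below, the whole page simply loads into the page running on my server.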

Is this even possible with file_get_contents()? Here is the code, which loads the page, video and all:

<?php

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA');

if(preg_match("~action-panel-details~", $str)){
echo "it's there";
}

?>

I also tried using simplexml_load_string() and ended up with this error:

Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5

Warning: simplexml_load_string(): ndow, document);</script><script>var ytcfg = {d: function() {return (window.yt & in /page.php on line 5

Warning: simplexml_load_string(): ^ in /page.php on line 5

Warning: simplexml_load_string(): Entity: line 1: parser error : xmlParseEntityRef: no name in /page.php on line 5

This is the code that produced it:

<?php

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA');

$str = simplexml_load_string($str);

if(preg_match("~watch-time-text~", $str)){
echo "it's there";
}

?>

Any help is greatly appreciated.

Answer 1

Maybe use cURL:

//$url = 'https://www.youtube.com/';
$url = "https://www.youtube.com/watch?v=5XR7naZ_zZA";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
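curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // disables TLS certificate verification: insecure, use only for testing
$content = curl_exec($ch); // the page source, or false if the request failed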
curl_close($ch);

if(preg_match("~watch-time-text~", $content)){
    echo "it's there";
}else{
    echo 'is another page';
}

To print the document's source code:
echo "<pre>".htmlentities($content)."</pre>";

The pattern matches this HTML, which contains 'watch-time-text':
<div id="action-panel-details" class="action-panel-content yt-uix-expander yt-uix-expander-collapsed yt-card yt-card-has-padding">
<div id="watch-description" class="yt-uix-button-panel">
<div id="watch-description-content">
<div id="watch-description-clip"><span id="watch-description-badges"></span>
<div id="watch-uploader-info"><strong class="watch-time-text">
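If the goal is to read that element rather than only confirm that the string occurs somewhere in the source, a DOM parser can locate it by class. This is a swapped-in technique (DOMDocument plus DOMXPath), not part of the answer above; find_text_by_class() and the sample markup are hypothetical, and the sketch assumes the class name contains no quote characters.

```php
<?php
// Sketch: find the first element carrying a given CSS class and return its text.
// find_text_by_class() and the sample markup are hypothetical.

function find_text_by_class(string $html, string $class): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep HTML parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match $class as a whole token in a space-separated class list, so
    // "watch-time-text" matches class="watch-time-text extra" but "watch-time" does not.
    // Assumes $class contains no quote characters.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $nodes = (new DOMXPath($doc))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$sample = '<div id="watch-uploader-info">'
        . '<strong class="watch-time-text extra">Sample text</strong></div>';

echo find_text_by_class($sample, 'watch-time-text') ?? 'not found', PHP_EOL;
```

Matching on the whole class token avoids false positives from longer class names that merely start with the same text.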

Answer 2

Yes, you're very close. Basically, because the page's code is HTML and not XML, just scrap the part where you try to load it as XML:

$str = file_get_contents('https://www.youtube.com/watch?v=5XR7naZ_zZA');

if(preg_match("~watch-time-text~", $str)){
    print "Match was found!";
}
else {
    print "No match was found. :(";
}
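The simplexml failure in the question comes from exactly this: the page contains things like a bare & inside inline scripts, which HTML tolerates but well-formed XML does not. If a parsed check is wanted instead of a substring match, PHP's DOMDocument::loadHTML() accepts such markup. A minimal sketch, with has_element_id() and the sample HTML as hypothetical stand-ins for the real page:

```php
<?php
// Sketch: check for an element id by parsing the page as HTML.
// has_element_id() and the sample markup are hypothetical.

function has_element_id(string $html, string $id): bool
{
    if ($html === '') {
        return false; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real pages often produce some.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // In a document loaded with loadHTML(), "id" attributes are registered as IDs,
    // so getElementById() can find them.
    return $loaded && $doc->getElementById($id) !== null;
}

// The bare "&&" in the inline script is fine in HTML but breaks XML parsing,
// much like the "window.yt &" fragment in the question's error output.
$html = '<html><body><script>var ok = a && b;</script>'
      . '<div id="action-panel-details">details</div></body></html>';

libxml_use_internal_errors(true);
$xml = simplexml_load_string($html);   // false: not well-formed XML
libxml_clear_errors();
libxml_use_internal_errors(false);

echo 'simplexml_load_string: ', $xml === false ? 'failed' : 'parsed', PHP_EOL;
echo 'has_element_id: ', has_element_id($html, 'action-panel-details') ? 'found' : 'not found', PHP_EOL;
```

Unlike a plain substring search, this only reports a match when an element actually carries that id, not when the text happens to appear inside a script or another attribute.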

This will display:

Match was found!

Unfortunately I can't show you a demo, because ideone.com and codepad.org don't let me use file_get_contents, but this works on my own server.

If, like me, you run into a host that doesn't allow file_get_contents, you can do what miglio said and use cURL to fetch the remote source. The rest stays the same:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.youtube.com/watch?v=5XR7naZ_zZA');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$str = curl_exec($ch);
curl_close($ch);


if(preg_match("~watch-time-text~", $str)){
    print "Match was found!";
}
else {
    print "No match was found. :(";
}
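Both fetch methods can be combined so the script uses whichever one the server permits. This sketch assumes the file_get_contents() restriction comes from the allow_url_fopen setting being off, which is one common cause; a sandbox that blocks outbound network access altogether will stop cURL too. fetch_source() is a hypothetical helper name, and the 10-second timeout is an arbitrary example value.

```php
<?php
// Sketch: use file_get_contents() when URL wrappers are enabled (allow_url_fopen),
// otherwise fall back to cURL. fetch_source() is hypothetical; it returns null
// on failure so callers can tell "no page" apart from "no match".

function fetch_source(string $url): ?string
{
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $body = @file_get_contents($url); // @ hides the warning; the result is checked instead
        if ($body !== false) {
            return $body;
        }
    }

    if (!function_exists('curl_init')) {
        return null; // neither method is available
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds; arbitrary example value
        // CURLOPT_SSL_VERIFYPEER is left at its default (true)
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return ($body === false || $status >= 400) ? null : $body;
}
```

The preg_match() check stays the same; just run it only when the result is not null, e.g. after $str = fetch_source('https://www.youtube.com/watch?v=5XR7naZ_zZA');.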

