Scrape <script> tag in PHP Goutte
I'm scraping a website with PHP Goutte, but some of the information I need appears only inside a script tag, in the following form:
<script>
player.qualityselector({
sources: [
{ format: 'auto', src: "xxx.example.com", type: 'video/mp4'},
{ format: '1080p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '720p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '480p WEB-DL', src: "xxx.example.com4", type: 'video/mp4'},
{ format: '360p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '240p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
],
});
</script>
I need the src of every entry. Is that possible?
You can extract them with a regular expression.
Example
$page_content = <<<EOF
<script>
player.qualityselector({
sources: [
{ format: 'auto', src: "xxx.example.com", type: 'video/mp4'},
{ format: '1080p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '720p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '480p WEB-DL', src: "xxx.example.com4", type: 'video/mp4'},
{ format: '360p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '240p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
],
});
</script>
EOF;
// [^"]* stops at the closing quote; a greedy .* would run on to the last " on the line
preg_match_all('/src:\s*"([^"]*)"/', $page_content, $match);
$result = $match[1];
print_r($result);
Output
Array
(
[0] => xxx.example.com
[1] => xxx.example.com
[2] => xxx.example.com
[3] => xxx.example.com4
[4] => xxx.example.com
[5] => xxx.example.com
)
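The example above runs the regex over a hard-coded string. On a real page you would first isolate the right script element. Below is a minimal sketch of one way to do that, assuming the page HTML is already in a string. To stay runnable without Composer packages it uses PHP's built-in DOMDocument instead of Goutte. With Goutte, `$client->request('GET', $url)` returns a Symfony DomCrawler, and `$crawler->filter('script')->each(fn ($node) => $node->text())` should give you the same list of script bodies. The function names `scriptBodies` and `qualitySources` are hypothetical. The sketch also captures each `format` label so the URLs come back keyed by quality.

```php
<?php
// Sketch: pull the player's source list out of a full HTML page.
// Assumption: the HTML is already in a string; with Goutte you would read
// the script text from the crawler instead of using DOMDocument.

/** Hypothetical helper: return the text of every <script> element. */
function scriptBodies(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; silence libxml parse warnings while loading.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bodies = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        $bodies[] = $script->textContent;
    }
    return $bodies;
}

/** Hypothetical helper: map format label => src from the first qualityselector script. */
function qualitySources(string $html): array
{
    foreach (scriptBodies($html) as $js) {
        if (strpos($js, 'qualityselector') === false) {
            continue; // not the player script
        }
        // Negated character classes stop at each closing quote instead of over-matching.
        preg_match_all("/format:\s*'([^']*)',\s*src:\s*\"([^\"]*)\"/", $js, $m);
        return array_combine($m[1], $m[2]);
    }
    return [];
}

// Usage with a trimmed-down page containing an unrelated script as well.
$html = <<<'HTML'
<html><body>
<script>var unrelated = 1;</script>
<script>
player.qualityselector({
sources: [
{ format: 'auto', src: "xxx.example.com", type: 'video/mp4'},
{ format: '480p WEB-DL', src: "xxx.example.com4", type: 'video/mp4'},
],
});
</script>
</body></html>
HTML;

// Returns ['auto' => 'xxx.example.com', '480p WEB-DL' => 'xxx.example.com4']
print_r(qualitySources($html));
```

If two entries share a format label, `array_combine` keeps the last one, so collect a list of pairs instead if that matters. The pattern also assumes the layout shown in the question: `format` before `src`, with the format single-quoted and the src double-quoted.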