How to extract an image URL from a div?
I want to extract a background-image URL from a div using PHP. I want to search a string for the class and pull out the background-image URL.
For example:
<div class="single-post-image" style="background-image: url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>
This is the output I want:
https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
This should work:
preg_match_all('/background-image: url\((.*?)\)/', $your_html, $output, PREG_SET_ORDER);
Change the last argument to get the output in whichever (array) shape you prefer: https://www.php.net/manual/en/function.preg-match-all.php
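To make the effect of that last argument concrete, here is a small sketch comparing PREG_SET_ORDER with the default PREG_PATTERN_ORDER. The sample markup, the variable names, and the slightly relaxed pattern (`\s*` after the colon) are my own assumptions for illustration, not part of the answer above.

```php
<?php
// Sample input, made up for illustration: two divs with inline background images.
$html = '<div class="single-post-image" style="background-image: url(https://example.com/a.jpg)"></div>'
      . '<div class="single-post-image" style="background-image:url(https://example.com/b.jpg)"></div>';

// \s* tolerates zero or more spaces after the colon (an assumption about the input).
$pattern = '/background-image:\s*url\((.*?)\)/';

// PREG_SET_ORDER: one entry per match; index 1 of each entry is the captured URL.
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
$urlsFromSets = array_column($sets, 1);

// PREG_PATTERN_ORDER (the default): $groups[1] already holds every captured URL.
preg_match_all($pattern, $html, $groups, PREG_PATTERN_ORDER);
$urlsFromGroups = $groups[1];

print_r($urlsFromSets);
```

Both variants end up with the same list of URLs; the flag only changes how the matches are grouped in the output array.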
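You could use a regex directly, but personally I would use DOMDocument/XPath to pull out only the elements you're after, and then use a regex to extract the value from the style attribute.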
<?php
$html = '
<html><head></head><body>
<div class="single-post-image" style="background-image:url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>
<div class="single-post-image" style="background-image: url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg )"></div>
<div class="single-post-image" style="background-image: url( https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>
<div class="single-post-image" style="background-image: url(\'https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg\')"></div>
<div class="single-post-image"></div>
</body>
</html>';
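$dom = new DOMDocument();
// Suppress the libxml warnings that imperfect markup would otherwise emit.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$images = [];

// Select every element whose class attribute contains "single-post-image".
foreach ($xpath->query("//*[contains(@class, 'single-post-image')]") as $img) {
    if ($img->hasAttribute('style')) {
        // Non-greedy capture, so the match ends at the first closing parenthesis.
        if (preg_match('/url\((.*?)\)/', $img->getAttribute('style'), $match)) {
            // Strip optional quotes and surrounding spaces from the captured URL.
            $images[] = trim($match[1], '\'" ');
        }
    }
}

print_r($images);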
Result:
Array
(
[0] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
[1] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
[2] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
[3] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
)
It's a bit more code, but I believe it is more robust and efficient than running a regex over large HTML documents.
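One caveat with `contains(@class, ...)` is that it also matches longer class names such as `single-post-image-large`. Below is a minimal sketch of a stricter variant, wrapped in a helper. The function name `extractBackgroundImages`, its parameters, and the sample markup are hypothetical, and `$class` is assumed to be a plain class name without quotes.

```php
<?php
// Hypothetical helper: returns the background-image URLs of elements
// that carry the exact class token $class.
function extractBackgroundImages(string $html, string $class): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings from imperfect HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Pad the class attribute with spaces so only a whole class token matches.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $urls = [];
    foreach ((new DOMXPath($dom))->query($query) as $node) {
        // Non-greedy capture stops at the first closing parenthesis.
        if (preg_match('/background-image:\s*url\((.*?)\)/', $node->getAttribute('style'), $m)) {
            $urls[] = trim($m[1], "'\" ");
        }
    }
    return $urls;
}

$sample = '<div class="single-post-image" style="background-image: url(\'https://example.com/a.jpg\')"></div>'
        . '<div class="single-post-image-large" style="background-image: url(https://example.com/b.jpg)"></div>';

print_r(extractBackgroundImages($sample, 'single-post-image'));
```

With this query the second div in the sample is skipped, because `single-post-image-large` is a different class token.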