Using preg_match in PHP to capture text between tags, with an exception
With file_get_contents I get the HTML code of a URL:
$html = file_get_contents($url);
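file_get_contents() returns the page as a string, or false if the request fails, and a false result passed on to preg_match simply matches nothing. A minimal sketch of a guard, assuming you prefer an exception over a silent miss (fetchHtml is a hypothetical name; the same call also accepts local file paths, which makes it easy to test):

```php
<?php
// Hypothetical helper: stop early if the page could not be fetched,
// instead of running the regex or DOM parsing on false.
function fetchHtml(string $url): string
{
    $html = @file_get_contents($url); // @ hides the warning; we throw instead
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}
```

Usage with the variable from the question: $html = fetchHtml($url);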
Now I want to capture the city name between <span class="place ville">Ville : <span> and </span>.
The HTML code is:
<span class="place ville">Ville : <span>City name</span></span>
So I'm using this:
preg_match('/<span class=\"place ville\">Ville : <span>(.+?)<\/span>/is', $html, $city);
$arr['city'] = $city[1];
It works.
But sometimes the code contains a link, like this:
<span class="place ville">Ville : <span><a href="https://example.com">City name</a></span></span>
In this case, the code above doesn't work. Do you know why?
Thanks.
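Why it fails: the lazy group (.+?) stops at the first </span>, which in the link case comes after the <a> element, so $city[1] holds <a href="https://example.com">City name</a> instead of the bare name. A minimal sketch of one possible fix, not taken from the answers: keep the question's pattern and pass the capture through PHP's strip_tags(). The function name extractCity is hypothetical, and the approach assumes the inner <span> never contains another <span>, because the lazy match would stop at that inner closing tag.

```php
<?php
// Hypothetical helper: the pattern from the question, plus strip_tags()
// to drop any markup (such as an <a> link) inside the inner <span>.
function extractCity(string $html): ?string
{
    $pattern = '/<span class="place ville">Ville : <span>(.+?)<\/span>/is';
    if (!preg_match($pattern, $html, $m)) {
        return null; // no city block found
    }
    // $m[1] is either "City name" or '<a href="...">City name</a>'
    return trim(strip_tags($m[1]));
}

// Both calls print "City name"
echo extractCity('<span class="place ville">Ville : <span>City name</span></span>'), "\n";
echo extractCity('<span class="place ville">Ville : <span><a href="https://example.com">City name</a></span></span>'), "\n";
```

Returning null when nothing matches also avoids the "Undefined array key 1" warning that $city[1] triggers on a page without the block.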
This is a bit tricky. To handle both cases, we can define two expressions and join them with the logical OR operator |:
<span class="place ville">Ville : <span><.+?>(.+?)<\/
and
<span class="place ville">Ville : <span>([^<]+)?<
Regular expression
<span class="place ville">Ville : <span><.+?>(.+?)<\/|<span class="place ville">Ville : <span>([^<]+)?<
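With this alternation the city ends up in group 1 (link case) or group 2 (plain case), so the calling code has to check which group is empty. A variant not used in the answer: PCRE's branch reset group (?|...) gives both alternatives the same group number, so the city is always in group 1. A minimal sketch on the same kind of input; note that here the plain-text alternative requires at least one character ([^<]+ instead of ([^<]+)?).

```php
<?php
// Branch reset group (?|A|B): captures in A and B share the same numbers,
// so the city name is always in group 1.
$re = '/<span class="place ville">Ville : <span>(?|<.+?>(.+?)<\/|([^<]+)<)/';

$str = '<span class="place ville">Ville : <span>City name</span></span>
<span class="place ville">Ville : <span><a href="https://example.com">City name</a></span></span>
<span class="place ville">Ville : <span>Århus</span></span>';

preg_match_all($re, $str, $matches);
// $matches[1] holds one city per match, whichever alternative matched
foreach ($matches[1] as $city) {
    echo $city, "\n";
}
```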
Test
$re = '/<span class="place ville">Ville : <span><.+?>(.+?)<\/|<span class="place ville">Ville : <span>([^<]+)?</m';
$str = '<span class="place ville">Ville : <span>City name</span></span>
<span class="place ville">Ville : <span><a href="https://example.com">City name</a></span></span>
<span class="place ville">Ville : <span>Århus</span></span>
<span class="place ville">Ville : <span><a href="https://example.com">City name</a></span></span>
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $city) {
    // Group 1 is filled by the link alternative, group 2 by the plain-text one
    if ($city[1] === "") {
        echo $city[2] . "\n";
    } else {
        echo $city[1] . "\n";
    }
}
Output
City name
City name
Århus
City name
In this case, another option is to use DOMDocument, for example together with DOMXPath, and then read the textContent or nodeValue of every DOMElement:
$html = <<<HTML
<span class="place ville">Ville : <span>City name 1</span></span>
<span class="place ville">Ville : <span><a href="https://example.com">City name 2</a></span></span>
HTML;
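$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Inner <span> of every <span> whose class contains both "place" and "ville"
$nodeList = $xpath->query("//span[contains(@class, 'place') and contains(@class, 'ville')]/span");
foreach ($nodeList as $n) {
    // textContent includes text inside nested elements such as the <a> link
    echo $n->textContent . PHP_EOL;
}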
Result
City name 1
City name 2
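On a full page fetched with file_get_contents, two details may matter: real-world markup often makes libxml emit warnings, and DOMDocument::loadHTML() assumes ISO-8859-1 unless the document declares its charset, so a UTF-8 name such as Århus can come out garbled. A minimal sketch under the assumption that the page is UTF-8 (extractCities is a hypothetical name). It also compares whole class tokens, so a class such as "placement" is not mistaken for "place", which contains(@class, 'place') alone would allow.

```php
<?php
// Hypothetical helper: text of the inner <span> of every <span class="place ville">.
function extractCities(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Assumption: the page is UTF-8; declare it so loadHTML() does not fall back to ISO-8859-1.
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match whole class tokens, so class="placement ville" is not selected.
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' place ')"
           . " and contains(concat(' ', normalize-space(@class), ' '), ' ville ')]/span";

    $cities = [];
    foreach ($xpath->query($query) as $node) {
        $cities[] = trim($node->textContent);
    }
    return $cities;
}
```

Usage with the variable from the question: print_r(extractCities(file_get_contents($url)));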