How to parse <a name and <image src= inside <li tag using PHP?
I have an HTML string that contains many <li> ... </li> sets. From each <li> ... </li> set I want to parse the following data:
1: call.php?category=fruits&fruitid=123456
2: mango season
3: http://imagehosting.com/images/fru_123456.png
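I got the first value with preg_match_all, but how do I get the second and third values?
I would be glad if someone could show me how to get the second and third items. Thanks in advance.
PHP: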
preg_match_all('/getit(.*?)detailFruit/', $code2, $match);
var_dump($match);
// iterate the new array
for($i = 0; $i < count($match[0]); $i++)
{
    $code3=str_replace('getit(\'', '', $match[0]);
    $code4=str_replace('&\',detailFruit', '', $code3);
    echo "<br>".$code4[$i];
}
Sample <li> ... </li> data:
<li><a id="FR123456" onclick="setFood(false);setSeasonFruitID('123456');getit('call.php?category=fruits&fruitid=123456&',detailFruit,false);">mango season</a><img src="http://imagehosting.com/images/fru_123456.png">
</li>
Edit: I switched to DOM and now I get values 2 and 3. How can I get the first value using DOM?
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($code2);
$xpath = new DOMXPath($dom);
// Empty array to hold all links to return
$result = array();
//Loop through each <li> tag in the dom
foreach($dom->getElementsByTagName('li') as $li) {
    //Loop through each <a> tag within the li, then extract the node value
    foreach($li->getElementsByTagName('a') as $links){
        $result[] = $links->nodeValue;
        echo $result[0] . "\n";
    }
    $imgs = $xpath->query("//li/img/@src");
    foreach ($imgs as $img) {
        echo $img->nodeValue . "\n";
    }
}
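Interesting question :-) The following solution uses a combination of DOMDocument and SimpleXML to get values 2 and 3 easily. DOMDocument is used because your HTML snippet is broken. To actually get the link (value 1) out of the JavaScript content, a simple regular expression is used: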
~getit\('([^']+)'~
# search for getit( and a singlequote literally
# capture everything up to (but not including) a new single quote
# this is saved in the group 1
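As a quick check of the pattern on its own, here is a minimal sketch that runs it against the onclick value from the question's sample. The variable names are only illustrative. Note that the pattern here stops at the closing single quote, because in the sample the quote is followed by ",detailFruit" rather than ")".

```php
<?php
// onclick value copied from the sample <li> in the question.
$onclick = "setFood(false);setSeasonFruitID('123456');getit('call.php?category=fruits&fruitid=123456&',detailFruit,false);";

// Match "getit('" literally, then capture everything up to the next single quote.
if (preg_match("~getit\('([^']+)'~", $onclick, $m)) {
    echo $m[1], "\n"; // prints: call.php?category=fruits&fruitid=123456&
}
```

The captured value still carries the trailing "&" from the JavaScript call; rtrim($m[1], '&') would remove it if the exact value from the question is wanted.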
A complete walkthrough can be found below (obviously, I made up the banana part):
<?php
$html = '<ul>
<li><a id="FR123456" onclick="setFood(false);setSeasonFruitID(\'123456\');getit(\'call.php?category=fruits&fruitid=123456&\',detailFruit,false);">mango season</a><img src="http://imagehosting.com/images/fru_123456.png"></li>
<li><a id="FR7890" onclick="setFood(false);setSeasonFruitID(\'7890\');getit(\'call.php?category=fruits&fruitid=7890&\',detailFruit,false);">bananas</a><img src="http://imagehosting.com/images/fru_7890.png"></li>
</ul>';
# collect libxml warnings about the loose HTML instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->strictErrorChecking = FALSE;
$dom->loadHTML($html);
$xml = simplexml_import_dom($dom);
# xpath to find list items
$items = $xml->xpath("//ul/li");
# the closing quote is followed by ",detailFruit", so the pattern ends at the quote
$regex = "~getit\('([^']+)'~";
# loop over the items
foreach ($items as $item) {
    $title = $item->a->__toString();
    $imgLink = $item->img["src"];
    $jsLink = $item->a["onclick"];
    preg_match_all($regex, $jsLink, $matches);
    $jsLink = $matches[1][0];
    echo "Title: $title, imgLink: $imgLink, jsLink: $jsLink\n";
    // output: Title: mango season, imgLink: http://imagehosting.com/images/fru_123456.png, jsLink: call.php?category=fruits&fruitid=123456&
    // Title: bananas, imgLink: http://imagehosting.com/images/fru_7890.png, jsLink: call.php?category=fruits&fruitid=7890&
}
?>
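For the edit in the question (staying with DOMDocument and DOMXPath rather than SimpleXML), something along these lines might work. This is only a sketch: the function name extractFruitItems and the array keys are made up for illustration, and it assumes every <li> holds one <a> and one <img> as in the sample. Value 1 sits inside the onclick JavaScript, so a regular expression is still needed for that part.

```php
<?php
// Hypothetical helper (not from the posts above): one row per <li>.
function extractFruitItems(string $html): array
{
    // Keep libxml quiet about the loose HTML, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query('//li') as $li) {
        // The leading "." keeps each query inside the current <li>.
        $a = $xpath->query('.//a', $li)->item(0);
        $img = $xpath->query('.//img', $li)->item(0);
        if ($a === null || $img === null) {
            continue; // skip list items without a link or an image
        }
        $link = null;
        if (preg_match("~getit\('([^']+)'~", $a->getAttribute('onclick'), $m)) {
            $link = rtrim($m[1], '&'); // drop the trailing "&"
        }
        $rows[] = [
            'link'  => $link,                      // value 1
            'title' => trim($a->textContent),      // value 2
            'img'   => $img->getAttribute('src'),  // value 3
        ];
    }
    return $rows;
}

$sample = '<li><a id="FR123456" onclick="setFood(false);setSeasonFruitID(\'123456\');getit(\'call.php?category=fruits&fruitid=123456&\',detailFruit,false);">mango season</a><img src="http://imagehosting.com/images/fru_123456.png"></li>';
print_r(extractFruitItems($sample));
```

Using relative queries (".//a", ".//img") with the <li> as context node is what keeps the title and image paired per item; an absolute "//li/img/@src" inside the loop would return every image in the document on each pass.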