I want to find all a href tags and get their links and their full content
My problem is that I want to get the following out of a large block of HTML code.
No other tags that contain an href attribute should be included!
<a href="/admin/home" torero-icon="home">Home</a>
Here I want to get "/admin/home" first, and second the whole a tag "<a href="/admin/home" torero-icon="home">Home"
<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>
Here I want to get "#" first, and second the whole a tag "<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung"
Thanks for your help, everyone :)
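I am doing something similar:
// preg_match_all() returns the number of matches; the matches themselves are written to $urls.
preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $page, $urls);
This gets all URLs, but to get all hrefs you would need to change the regular expression used with it to narrow down what you want.
You can then loop over the results with a foreach statement:
// $urls[0] holds the full matches.
foreach ($urls[0] as $url) {
    echo "url: " . $url . PHP_EOL;
}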
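As a rough sketch of that adaptation, the pattern below captures only the href value of each opening a tag. It assumes the value is quoted (single or double quotes), and the sample markup in $page is made up for illustration. It is still a regex, so it will not cope with every kind of HTML.

```php
<?php
// Hypothetical sample input; $page stands in for the real HTML.
$page = '<a href="/admin/home" torero-icon="home">Home</a>' . "\n"
      . '<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>';

// Capture the value of href inside an opening <a ...> tag.
// [^>]*? skips any attributes that come before href.
// (["\']) remembers the opening quote so \1 can require the same closing quote.
$pattern = '~<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1~i';

preg_match_all($pattern, $page, $matches);

// Group 1 is the quote character, group 2 is the href value.
$hrefs = $matches[2];

foreach ($hrefs as $href) {
    echo "href: " . $href . PHP_EOL;
}
```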
I found something useful:
preg_match_all('<a href="(.*)" (.*)>',$text,$match);
This resulted in:
Array
(
[0] => Array
(
[0] => a href="/redirect/torero::external/https[dd][s][s]www[d]google[d]de[s]/8CF0-6416-DAEF-8C2B-1819" torero-modified="link-leading-external">Google
[1] => a href="/admin/home" torero-icon="home">Home
[2] => a href="/admin/pages" torero-icon="pages">Seiten
[3] => a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung
[4] => a href="/admin/accounts/users" torero-icon="person">Benutzer
[5] => a href="/admin/accounts/permissions" torero-icon="check">Rechte
[6] => a href="#" torero-icon="add" torero-left-icon="trending_up">Statistiken
[7] => a href="/admin/statistics/trending" torero-icon="timeline">Beliebte Beiträge
[8] => a href="/admin/statistics/visibility" torero-icon="visibility">SEO Statistiken
[9] => a href="/admin/layouts" torero-icon="view_quilt">Layouts
[10] => a href="#" torero-icon="add" torero-left-icon="settings">Einstellungen
[11] => a href="/admin/settings/profile" torero-icon="person_pin">Profil
[12] => a href="/admin/settings/extensions" torero-icon="extension">Erweiterungen
[13] => a href="/admin/settings/updates" torero-icon="refresh">Software Updates
[14] => a href="/admin/settings/info" torero-icon="info">System Info
[15] => a href="/admin/settings/report" torero-icon="bug_report">Fehler melden
[16] => a href="/admin/settings/feedback" torero-icon="feedback">Feedback geben
[17] => a href="/admin/logout" torero-icon="exit_to_app">Abmelden
)
[1] => Array
(
[0] => /redirect/torero::external/https[dd][s][s]www[d]google[d]de[s]/8CF0-6416-DAEF-8C2B-1819
[1] => /admin/home
[2] => /admin/pages
[3] => #" torero-icon="add
[4] => /admin/accounts/users
[5] => /admin/accounts/permissions
[6] => #" torero-icon="add
[7] => /admin/statistics/trending
[8] => /admin/statistics/visibility
[9] => /admin/layouts
[10] => #" torero-icon="add
[11] => /admin/settings/profile
[12] => /admin/settings/extensions
[13] => /admin/settings/updates
[14] => /admin/settings/info
[15] => /admin/settings/report
[16] => /admin/settings/feedback
[17] => /admin/logout
)
[2] => Array
(
[0] => torero-modified="link-leading-external">Google
[1] => torero-icon="home">Home
[2] => torero-icon="pages">Seiten
[3] => torero-left-icon="accessibility">Account Verwaltung
[4] => torero-icon="person">Benutzer
[5] => torero-icon="check">Rechte
[6] => torero-left-icon="trending_up">Statistiken
[7] => torero-icon="timeline">Beliebte Beiträge
[8] => torero-icon="visibility">SEO Statistiken
[9] => torero-icon="view_quilt">Layouts
[10] => torero-left-icon="settings">Einstellungen
[11] => torero-icon="person_pin">Profil
[12] => torero-icon="extension">Erweiterungen
[13] => torero-icon="refresh">Software Updates
[14] => torero-icon="info">System Info
[15] => torero-icon="bug_report">Fehler melden
[16] => torero-icon="feedback">Feedback geben
[17] => torero-icon="exit_to_app">Abmelden
)
)
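Two things stand out in the output above. The leading < is missing from every match, which is what you would expect if PHP treats the < and > around the pattern as its delimiters. Entries 3, 6 and 10 of the second array contain #" torero-icon="add instead of just #, because the greedy (.*) runs to the last quote followed by a space. Here is a possible tightened version, assuming double-quoted values and href as the first attribute; the sample markup in $text is made up for illustration.

```php
<?php
// Hypothetical sample input; $text stands in for the real HTML.
$text = '<li><a href="/admin/home" torero-icon="home">Home</a></li>' . "\n"
      . '<li><a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a></li>';

// ~ is the delimiter, so < and > are matched literally.
// ([^"]*) stops at the first closing quote instead of the last one.
// (.*?) lazily takes the link content up to the nearest </a>;
// the s flag lets that content span several lines.
$pattern = '~<a\s+href="([^"]*)"[^>]*>(.*?)</a>~is';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    $links[] = [
        'href' => $m[1], // the link target
        'tag'  => $m[0], // the whole <a ...>...</a> element
        'text' => $m[2], // the content between the tags
    ];
}

print_r($links);
```

With PREG_SET_ORDER each entry of $matches belongs to one link, so the href and the whole tag stay together.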
If it is a simple string, use strstr or preg_match_all. If you have a whole HTML document, use PHP's built-in DOMDocument. Consider:
$page_html = "<!DOCTYPE html>\n<html>\n...</body>\n</html>";
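$doc = new \DOMDocument();
// loadHTML() is an instance method; calling it statically throws an Error on PHP 8.
$doc->loadHTML( $page_html );
$anchors = $doc->getElementsByTagName('a');
foreach ( $anchors as $a ) {
    echo "Anchor HREF: " . $a->getAttribute('href') . PHP_EOL;
}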
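Without proper tokenization, string-based approaches will miss edge cases. For example, how do you want to handle anchors that are commented out? Or anchors that are not quite in the form you expect? The DOMDocument parser should capture exactly what you want.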
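To get both pieces the question asks for (the href and the whole a tag), one option is to serialize each anchor node with saveHTML(). This is only a sketch: the sample document in $page_html is made up, and skipping anchors without an href is my assumption about what is wanted.

```php
<?php
// Hypothetical sample input; $page_html stands in for the real document.
$page_html = '<!DOCTYPE html><html><body><ul>'
    . '<li><a href="/admin/home" torero-icon="home">Home</a></li>'
    . '<li><a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a></li>'
    . '<!-- <a href="/hidden">commented out</a> -->'
    . '<li><a name="top">anchor without href</a></li>'
    . '</ul></body></html>';

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page_html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    if (!$a->hasAttribute('href')) {
        continue; // skip anchors that carry no href
    }
    $links[] = [
        'href' => $a->getAttribute('href'),
        'tag'  => $doc->saveHTML($a), // the serialized <a ...>...</a> element
    ];
}

print_r($links);
```

The commented-out anchor is parsed as a comment node, so getElementsByTagName('a') never returns it.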