I want to find all <a href> tags and get their links and their full content

My problem is that I want to get the following out of a large block of HTML. Other tags that carry an href attribute should not show up in the result!

<a href="/admin/home" torero-icon="home">Home</a>

Here I want to get, first, "/admin/home" and, second, the whole a tag: "<a href="/admin/home" torero-icon="home">Home"

<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>

Here I want to get, first, "#" and, second, the whole a tag: "<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung"

Thanks for your help :)

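I'm doing something similar:

// preg_match_all() returns the number of matches; the matches themselves are written to $matches.
preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $page, $matches);

This grabs every absolute URL. To get all hrefs instead, you need to adjust the regular expression so that it matches what you are after.

You can then loop over the results with a foreach statement:

// $matches[0] holds the full-pattern matches.
foreach ($matches[0] as $url) {
    echo "url: " . $url;
}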
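
To adapt this idea to href attributes rather than absolute URLs, the pattern can target the attribute itself. The following is only a minimal sketch, not a robust parser: the sample `$page` string and the variable names are assumptions made for illustration, and it only handles quoted href values inside `<a>` tags.

```php
<?php
// Hypothetical sample input; in practice $page would hold the fetched HTML.
$page = '<a href="/admin/home" torero-icon="home">Home</a>'
      . '<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>';

// Group 1 captures the opening quote character,
// group 2 captures the href value up to the matching closing quote.
$pattern = '#<a\s[^>]*?href=(["\'])(.*?)\1#i';

// preg_match_all() returns the match count; the matches go into $matches.
$count = preg_match_all($pattern, $page, $matches);

foreach ($matches[2] as $href) {
    echo "href: " . $href . PHP_EOL;
}
```

The lazy `(.*?)` together with the `\1` backreference keeps each capture inside its own pair of quotes, so later attributes on the same tag are not swallowed.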

I found something that works:

 preg_match_all('<a href="(.*)" (.*)>',$text,$match);

This resulted in:

Array
 (
  [0] => Array
    (
        [0] => a href="/redirect/torero::external/https[dd][s][s]www[d]google[d]de[s]/8CF0-6416-DAEF-8C2B-1819" torero-modified="link-leading-external">Google
        [1] => a href="/admin/home" torero-icon="home">Home
        [2] => a href="/admin/pages" torero-icon="pages">Seiten
        [3] => a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung
        [4] => a href="/admin/accounts/users" torero-icon="person">Benutzer
        [5] => a href="/admin/accounts/permissions" torero-icon="check">Rechte
        [6] => a href="#" torero-icon="add" torero-left-icon="trending_up">Statistiken
        [7] => a href="/admin/statistics/trending" torero-icon="timeline">Beliebte Beiträge
        [8] => a href="/admin/statistics/visibility" torero-icon="visibility">SEO Statistiken
        [9] => a href="/admin/layouts" torero-icon="view_quilt">Layouts
        [10] => a href="#" torero-icon="add" torero-left-icon="settings">Einstellungen
        [11] => a href="/admin/settings/profile" torero-icon="person_pin">Profil
        [12] => a href="/admin/settings/extensions" torero-icon="extension">Erweiterungen
        [13] => a href="/admin/settings/updates" torero-icon="refresh">Software Updates
        [14] => a href="/admin/settings/info" torero-icon="info">System Info
        [15] => a href="/admin/settings/report" torero-icon="bug_report">Fehler melden
        [16] => a href="/admin/settings/feedback" torero-icon="feedback">Feedback geben
        [17] => a href="/admin/logout" torero-icon="exit_to_app">Abmelden
    )

[1] => Array
    (
        [0] => /redirect/torero::external/https[dd][s][s]www[d]google[d]de[s]/8CF0-6416-DAEF-8C2B-1819
        [1] => /admin/home
        [2] => /admin/pages
        [3] => #" torero-icon="add
        [4] => /admin/accounts/users
        [5] => /admin/accounts/permissions
        [6] => #" torero-icon="add
        [7] => /admin/statistics/trending
        [8] => /admin/statistics/visibility
        [9] => /admin/layouts
        [10] => #" torero-icon="add
        [11] => /admin/settings/profile
        [12] => /admin/settings/extensions
        [13] => /admin/settings/updates
        [14] => /admin/settings/info
        [15] => /admin/settings/report
        [16] => /admin/settings/feedback
        [17] => /admin/logout
    )

[2] => Array
    (
        [0] => torero-modified="link-leading-external">Google
        [1] => torero-icon="home">Home
        [2] => torero-icon="pages">Seiten
        [3] => torero-left-icon="accessibility">Account Verwaltung
        [4] => torero-icon="person">Benutzer
        [5] => torero-icon="check">Rechte
        [6] => torero-left-icon="trending_up">Statistiken
        [7] => torero-icon="timeline">Beliebte Beiträge
        [8] => torero-icon="visibility">SEO Statistiken
        [9] => torero-icon="view_quilt">Layouts
        [10] => torero-left-icon="settings">Einstellungen
        [11] => torero-icon="person_pin">Profil
        [12] => torero-icon="extension">Erweiterungen
        [13] => torero-icon="refresh">Software Updates
        [14] => torero-icon="info">System Info
        [15] => torero-icon="bug_report">Fehler melden
        [16] => torero-icon="feedback">Feedback geben
        [17] => torero-icon="exit_to_app">Abmelden
    )

)
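
Two details in this output are worth noting. Entries [3], [6] and [10] of group 1 contain `#" torero-icon="add` rather than `#`, because `(.*)` is greedy and runs to the last quote-plus-space sequence it can find on the line. The matches also start with `a href` instead of `<a href`, because PHP treats the leading `<` and trailing `>` as the pattern delimiters. One possible fix is sketched below, under the assumption that every href is double-quoted and is the first attribute of the tag; the `$text` sample is made up for illustration.

```php
<?php
// Hypothetical sample input mirroring two of the anchors from the question.
$text = '<a href="/admin/home" torero-icon="home">Home</a>' . "\n"
      . '<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>';

// "~" is the explicit delimiter, so "<" and ">" are matched literally.
// [^"]* stops at the first closing quote; [^>]* skips any remaining attributes.
preg_match_all('~<a href="([^"]*)"[^>]*>(.*?)</a>~s', $text, $match);

print_r($match[0]); // full <a ...>...</a> tags
print_r($match[1]); // href values only
print_r($match[2]); // link text
```

With this pattern, `$match[1]` holds the bare href values and `$match[0]` holds each complete anchor, which are the two things the question asks for.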

If it is a simple string, use strstr or preg_match_all. If you have a whole HTML document, use PHP's built-in DOMDocument. Consider:

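$page_html = "<!DOCTYPE html>\n<html>\n...</body>\n</html>";

// loadHTML() is an instance method; PHP 8 no longer allows calling it statically.
$doc = new \DOMDocument();
$doc->loadHTML( $page_html );

// Walk every <a> element and print its href attribute.
$anchors = $doc->getElementsByTagName('a');
foreach ( $anchors as $a ) {
    echo "Anchor HREF: " . $a->getAttribute('href') . PHP_EOL;
}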

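Without proper tokenization, string-based approaches will miss edge cases. For example, how do you want to handle commented-out anchors? And what about anchors that are not quite in the form you expect? The DOMDocument parser should capture exactly what you want.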
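
To get both pieces the question asks for (the href value and the whole tag), the DOM approach can be extended roughly as follows. This is a sketch under assumptions: the sample document, the `$links` array and its keys are invented for illustration, and the serialized tag comes from the parser, so it may differ slightly from the original source text.

```php
<?php
// Hypothetical sample document; the commented-out anchor should be ignored.
$page_html = '<!DOCTYPE html><html><body>'
           . '<!-- <a href="/old">Old</a> -->'
           . '<a href="/admin/home" torero-icon="home">Home</a>'
           . '<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>'
           . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($page_html);

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'),
        // saveHTML($node) serializes just this element with its attributes and content.
        'tag'  => $doc->saveHTML($a),
    ];
}

print_r($links);
```

Because the anchor inside the HTML comment is parsed as a comment node rather than an element, it does not appear in `$links`.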