Simple HTML DOM Parser - Problem displaying a variable inside foreach loop

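I created a simple PHP script using the simple_html_dom.php class. I fetch some information about movies from a website. I have a foreach loop nested inside another foreach loop. When I try to display the movie name inside the inner foreach loop, I get only the last movie name. What I want is to get each unique movie name for each item. The problem is with the $movie variable.

(When I echo the $movie var at line 27 I get the correct result, but I want to include each movie name in the YouTube link at line 33. Both lines are marked with comments in the code below.)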

<?php
include("simple_html_dom.php");
    
$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html(html_entity_decode($tpb));
    
foreach($html->find('tr.header') as $header) {
    $header->outertext = '';
}
        
foreach($html->find('td') as $bottom) {
    if ($bottom->colspan == '9') {
        $bottom->outertext = '';
    }
}
        
foreach($html->find('td.vertTh') as $vert) {
    $vert->outertext = '';
}   
    
foreach($html->find("div.detName") as $movie) {
    $movie = $movie->plaintext;
    echo $movie;    //Works Okey, it displays each of the movietitles
    
    foreach($html->find('img') as $img) {
    
        if ($img->outertext == '<img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">') {
            $img->outertext = '&nbsp;&nbsp;<a href="https://www.youtube.com/results?search_query='. $movie /* Doesn't work, only displays one title, not one each of the 30*/ .'" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
        }
    }
}   
    
$html->save();
foreach($html->find("table") as $title) {
    echo $title->outertext . '<br>';
}
?>

Original source:

<td>
  <div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
  </div>
  <a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
    title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
  <a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a><img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">
  <font class="detDesc">Uploaded 11-27&nbsp;10:12, Size 2.71&nbsp;GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>

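How it looks now:

This is the HTML after the IMG element has been replaced. The problem is that the link is the same for every element, when it should be unique for each one (like the movie titles):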

<td>
  <div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
  </div>
  <a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
    title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
  <a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a>&nbsp;&nbsp;
  <a href="https://www.youtube.com/results?search_query=            The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26  " target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>
  <font class="detDesc">Uploaded 11-27&nbsp;10:12, Size 2.71&nbsp;GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>

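Your inner loop visits every img on the page for each title, so the replacement is not tied to the row the title came from. The image you want is among the siblings of the detName DIV, so you can search within the DIV's parent element instead.

Since find() allows more complex CSS selectors, you can search specifically for the image you need rather than iterating over all images.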

foreach($html->find("div.detName") as $movieDiv) {
    // trim the whitespace that surrounds the title in plaintext
    $movie = trim($movieDiv->plaintext);
    echo $movie;    // works: displays each of the movie titles

    // look for the placeholder image only within this title's own cell
    $img = $movieDiv->parent()->find('img[src="https://tpb.party/static/img/11x11p.png"]', 0);
    if ($img) {
        // urlencode() keeps the title safe inside the query string
        $img->outertext = '&nbsp;&nbsp;<a href="https://www.youtube.com/results?search_query='. urlencode($movie) .'" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
    }
}

Ideally, you should focus only on extracting the data (changing it if needed) and then build your own table from it.

<?php
include("simple_html_dom.php");

$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html($tpb);

function remove_junk($movie_name) {
    // you get the idea.. maybe a db or further stripping
    // str_replace applies the patterns in order, so longer patterns
    // are listed before the shorter ones they contain
    return str_replace([
        '.1080p.WEB-DL.X26',
        'WEB-DL.X26',
        'GalaxyRG',
        '0.HDRip.XviD.AC3-EVO[TGx]',
        '.720p.BluRay.800MB.x264-'
    ], '', $movie_name);
}

$movies = [];
foreach($html->getElementById("searchResult")->find('tr') as $tr) {
    // cells of the current row
    $td = $tr->find('td');

    // buggy simple_html_dom doesn't see tbody, so data rows are direct children of the table
    if ($tr->parent->tag === 'table' && isset($td[1])) {

        $name = trim($td[1]->find('.detName', 0)->plaintext);

        $links = [];
        foreach ($td[1]->find('a') as $link) {
            $links[] = $link->href;
        }

        $info = $td[1]->find('.detDesc', 0)->plaintext;
        $info = explode(', ', $info);

        $uploaded = trim(str_replace(['Uploaded', '&nbsp;'], ' ', $info[0]));
        $size = trim(str_replace(['Size', '&nbsp;'], ' ', $info[1]));
        $ULed = trim(str_replace(['ULed by'], ' ', $info[2]));

        $movies[] = [
            'name' => $name,
            'links' => [
                'site' => $links[0],
                'magnet' => $links[1],
                'youtube' => 'https://www.youtube.com/results?search_query='.urlencode(remove_junk($name))
            ],
            'uploaded' => $uploaded,
            'size' => $size,
            'ULed' => [
                'user' => $ULed,
                'link' => $links[3]
            ],
            'seeds' => trim($td[2]->plaintext),
            'leecher' => trim($td[3]->plaintext)
        ];
    }
}  

print_r($movies);

This produces an array with the following structure:

Array (
    ... snip
    [30] => Array
        (
            [name] => Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
            [links] => Array
                (
                    [site] => https://tpb.party/torrent/38038881/Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
                    [magnet] => magnet:?xt=urn:btih:BF16ACE87DABF2300253B7EDB7600B1BAB3EE02A&dn=Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce
                    [youtube] => https://www.youtube.com/results?search_query=Pinocchio.2020
                )

            [uploaded] => 12-07 01:51
            [size] => 798.15 MiB
            [ULed] => Array
                (
                    [user] => sotnikam
                    [link] => https://tpb.party/user/sotnikam/
                )

            [seeds] => 351
            [leecher] => 57
        )

)
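As the comment in remove_junk hints, a fixed list of strings only goes so far. One possible alternative, sketched below, is to cut the release name right after the release year. The function name title_with_year and the regular expression are assumptions made for illustration; they are not part of the original answer and will not suit every naming scheme.

```php
<?php
// Hypothetical helper (not part of the original answer): keep the title up to
// and including a four-digit release year, then turn dots into spaces.
function title_with_year(string $release_name): string {
    // Lazily match up to the first ".YYYY" or " YYYY" (years 1900-2099)
    // that is followed by a separator or the end of the string.
    if (preg_match('/^(.*?[. ](?:19|20)\d{2})(?=[. ]|$)/', $release_name, $m)) {
        $release_name = $m[1];
    }
    // Names without a recognizable year are kept whole, dots replaced.
    return trim(str_replace('.', ' ', $release_name));
}

echo title_with_year('Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG'), "\n";
// prints "Pinocchio 2020"
```

The result can be passed through urlencode() in place of remove_junk($name) when building the youtube link.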

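You can then loop over it to build your own styled table, including the YouTube link. It would be better, though, to do all the scraping in a separate task that puts the resulting data into a database, and then query that instead. That way the results are stored, you don't scrape the site on every request, and you can detect a change in the source before you display a broken site.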
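A minimal sketch of that loop follows, assuming $movies has the structure printed above. The function names esc and render_movies_table, the chosen columns, and the markup are illustrative assumptions rather than part of the original answer; the sample entry is copied from the output above with the magnet link left out.

```php
<?php
// Hypothetical helper: escape scraped text before putting it into HTML.
function esc(string $text): string {
    return htmlspecialchars($text, ENT_QUOTES);
}

// Hypothetical renderer: assumes each entry of $movies has the keys
// shown in the print_r() output (name, links, size, seeds, leecher).
function render_movies_table(array $movies): string {
    $rows = '';
    foreach ($movies as $movie) {
        $rows .= sprintf(
            "<tr><td><a href=\"%s\">%s</a></td>"
            . "<td><a href=\"%s\" target=\"_blank\">Trailer</a></td>"
            . "<td>%s</td><td>%d</td><td>%d</td></tr>\n",
            esc($movie['links']['site']),
            esc($movie['name']),
            esc($movie['links']['youtube']),
            esc($movie['size']),
            $movie['seeds'],
            $movie['leecher']
        );
    }
    return "<table>\n"
        . "<tr><th>Name</th><th>Trailer</th><th>Size</th><th>Seeds</th><th>Leechers</th></tr>\n"
        . $rows
        . "</table>\n";
}

// Sample entry taken from the output above (magnet link omitted).
$movies = [[
    'name' => 'Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG',
    'links' => [
        'site' => 'https://tpb.party/torrent/38038881/Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG',
        'youtube' => 'https://www.youtube.com/results?search_query=Pinocchio.2020',
    ],
    'size' => '798.15 MiB',
    'seeds' => '351',
    'leecher' => '57',
]];

echo render_movies_table($movies);
```

Because each row is built from its own array entry, every trailer link carries that row's title, which is what the nested loops in the question could not guarantee.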