Simple HTML DOM Parser - Problem displaying a variable inside foreach loop
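I created a simple PHP script using the simple_html_dom.php class. I'm fetching some information about movies from a website. I have a foreach loop inside another foreach loop. When I try to display the movie name inside the inner foreach loop, I only get the last movie name. What I want is the unique movie name for each item. The problem is with the $movie variable.
(When I echo the $movie variable on line 27 I get the correct result, but I want to include each movie name in the YouTube link on line 33...)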
<?php
include("simple_html_dom.php");
$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html(html_entity_decode($tpb));
foreach($html->find('tr.header') as $header) {
    $header->outertext = '';
}
foreach($html->find('td') as $bottom) {
    if ($bottom->colspan == '9') {
        $bottom->outertext = '';
    }
}
foreach($html->find('td.vertTh') as $vert) {
    $vert->outertext = '';
}
foreach($html->find("div.detName") as $movie) {
    $movie = $movie->plaintext;
    echo $movie; //Works Okey, it displays each of the movietitles
    foreach($html->find('img') as $img) {
        if ($img->outertext == '<img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">') {
            $img->outertext = ' <a href="https://www.youtube.com/results?search_query='. $movie /* Doesn't work, only displays one title, not one each of the 30*/ .'" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
        }
    }
}
$html->save();
foreach($html->find("table") as $title) {
    echo $title->outertext . '<br>';
}
?>
Original source:
<td>
<div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
</div>
<a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
<a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a><img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">
<font class="detDesc">Uploaded 11-27 10:12, Size 2.71 GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>
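How it looks now:
This is the HTML after the IMG element has been replaced. The problem is that the link is the same for every element, when it should be unique for each one (like the movie titles):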
<td>
<div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
</div>
<a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
<a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a>
<a href="https://www.youtube.com/results?search_query= The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26 " target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>
<font class="detDesc">Uploaded 11-27 10:12, Size 2.71 GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>
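The image you're after sits next to the detName DIV, inside the same parent cell, so you can search within the parent element instead of the whole document.
Since find() allows more complex CSS selectors, you can search specifically for the image you need rather than looping over all the images.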
foreach($html->find("div.detName") as $movieDiv) {
    // trim the title: plaintext carries the whitespace around the link
    $movie = trim($movieDiv->plaintext);
    echo $movie; // works, displays each of the movie titles
    // look for the placeholder image only inside this title's parent cell
    $img = $movieDiv->parent()->find('img[src="https://tpb.party/static/img/11x11p.png"]', 0);
    if ($img) {
        // urlencode() keeps the title safe inside the query string
        $img->outertext = ' <a href="https://www.youtube.com/results?search_query='. urlencode($movie) .'" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
    }
}
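To see the "search inside the parent" idea in isolation, here is a minimal sketch that runs without fetching the site. Note that it swaps simple_html_dom for PHP's built-in DOM extension (DOMDocument and DOMXPath), and that the two-row sample markup and the titles Movie.A.2020 and Movie.B.2020 are made up for illustration, loosely modelled on the cell shown in the question.

```php
<?php
// Hypothetical two-row sample, modelled on the markup shown in the question.
$sample = <<<HTML
<table>
<tr><td><div class="detName"><a href="#a">Movie.A.2020</a></div><img src="https://tpb.party/static/img/11x11p.png" height="11" width="11"></td></tr>
<tr><td><div class="detName"><a href="#b">Movie.B.2020</a></div><img src="https://tpb.party/static/img/11x11p.png" height="11" width="11"></td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

$hrefs = [];
foreach ($xpath->query('//div[@class="detName"]') as $div) {
    $title = trim($div->textContent);

    // Relative query: only <img> children of this div's parent cell are
    // considered, so each title is paired with its own placeholder image.
    $img = $xpath->query('img[contains(@src, "11x11p.png")]', $div->parentNode)->item(0);
    if ($img === null) {
        continue; // no placeholder in this cell
    }

    // Build the replacement link and swap it in for the placeholder.
    $link = $doc->createElement('a');
    $link->setAttribute('href', 'https://www.youtube.com/results?search_query=' . urlencode($title));
    $link->setAttribute('target', '_blank');
    $link->appendChild($doc->createTextNode('Trailer'));
    $img->parentNode->replaceChild($link, $img);

    $hrefs[] = $link->getAttribute('href');
}

print_r($hrefs);
```

Each link ends up with the title from its own row, because the image lookup never leaves the current cell.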
Ideally, you should focus only on extracting the data (changing it if needed) and then build your own table from it.
<?php
include("simple_html_dom.php");

$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html($tpb);

// Strip release tags from a title so the YouTube search is cleaner.
// you get the idea.. maybe a db or further stripping
function remove_junk($movie_name) {
    return str_replace([
        // longer patterns first, otherwise 'WEB-DL.X26' would break their match
        '.1080p.WEB-DL.X26',
        'WEB-DL.X26',
        'GalaxyRG',
        '0.HDRip.XviD.AC3-EVO[TGx]',
        '.720p.BluRay.800MB.x264-'
    ], '', $movie_name);
}

$movies = [];
foreach ($html->getElementById("searchResult")->find('tr') as $tr) {
    $td = $tr->find('td');

    // buggy simple_html_dom doesn't see tbody, so result rows are direct children of the table
    if ($tr->parent->tag === 'table' && isset($td[1])) {
        $name = trim($td[1]->find('.detName', 0)->plaintext);

        // collect every link in the name cell, in document order
        $links = [];
        foreach ($td[1]->find('a') as $link) {
            $links[] = $link->href;
        }

        // e.g. "Uploaded 11-27 10:12, Size 2.71 GiB, ULed by sotnikam"
        $info = $td[1]->find('.detDesc', 0)->plaintext;
        $info = explode(', ', $info);

        // '&nbsp;' is an assumption: the page appears to use non-breaking spaces in these fields
        $uploaded = trim(str_replace(['Uploaded', '&nbsp;'], ' ', $info[0]));
        $size = trim(str_replace(['Size', '&nbsp;'], ' ', $info[1]));
        $ULed = trim(str_replace(['ULed by'], ' ', $info[2]));

        $movies[] = [
            'name' => $name,
            'links' => [
                'site' => $links[0],
                'magnet' => $links[1],
                'youtube' => 'https://www.youtube.com/results?search_query='.urlencode(remove_junk($name))
            ],
            'uploaded' => $uploaded,
            'size' => $size,
            'ULed' => [
                'user' => $ULed,
                // index 3 assumes a badge link (e.g. VIP) sits at index 2, as in the sample row
                'link' => $links[3]
            ],
            'seeds' => trim($td[2]->plaintext),
            'leecher' => trim($td[3]->plaintext)
        ];
    }
}

print_r($movies);
This produces an array with the following structure:
Array (
... snip
[30] => Array
(
[name] => Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
[links] => Array
(
[site] => https://tpb.party/torrent/38038881/Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
[magnet] => magnet:?xt=urn:btih:BF16ACE87DABF2300253B7EDB7600B1BAB3EE02A&dn=Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce
[youtube] => https://www.youtube.com/results?search_query=Pinocchio.2020
)
[uploaded] => 12-07 01:51
[size] => 798.15 MiB
[ULed] => Array
(
[user] => sotnikam
[link] => https://tpb.party/user/sotnikam/
)
[seeds] => 351
[leecher] => 57
)
)
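You can then loop over it to build your own styled table, including the YouTube link. It would be better, though, to do all the scraping in a separate task that puts the results into a database, and to query that instead. That way the data is stored, you don't scrape the site on every request, and you can detect a change in the source before you end up showing a broken page.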
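As a rough sketch of that last step, the loop below turns an array shaped like $movies into a plain HTML table. The helper name render_movies() and the choice of columns are only illustrative, and the sample entry is a trimmed copy of the one printed above so that the snippet runs on its own.

```php
<?php
// Trimmed sample in the same shape as the $movies array printed above.
$movies = [
    [
        'name' => 'Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG',
        'links' => [
            'site' => 'https://tpb.party/torrent/38038881/Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG',
            'youtube' => 'https://www.youtube.com/results?search_query=Pinocchio.2020',
        ],
        'size' => '798.15 MiB',
        'seeds' => '351',
        'leecher' => '57',
    ],
];

// render_movies() is a hypothetical helper, not part of simple_html_dom.
function render_movies(array $movies): string
{
    // Escape everything that came from the scraped page before printing it.
    $e = function ($value) {
        return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    };

    $rows = '';
    foreach ($movies as $m) {
        $rows .= '<tr>'
            . '<td><a href="' . $e($m['links']['site']) . '">' . $e($m['name']) . '</a></td>'
            . '<td>' . $e($m['size']) . '</td>'
            . '<td>' . $e($m['seeds']) . '</td>'
            . '<td>' . $e($m['leecher']) . '</td>'
            . '<td><a href="' . $e($m['links']['youtube']) . '" target="_blank">Trailer</a></td>'
            . "</tr>\n";
    }

    return "<table>\n"
        . "<tr><th>Name</th><th>Size</th><th>Seeds</th><th>Leechers</th><th>Trailer</th></tr>\n"
        . $rows
        . "</table>\n";
}

$table = render_movies($movies);
echo $table;
```

Because the table is built from your own array, each row's Trailer link comes straight from that row's data, and the same function would work unchanged on rows read back from a database.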