Simple html dom weird behavior


I'm trying to parse a website that displays calendar events in a table, and I'm running into some strange behavior.

HTML structure:

-----------------------------
date 1| - 1st event this date
      | - 2nd event this date
-----------------------------
date 2| - 1st event this date
      | - 2nd event this date
-----------------------------
date 3| - 1st event this date
-----------------------------
date 4| - 1st event this date
-----------------------------

As you can see, it is basically a <table> in which each <tr> represents one date.


What I've tried:

I tried parsing it in PHP with simple_html_dom.php:
foreach ($html->find('#jevents_body table.ev_table tbody tr') as $tr) {

    // The first text node of the left-hand cell holds the date;
    // parseDate() is my own helper (not shown here)
    $dateEl = $tr->find("td.ev_td_left text", 0);
    $eventDate = parseDate($dateEl->plaintext);

    // Iterate through all events on this date
    foreach ($tr->find('li.ev_td_li') as $li) {

        // Get the event title
        $title = $li->find('a.ev_link_row', 0)->plaintext;
        print("Parsed: [$title, $eventDate]\r\n");
    }
}


Problem:

It somehow seems to parse the whole page twice. My output looks something like this:

Parsed: [1st event this date, date 1]
Parsed: [2nd event this date, date 1]
Parsed: [1st event this date, date 2]
Parsed: [2nd event this date, date 2]
Parsed: [1st event this date, date 3]
Parsed: [1st event this date, date 4]

//and here it runs again...
Parsed: [1st event this date, date 1]
Parsed: [2nd event this date, date 1]
Parsed: [1st event this date, date 2]
Parsed: [2nd event this date, date 2]
Parsed: [1st event this date, date 3]
Parsed: [1st event this date, date 4]

Does anyone know what the problem is?


Edit 1: Markup:

As suggested, here is the HTML markup (it's quite a mess): http://www.akg-bensheim.de/termine/range.listevents/-

This produces the following output:

Parsed: [Vorstand des Fördervereins, 2015-04-29]
Parsed: [Beginn der sportpraktischen Abiturprüfungen, 2015-04-29]
Parsed: [Christi Himmelfahrt, 2015-04-29]
Parsed: [Brückentag / beweglicher Ferientag, 2015-04-29]
Parsed: [Pfingstmontag, 2015-04-29]
Parsed: [Bundesjugendspiele, 2015-04-29]
Parsed: [Unterrichtsfrei wegen mündl. Abitur, 2015-04-29]
Parsed: [Mündliche Abiturprüfungen, 2015-04-29]
Parsed: [Fronleichnam, 2015-04-29]
Parsed: [Brückentag / beweglicher Ferientag, 2015-04-29]
Parsed: [Pensionäre: Sommerstammtisch, 2015-04-29]
Parsed: [Abiturienten-Gottesdienst, 2015-04-29]
Parsed: [Akademische Abitur-Feier, 2015-04-29]
Parsed: [Abi-Ball, 2015-04-29]
Parsed: [Sommerferien, 2015-04-29]
Parsed: [Vorstand des Fördervereins, 2015-04-29]
Parsed: [Beginn der sportpraktischen Abiturprüfungen, 2015-05-04]
Parsed: [Christi Himmelfahrt, 2015-05-14]
Parsed: [Brückentag / beweglicher Ferientag, 2015-05-15]
Parsed: [Pfingstmontag, 2015-05-25]
Parsed: [Bundesjugendspiele, 2015-05-28]
Parsed: [Unterrichtsfrei wegen mündl. Abitur, 2015-05-29]
Parsed: [Mündliche Abiturprüfungen, 2015-05-29]
Parsed: [Fronleichnam, 2015-06-04]
Parsed: [Brückentag / beweglicher Ferientag, 2015-06-05]
Parsed: [Pensionäre: Sommerstammtisch, 2015-06-09]
Parsed: [Abiturienten-Gottesdienst, 2015-06-24]
Parsed: [Akademische Abitur-Feier, 2015-06-25]
Parsed: [Abi-Ball, 2015-06-27]
Parsed: [Sommerferien, 2015-07-27]

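As you can see, it somehow parses the whole thing twice!

OK, I've found a workaround for this problem, although I still don't know why the parser behaves so strangely.

I basically check the plaintext property of each table row and skip to the next iteration if its text is empty: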

foreach ($html->find('#jevents_body table.ev_table tbody tr') as $tr) {
    $tmp = trim($tr->plaintext);
    if (empty($tmp)) {
        continue;
    }

    // Parsing
    ...
}
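
For comparison, here is a minimal, self-contained sketch of the same idea using PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom. It is not the original code. The sample markup is hypothetical: it only borrows the class names from the selectors above, so the real page may be structured differently. The helper names hasClass() and parseEvents() are made up for illustration. It also reads the date from the whole left-hand cell rather than from its first text node. The sketch selects only rows that have a td.ev_td_left cell as a direct child, and skips rows without any text, as in the workaround above.

```php
<?php
// Hypothetical sample markup modeled on the class names in the question.
$sample = <<<HTML
<div id="jevents_body">
  <table class="ev_table">
    <tbody>
      <tr>
        <td class="ev_td_left">date 1</td>
        <td><ul>
          <li class="ev_td_li"><a class="ev_link_row" href="#">1st event this date</a></li>
          <li class="ev_td_li"><a class="ev_link_row" href="#">2nd event this date</a></li>
        </ul></td>
      </tr>
      <tr><td class="ev_td_left"></td><td></td></tr>
      <tr>
        <td class="ev_td_left">date 2</td>
        <td><ul>
          <li class="ev_td_li"><a class="ev_link_row" href="#">1st event this date</a></li>
        </ul></td>
      </tr>
    </tbody>
  </table>
</div>
HTML;

// Hypothetical helper: XPath predicate that matches one CSS class name.
function hasClass(string $class): string
{
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
}

// Hypothetical helper: returns a list of [title, date] pairs.
function parseEvents(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML prefix is a common trick to make loadHTML() treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $events = [];

    // Only rows that directly contain a date cell are considered.
    $rows = $xpath->query(
        '//*[@id="jevents_body"]//table[' . hasClass('ev_table') . ']'
        . '//tr[td[' . hasClass('ev_td_left') . ']]'
    );

    foreach ($rows as $tr) {
        // Same check as the workaround: skip rows without any text.
        if (trim($tr->textContent) === '') {
            continue;
        }

        $dateCell = $xpath->query('td[' . hasClass('ev_td_left') . ']', $tr)->item(0);
        $date = trim($dateCell->textContent);

        $links = $xpath->query(
            './/li[' . hasClass('ev_td_li') . ']//a[' . hasClass('ev_link_row') . ']',
            $tr
        );
        foreach ($links as $a) {
            $events[] = [trim($a->textContent), $date];
        }
    }

    return $events;
}

foreach (parseEvents($sample) as [$title, $date]) {
    print("Parsed: [$title, $date]\r\n");
}
```

Whether this avoids the duplicate output on the real page depends on how that page nests its tables and rows, which I have not verified here.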