Faster way to process an XML file using PHP

I have an XML file named flight-itinerary.xml. A trimmed-down version is shown below.

<itin line="1" dep="LOS" arr="ABV">
    <flt>
        <fltav>
            <cb>1</cb>
            <id>C</id>
            <av>10</av>
            <cur>NGN</cur>
            <CurInf>2,0.01,0.01</CurInf>
            <pri>15000.00</pri>
            <tax>30800.00</tax>
            <fav>1</fav>
            <miles></miles>
            <fid>11</fid>
            <finf>0,0,1</finf>

            <cb>2</cb>
            <id>J</id>
            <av>10</av>
            <cur>NGN</cur>
            <CurInf>2,0.01,0.01</CurInf>
            <pri>13000.00</pri>
            <tax>26110.00</tax>
            <fav>1</fav>
            <miles></miles>
            <fid>12</fid>
            <finf>0,0,0</finf>
        </fltav>
    </flt>
</itin>

The full file contains 8 <itin> elements. The <fltav> element of each <itin> contains 11 groups of the elements from <cb>1</cb> through <finf>0,0,1</finf>.

Below is the code I use to process the file:

<?php

function processFlightsData()
{
    $data = array();
    $dom= new DOMDocument();
    $dom->load('flight-itinerary.xml');

    $classbands  = $dom->getElementsByTagName('classbands')->item(0);
    $bands       = $classbands->getElementsByTagName('band');
    $itineraries = $dom->getElementsByTagName('itin');
    $counter     = 0;

    foreach($itineraries AS $itinerary)
    { 
        $flt = $itinerary->getElementsByTagName('flt')->item(0);

        $dep = $flt->getElementsByTagName('dep')->item(0)->nodeValue;
        $arr = $flt->getElementsByTagName('arr')->item(0)->nodeValue;

        $time_data       = $flt->getElementsByTagName('time')->item(0);
        $departure_day   = $time_data->getElementsByTagName('ddaylcl')->item(0)->nodeValue;
        $departure_time  = $time_data->getElementsByTagName('dtimlcl')->item(0)->nodeValue;
        $departure_date  = $departure_day. ' '. $departure_time;
        $arrival_day     = $time_data->getElementsByTagName('adaylcl')->item(0)->nodeValue;
        $arrival_time    = $time_data->getElementsByTagName('atimlcl')->item(0)->nodeValue;
        $arrival_date    = $arrival_day. ' '. $arrival_time;
        $flight_duration = $time_data->getElementsByTagName('duration')->item(0)->nodeValue;

        $flt_det       = $flt->getElementsByTagName('fltdet')->item(0);
        $airline_id    = $flt_det->getElementsByTagName('airid')->item(0)->nodeValue;
        $flt_no        = $flt_det->getElementsByTagName('fltno')->item(0)->nodeValue;
        $flight_number = $airline_id. $flt_no;
        $airline_type  = $flt_det->getElementsByTagName('eqp')->item(0)->nodeValue;
        $stops         = $flt_det->getElementsByTagName('stp')->item(0)->nodeValue;

        $av_data = $flt->getElementsByTagName('fltav')->item(0);

        $cbs     = iterator_to_array($av_data->getElementsByTagName('cb')); //11 entries
        $ids     = iterator_to_array($av_data->getElementsByTagName('id')); //ditto
        $seats   = iterator_to_array($av_data->getElementsByTagName('av')); //ditto
        $curr    = iterator_to_array($av_data->getElementsByTagName('cur')); //ditto
        $price   = iterator_to_array($av_data->getElementsByTagName('pri')); //ditto
        $tax     = iterator_to_array($av_data->getElementsByTagName('tax')); //ditto
        $miles   = iterator_to_array($av_data->getElementsByTagName('miles')); //ditto
        $fid     = iterator_to_array($av_data->getElementsByTagName('fid')); //ditto    

        $inner_counter = 0;

        for($i = 0; $i < count($ids); $i++)
        {
            $data[$counter][$inner_counter] = array
            (
                'flight_number'                   => $flight_number,
                'flight_duration'                 => $flight_duration, 
                'departure_date'                  => $departure_date,
                'departure_time'                  => substr($departure_time, 0, 5),
                'arrival_date'                    => $arrival_date,
                'arrival_time'                    => substr($arrival_time, 0, 5),
                'departure_airport_code'          => $dep,
                'departure_airport_location_name' => get_airport_data($dep, $data_key='location'),
                'arrival_airport_code'            => $arr,
                'arrival_airport_location_name'   => get_airport_data($arr, $data_key='location'),
                'stops'                           => $stops,
                'cabin_class'                     => $ids[$i]->nodeValue,
                'ticket_class'                    => $ids[$i]->nodeValue,
                'ticket_class_nicename'           => formate_ticket_class_name($ids[$i]->nodeValue),
                'available_seats'                 => $seats[$i]->nodeValue,
                'currency'                        => $curr[$i]->nodeValue,
                'price'                           => $price[$i]->nodeValue,
                'tax'                             => $tax[$i]->nodeValue,
                'miles'                           => $miles[$i]->nodeValue,
            );

            ++$inner_counter;
        }

        // Move to the next itinerary so entries are not overwritten.
        ++$counter;
    }

    return $data;
}

?>

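Now, the outer loop iterates 8 times, once for each <itin> element, and on each iteration of the outer loop the inner loop iterates 11 times. That is 88 iterations per pass, and it causes a serious performance problem. What I am looking for is a faster way to process the file. Any help would be greatly appreciated.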

I don't think the loops are the bottleneck. You should check the operations called inside the loop: get_airport_data and formate_ticket_class_name.
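One way to check this is to time the helper calls separately from the loop. The following is only a rough sketch: get_airport_data_stub and timed are hypothetical names invented for illustration, and the stub merely sleeps to imitate a slow lookup, since the real get_airport_data is not shown in the question.

```php
<?php
// Hypothetical stand-in for the real helper (not shown in the question).
// It sleeps briefly to imitate a slow database or HTTP lookup.
function get_airport_data_stub(string $code): string
{
    usleep(1000);
    return 'Location of ' . $code;
}

// Calls $fn with $args, adds the elapsed seconds to $timings[$label],
// and returns whatever $fn returned.
function timed(string $label, callable $fn, array $args, array &$timings)
{
    $start  = microtime(true);
    $result = $fn(...$args);
    $timings[$label] = ($timings[$label] ?? 0.0) + (microtime(true) - $start);
    return $result;
}

$timings   = [];
$loopStart = microtime(true);

// 88 iterations, the same count as 8 itineraries x 11 groups in the question.
for ($i = 0; $i < 88; $i++) {
    timed('get_airport_data', 'get_airport_data_stub', ['LOS'], $timings);
}

$total = microtime(true) - $loopStart;
printf("helper: %.4fs of %.4fs total\n", $timings['get_airport_data'], $total);
```

If the helper accounts for nearly all of the total, the DOM traversal itself is not the problem.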

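Running your code (without the helper operations) on multiple itin elements takes less than a second. Check this fiddle: http://phpfiddle.org/main/code/7fpi-b3ka (note that the XML may not resemble yours; I had to guess many elements that were missing).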

If the called operations increase the processing time significantly, try calling them with bulk data or caching their responses.
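As a rough illustration of the caching suggestion, the sketch below memoizes lookups so that each airport code is resolved only once per request. The AirportLookup class and its methods are hypothetical and stand in for the real get_airport_data, whose implementation is not shown in the question.

```php
<?php
// Hypothetical wrapper; slow() stands in for the real get_airport_data.
final class AirportLookup
{
    public static $slowCalls = 0;  // how many times the expensive path ran
    private static $cache = [];    // results keyed by "code|dataKey"

    // Placeholder for the expensive lookup (database, web service, ...).
    public static function slow(string $code, string $dataKey = 'location'): string
    {
        self::$slowCalls++;
        return $dataKey . ' of ' . $code;
    }

    // Memoized lookup: the expensive call runs once per distinct (code, key) pair.
    public static function cached(string $code, string $dataKey = 'location'): string
    {
        $cacheKey = $code . '|' . $dataKey;
        if (!isset(self::$cache[$cacheKey])) {
            self::$cache[$cacheKey] = self::slow($code, $dataKey);
        }
        return self::$cache[$cacheKey];
    }
}

// 8 itineraries x 11 groups = 88 iterations, two lookups each.
for ($i = 0; $i < 88; $i++) {
    $from = AirportLookup::cached('LOS');
    $to   = AirportLookup::cached('ABV');
}

echo AirportLookup::$slowCalls, " slow lookups instead of 176\n"; // prints "2 slow lookups instead of 176"
```

In the question's code the airport names depend only on $dep and $arr, so resolving them once per itinerary before the inner loop would have a similar effect without any cache.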