Convert XML to CSV with PHP when some items have different fields

Question

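I want to convert XML to CSV. The conversion itself works, but some items have more or fewer fields than others.

An example of my feed:

Edit: the top of the XML is: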

<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<item>

Item with fewer fields:

<item>
<title>
<![CDATA[
Resident Evil Revelations 2: Raid Mode: Throwback Map Pack
]]>
</title>
<link>
https://www.nuuvem.com/item/resident-evil-revelations-2-raid-mode-throwback-map-pack
</link>
<description>
<![CDATA[
Novas missões do modo raide! 4 mapas nostálgicos de locais icônicos, como o Queen Zenobia, do Resident Evil Revelations. 3 níveis de dificuldade oferecem um total de 12 novas missões.
]]>
</description>
<g:availability>out of stock</g:availability>
<g:price currency="BRL">9.99</g:price>
<g:image_link>
http://dskhvldhwok3h.cloudfront.net/image/upload/t_boxshot_big/v1/products/5584854f69702d7235000025/boxshots/j6qaxbrhowfkijd5zdg8.jpg
</g:image_link>
<g:product_type>
<![CDATA[ Action ]]>
</g:product_type>
<g:google_product_category>Software > Video Game Software > Computer Games</g:google_product_category>
<g:condition>new</g:condition>
<g:identifier_exists>FALSE</g:identifier_exists>
<g:id>11985</g:id>
</item>

Item with the most common fields:

<item>
<title>
<![CDATA[ Tom Clancys Rainbow Six - SIEGE: Gemstone Bundle ]]>
</title>
<link>
https://www.nuuvem.com/bundle/tom-clancy-s-rainbow-six-siege-gemstone-bundle
</link>
<description>
<![CDATA[ ]]>
</description>
<g:availability>in stock</g:availability>
<g:price currency="BRL">38.99</g:price>
<g:image_link>
http://dskhvldhwok3h.cloudfront.net/image/upload/t_boxshot_big/v1/products/573ded74f372803be9006b35/boxshots/l8ypqwhq48jzbxogypeh.jpg
</g:image_link>
<g:product_type>
<![CDATA[ Bundle ]]>
</g:product_type>
<g:google_product_category>Software > Video Game Software > Computer Games</g:google_product_category>
<g:condition>new</g:condition>
<g:identifier_exists>FALSE</g:identifier_exists>
<g:id>12705</g:id>
</item>

Item with more fields:

<item>
<title>
<![CDATA[ Far Cry 4 - Gold Edition ]]>
</title>
<link>https://www.nuuvem.com/item/far-cry-4-gold-edition</link>
<description>
<![CDATA[
You are a gun for hire, trapped in a war-torn African state, stricken with malaria and forced to make deals with corrupt warlords on both sides of the conflict in order to make this country your home. You must identify and exploit your enemies' weaknesses, neutralizing their superior numbers and firepower.
]]>
</description>
<g:availability>in stock</g:availability>
<g:price currency="BRL">129.99</g:price>
<g:sale_price currency="BRL">64.99</g:sale_price>
<g:sale_price_effective_date>
2017-01-26T02:00:00+00:00/2017-01-31T01:59:00+00:00
</g:sale_price_effective_date>
<g:image_link>
http://dskhvldhwok3h.cloudfront.net/image/upload/t_boxshot_big/v1/products/557dbc5369702d0a9c57e600/boxshots/ld6c69odlluoerzmwyga.jpg
</g:image_link>
<g:product_type>
<![CDATA[ Action ]]>
</g:product_type>
<g:google_product_category>Software > Video Game Software > Computer Games</g:google_product_category>
<g:condition>new</g:condition>
<g:identifier_exists>FALSE</g:identifier_exists>
<g:id>2246</g:id>
</item>

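I am using the script below for the conversion. It works, but it goes wrong whenever an item has more or fewer fields than the most common item type.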

$filexml='file.xml';
if (file_exists($filexml))  {

   $xml = simplexml_load_file($filexml);
   $i = 1;           // Position counter
   $values = [];     // PHP array

   // Writing column headers
   $columns = array('title', 'link', 'description', 'availability', 'price', 'image_link', 'product_type', 'google_product_category', 'condition', 'identifier_exists', 'id');

   $fs = fopen('nuuvem-merchant.csv', 'w');
   fputcsv($fs, $columns);      
   fclose($fs);

   // Iterate through each <item> node
   $node = $xml->xpath('//item');

   foreach ($node as $n) {           

       // Iterate through each child of <item> node
       $child = $xml->xpath('//item['.$i.']/*');      

       foreach ($child as $value) {
          $values[] = $value;         
       }

       // Write to CSV files (appending to column headers)
       $fs = fopen('nuuvem-merchant.csv', 'a');
       fputcsv($fs, $values);      
       fclose($fs);  

       $values = [];    // Clean out array for next <item> (i.e., row)
       $i++;            // Move to next <item> (i.e., node position)
   }
}

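With this script, I always get the fields in whatever order each item lists them. As a result, the data frequently does not line up with the column headers, and the extra values from items with more fields end up with no header at all.

Sorry, I am not a PHP developer and I am struggling to solve this. I searched here but could not find a question like mine.

Thanks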

Answer 1

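Simply iterate over the columns array you already use for the CSV headers, and pass each column name into the XPath expression instead of selecting all of the item's children with /*. Select the first element of the array that XPath returns with the [0] index, and strip the surrounding whitespace with trim().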
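As a minimal sketch of that lookup, assuming a cut-down inline feed instead of the real file (the $feed string and its single item are made up for illustration): xpath() returns an array of matches, so [0] picks the first one, an empty array means the item lacks that field, and trim() removes the line breaks around the text.

```php
<?php
// Hypothetical cut-down feed, for illustration only
$feed = <<<XML
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<item>
<title>
<![CDATA[ Far Cry 4 - Gold Edition ]]>
</title>
<g:price currency="BRL">129.99</g:price>
</item>
</channel>
</rss>
XML;

$xml = simplexml_load_string($feed);
$xml->registerXPathNamespace('g', 'http://base.google.com/ns/1.0');

// xpath() returns an array of matching elements (empty when nothing matches)
$title = $xml->xpath('//item[1]/title');
$sale  = $xml->xpath('//item[1]/g:sale_price');

// Take the first match and strip the surrounding whitespace
$titleText = count($title) > 0 ? trim((string) $title[0]) : '';
// A field the item does not have becomes an empty cell
$saleText  = count($sale) > 0 ? trim((string) $sale[0]) : '';

echo $titleText, "\n";
```

Checking count() before indexing is what keeps a missing field from raising an undefined-offset warning.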

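In addition, you need to register the namespace prefix g to access those elements (which is why it is important to always include the namespace declarations in posted XML snippets):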

$filexml = 'GoogleProductFeed.xml';

if (file_exists($filexml))  {
   // Load the feed once, then register the g: prefix for XPath queries
   $xml = simplexml_load_file($filexml);
   $xml->registerXPathNamespace('g', 'http://base.google.com/ns/1.0');
   $i = 1;           // Position counter
   $values = [];     // PHP array

   // Writing column headers
   $columns = array('title', 'link', 'description', 'g:availability', 'g:price', 'g:image_link', 'g:product_type',
                    'g:google_product_category', 'g:condition', 'g:identifier_exists', 'g:id');

   $fs = fopen('GoogleProductFeed.csv', 'w');
   fputcsv($fs, $columns);
   fclose($fs);

   // Iterate through each <item> node
   $node = $xml->xpath('//item');

   foreach ($node as $n) {
       // Look up each expected column in the current <item>
       foreach ($columns as $col) {
           $match = $xml->xpath('//item['.$i.']/'.$col);
           if (count($match) > 0) {
              // First match, cast to string, surrounding whitespace removed
              $values[] = trim((string) $match[0]);
           } else {
              // Field missing from this item: keep the column, leave it empty
              $values[] = '';
           }
       }
       // Write to CSV files (appending to column headers)
       $fs = fopen('GoogleProductFeed.csv', 'a');
       fputcsv($fs, $values);
       fclose($fs);

       $values = [];    // Clean out array for next <item> (i.e., row)
       $i++;            // Move to next <item> (i.e., node position)
   }
}

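Note that the header columns will carry the g: prefix wherever it applies. If you do not want that, consider using two nearly identical arrays, one for the headers and the other for the XPath calls.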
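One way to arrange that, sketched under assumptions rather than taken from the original script: a single associative array maps each CSV header to the XPath used to fetch it, so the headers stay free of the g: prefix and the optional sale_price field gets its own column. The inline $feed, its sample values, and the php://memory target are stand-ins for illustration; a real script would load the feed file and open the CSV on disk.

```php
<?php
// Hypothetical two-item feed; only the second item has g:sale_price
$feed = <<<XML
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<item>
<title><![CDATA[ Bundle A ]]></title>
<g:price currency="BRL">38.99</g:price>
<g:id>12705</g:id>
</item>
<item>
<title><![CDATA[ Game B ]]></title>
<g:price currency="BRL">129.99</g:price>
<g:sale_price currency="BRL">64.99</g:sale_price>
<g:id>2246</g:id>
</item>
</channel>
</rss>
XML;

// CSV header => XPath relative to each <item>
$columns = [
    'title'      => 'title',
    'price'      => 'g:price',
    'sale_price' => 'g:sale_price',
    'id'         => 'g:id',
];

$xml = simplexml_load_string($feed);

// php://memory stands in for the CSV file so the sketch leaves nothing on disk
$fs = fopen('php://memory', 'w+');
fputcsv($fs, array_keys($columns));

foreach ($xml->xpath('//item') as $item) {
    // Each result is its own SimpleXMLElement, so register the prefix on it
    $item->registerXPathNamespace('g', 'http://base.google.com/ns/1.0');

    $row = [];
    foreach ($columns as $path) {
        $match = $item->xpath($path);
        // Missing fields become empty cells, keeping every row aligned
        $row[] = count($match) > 0 ? trim((string) $match[0]) : '';
    }
    fputcsv($fs, $row);
}

// Read the generated CSV back so it can be inspected
rewind($fs);
$csv = stream_get_contents($fs);
fclose($fs);
echo $csv;
```

Querying relative to each item also removes the need for the position counter, and adding or dropping a column becomes a one-line change to $columns.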

php

xml

csv

data-conversion

google-shopping