Parse Wikipedia {{Location map}} templates

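I want to parse Wikipedia lists of power plants that contain {{Location map}} templates. My example uses the German version of the template (Positionskarte), but that should not change the basic approach.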

How can I extract the label=, lat=, long= and region= parameters from code like this? This is probably not a job for an HTML parser like BeautifulSoup, but perhaps for awk?

{{ Positionskarte+
 | Tadschikistan
 | maptype     = relief
 | width       = 600
 | float       = right
 | caption     =
 | places      =
 {{ Positionskarte~
  | Tadschikistan
  | label      = <small>[[Talsperre Baipasa|Baipasa]]</small>
  | marktarget =
  | mark       = Blue pog.svg
  | position   = right
  | lat        = 38.267584
  | long       = 69.123906
  | region     = TJ
  | background = #FEFEE9
 }}
 {{ Positionskarte~
  | Tadschikistan
  | label      = <small>[[Kraftwerk Duschanbe|Duschanbe]]</small>
  | marktarget =
  | mark       = Red pog.svg
  | position   = left
  | lat        = 38.5565
  | long       = 68.776
  | region     = TJ
  | background = #FEFEE9
 }}
...
}}

Thanks in advance!

Just use regular expressions to extract the information, for example like this (PHP):

$k = "{{ Positionskarte+
 | Tadschikistan
 | maptype     = relief
 | width       = 600
 | float       = right
 | caption     =
 | places      =
 {{ Positionskarte~
  | Tadschikistan
  | label      = <small>[[Talsperre Baipasa|Baipasa]]</small>
  | marktarget =
  | mark       = Blue pog.svg
  | position   = right
  | lat        = 38.267584
  | long       = 69.123906
  | region     = TJ
  | background = #FEFEE9
 }}
 {{ Positionskarte~
  | Tadschikistan
  | label      = <small>[[Kraftwerk Duschanbe|Duschanbe]]</small>
  | marktarget =
  | mark       = Red pog.svg
  | position   = left
  | lat        = 38.5565
  | long       = 68.776
  | region     = TJ
  | background = #FEFEE9
 }}
}}";

// Split the wikitext at each marker template. The first chunk is the outer
// Positionskarte+ header and contains none of the wanted parameters.
$items = explode("Positionskarte~", $k);

$result = [];

foreach ($items as $item) {
    $info = [];
    foreach (['label', 'lat', 'long', 'region'] as $key) {
        // Anchor on the leading "|" so that "lat" cannot match inside another
        // parameter name, and stay on one line so that an empty value is not
        // filled with the text of the following line.
        $pattern = '/^\s*\|\s*' . $key . '[ \t]*=[ \t]*(.*)$/m';
        if (preg_match($pattern, $item, $matches)) {
            $info[$key] = trim($matches[1]);
        }
    }

    // Skip chunks (such as the outer header) that had no matching parameter.
    if (!empty($info)) {
        $result[] = $info;
    }
}

var_dump($result);
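
Since the question mentions BeautifulSoup, here is a rough Python sketch of the same idea using only the standard `re` module. The names `parse_markers`, `WANTED` and `PARAM_RE` are made up for this example. It assumes that every parameter sits on its own `| name = value` line and that the marker templates themselves contain no nested `{{...}}` (a label with an embedded template would be cut off at its first `}}`).

```python
import re

# Parameters to pull out of each marker template (names taken from the question).
WANTED = ("label", "lat", "long", "region")

# One "| name = value" line; the value may be empty and never spans lines.
PARAM_RE = re.compile(r"^\s*\|\s*(\w+)[ \t]*=[ \t]*(.*)$", re.MULTILINE)


def parse_markers(wikitext, marker="Positionskarte~"):
    """Return one dict per marker template found in `wikitext`."""
    markers = []
    # Everything before the first marker belongs to the outer map template.
    for chunk in wikitext.split(marker)[1:]:
        # Assumption: no nested templates, so the first "}}" ends this marker.
        body = chunk.split("}}", 1)[0]
        params = {name: value.strip() for name, value in PARAM_RE.findall(body)}
        # Keep only the wanted parameters that are actually present.
        markers.append({key: params[key] for key in WANTED if key in params})
    return markers
```

Calling `parse_markers(text)` on the wikitext from the question should give a list with one dict per `Positionskarte~` entry. For the English template, pass `marker="Location map~"`. Values come back as strings, so convert `lat` and `long` with `float()` if you need numbers.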