解析维基百科 {{Location map}} 模板
Parse wikipedia {{Location map}} templates
我想解析包含 {{Location map}} 模板的维基百科电厂列表。在我的示例中,我使用的是德语翻译,但这不应该改变基本过程。
如何从这样的代码中取出 label=、lat=、lon= 和 region= 参数?
对于像 BeautifulSoup 这样的 html 解析器来说,这可能不是什么,而是 awk?
{{ Positionskarte+
| Tadschikistan
| maptype = relief
| width = 600
| float = right
| caption =
| places =
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Talsperre Baipasa|Baipasa]]</small>
| marktarget =
| mark = Blue pog.svg
| position = right
| lat = 38.267584
| long = 69.123906
| region = TJ
| background = #FEFEE9
}}
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Kraftwerk Duschanbe|Duschanbe]]</small>
| marktarget =
| mark = Red pog.svg
| position = left
| lat = 38.5565
| long = 68.776
| region = TJ
| background = #FEFEE9
}}
...
}}
提前致谢!
只需使用正则表达式提取信息。
例如像这样 (PHP
)
$k = "{{ Positionskarte+
| Tadschikistan
| maptype = relief
| width = 600
| float = right
| caption =
| places =
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Talsperre Baipasa|Baipasa]]</small>
| marktarget =
| mark = Blue pog.svg
| position = right
| lat = 38.267584
| long = 69.123906
| region = TJ
| background = #FEFEE9
}}
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Kraftwerk Duschanbe|Duschanbe]]</small>
| marktarget =
| mark = Red pog.svg
| position = left
| lat = 38.5565
| long = 68.776
| region = TJ
| background = #FEFEE9
}}
}}";
$items = explode("Positionskarte~", $k);
$result = [];
foreach ($items as $item) {
$info = [];
$pattern1 = '/label\s+=\s+(.+)/';
preg_match($pattern1, $item, $matches);
if (!empty($matches)) {
$info['label'] = $matches[1];
}
$pattern2 = '/lat\s+=\s+(.+)/';
preg_match($pattern2, $item, $matches);
if (!empty($matches)) {
$info['lat'] = $matches[1];
}
$pattern3 = '/long\s+=\s+(.+)/';
preg_match($pattern3, $item, $matches);
if (!empty($matches)) {
$info['long'] = $matches[1];
}
$pattern4 = '/region\s+=\s+(.+)/';
preg_match($pattern4, $item, $matches);
if (!empty($matches)) {
$info['region'] = $matches[1];
}
if(!empty($info)) {
$result[] = $info;
}
}
var_dump($result);
我想解析包含 {{Location map}} 模板的维基百科电厂列表。在我的示例中,我使用的是德语翻译,但这不应该改变基本过程。
如何从这样的代码中取出 label=、lat=、lon= 和 region= 参数? 对于像 BeautifulSoup 这样的 html 解析器来说,这可能不是什么,而是 awk?
{{ Positionskarte+
| Tadschikistan
| maptype = relief
| width = 600
| float = right
| caption =
| places =
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Talsperre Baipasa|Baipasa]]</small>
| marktarget =
| mark = Blue pog.svg
| position = right
| lat = 38.267584
| long = 69.123906
| region = TJ
| background = #FEFEE9
}}
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Kraftwerk Duschanbe|Duschanbe]]</small>
| marktarget =
| mark = Red pog.svg
| position = left
| lat = 38.5565
| long = 68.776
| region = TJ
| background = #FEFEE9
}}
...
}}
提前致谢!
只需使用正则表达式提取信息。
例如像这样 (PHP
)
$k = "{{ Positionskarte+
| Tadschikistan
| maptype = relief
| width = 600
| float = right
| caption =
| places =
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Talsperre Baipasa|Baipasa]]</small>
| marktarget =
| mark = Blue pog.svg
| position = right
| lat = 38.267584
| long = 69.123906
| region = TJ
| background = #FEFEE9
}}
{{ Positionskarte~
| Tadschikistan
| label = <small>[[Kraftwerk Duschanbe|Duschanbe]]</small>
| marktarget =
| mark = Red pog.svg
| position = left
| lat = 38.5565
| long = 68.776
| region = TJ
| background = #FEFEE9
}}
}}";
$items = explode("Positionskarte~", $k);
$result = [];
foreach ($items as $item) {
$info = [];
$pattern1 = '/label\s+=\s+(.+)/';
preg_match($pattern1, $item, $matches);
if (!empty($matches)) {
$info['label'] = $matches[1];
}
$pattern2 = '/lat\s+=\s+(.+)/';
preg_match($pattern2, $item, $matches);
if (!empty($matches)) {
$info['lat'] = $matches[1];
}
$pattern3 = '/long\s+=\s+(.+)/';
preg_match($pattern3, $item, $matches);
if (!empty($matches)) {
$info['long'] = $matches[1];
}
$pattern4 = '/region\s+=\s+(.+)/';
preg_match($pattern4, $item, $matches);
if (!empty($matches)) {
$info['region'] = $matches[1];
}
if(!empty($info)) {
$result[] = $info;
}
}
var_dump($result);