How can I get a Wikipedia article in a specific language using Python?
I want to get a Wikipedia article in a specific language.
I tried the following code:
URL = "https://en.wikipedia.org/w/api.php"
PARAMS = {
"action": "query",
"titles": "Python",
"prop": "langlinks",
"lllang": "de",
"format": "json"
}
results = requests.get(url=URL, params=PARAMS)
soup = BeautifulSoup(results.content, 'html.parser')
print(soup.prettify())
But I don't get the whole article; this is all I get:
{"batchcomplete":"","query":{"pages":{"46332325":{"pageid":46332325,"ns":0,"title":"Python","langlinks":[{"lang":"de","*":"Python"}]}}}}
Can you help me understand what I am doing wrong?
Change the URL to de.wikipedia.org to get the German version.
For example:
import requests
from bs4 import BeautifulSoup
URL = "https://de.wikipedia.org/w/api.php" # <-- note the de.
PARAMS = {
"action": "parse",
"page": "Python (Programmiersprache)",
"prop": "text",
"section": 0,
"format": "json"
}
results = requests.get(url=URL, params=PARAMS).json()
soup = BeautifulSoup(results['parse']['text']['*'], 'html.parser')
print(soup.prettify())
Prints:
<div class="mw-parser-output">
<table cellspacing="5" class="float-right infobox toccolours toptextcells" style="font-size:90%; margin-top:0; width:21em;">
<tbody>
<tr>
<th class="hintergrundfarbe6" colspan="2" style="font-size:larger;">
Python
</th>
</tr>
<tr>
... and so on.
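Note that "section": 0 in the request above limits the result to the lead section of the page. A minimal sketch of a helper that leaves the section out, so the whole page is requested, might look like the following. The names build_parse_params and fetch_article_html and the User-Agent string are illustrative assumptions, not part of the answer above, and the sketch uses the standard library's urllib rather than requests so that it has no extra dependencies.

```python
import json
import urllib.parse
import urllib.request


def build_parse_params(page, section=None):
    """Build action=parse parameters; section=None asks for the whole page."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "text",
        "format": "json",
    }
    if section is not None:
        params["section"] = section
    return params


def fetch_article_html(lang, page, section=None):
    """Return the rendered HTML of a page from <lang>.wikipedia.org (needs network)."""
    query = urllib.parse.urlencode(build_parse_params(page, section))
    url = "https://{}.wikipedia.org/w/api.php?{}".format(lang, query)
    # Placeholder User-Agent (an assumption); replace it with your own details.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wiki-lang-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["parse"]["text"]["*"]
```

Calling fetch_article_html("de", "Python (Programmiersprache)") would then request the full German page, while passing section=0 reproduces the request shown above.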
To get only the wiki templates/tags (the raw wikitext), you can do this:
URL = "https://de.wikipedia.org/w/api.php"
PARAMS = {
"action": "query",
"titles": "Python (Programmiersprache)",
"prop": "revisions",
"rvprop": "content",
"rvsection": 0,
"format": "json"
}
results = requests.get(url=URL, params=PARAMS).json()
print(results)
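The printed result nests the wikitext under the page ID. A small sketch of pulling it out is shown below, assuming the legacy revision layout that a request without "rvslots" returns (content under the "*" key) and also tolerating the newer slots layout. extract_wikitext is a hypothetical helper name, and the sample response is made up for illustration only.

```python
def extract_wikitext(response):
    """Return the wikitext of the first page in a prop=revisions response."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))  # pages are keyed by page ID
    revision = page["revisions"][0]
    if "slots" in revision:  # layout used when rvslots=main is requested
        return revision["slots"]["main"]["*"]
    return revision["*"]  # legacy layout: content directly on the revision


# Made-up sample in the shape of the API response, for illustration only.
sample = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "title": "Example",
                "revisions": [
                    {"contentformat": "text/x-wiki", "*": "{{Infobox}}"}
                ],
            }
        }
    }
}
print(extract_wikitext(sample))
```

A page that does not exist has no "revisions" key, so real code would want to handle that case before indexing.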
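If you have the title of a Wikipedia page in one language and want to know its title in another language, you can use the "langlinks" property, as in the query from the question.
Note that "lllang" is set to "de".
With the full title "Python (programming language)", this gives you: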
{
"batchcomplete": "",
"query": {
"pages": {
"23862": {
"pageid": 23862,
"ns": 0,
"title": "Python (programming language)",
"langlinks": [
{
"lang": "de",
"*": "Python (Programmiersprache)"
}
]
}
}
}
}
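The two steps can be combined: read the German title out of the langlinks response, then send the article request to the German API endpoint. A hedged sketch follows; extract_langlink_title and api_url are hypothetical helper names, and the sample data is the response shown above.

```python
def extract_langlink_title(response, lang):
    """Return the page title in `lang` from a prop=langlinks response, or None."""
    for page in response["query"]["pages"].values():
        for link in page.get("langlinks", []):
            if link["lang"] == lang:
                return link["*"]
    return None


def api_url(lang):
    """API endpoint of the Wikipedia edition for a language code."""
    return "https://{}.wikipedia.org/w/api.php".format(lang)


# The langlinks response shown above.
response = {
    "batchcomplete": "",
    "query": {
        "pages": {
            "23862": {
                "pageid": 23862,
                "ns": 0,
                "title": "Python (programming language)",
                "langlinks": [
                    {"lang": "de", "*": "Python (Programmiersprache)"}
                ],
            }
        }
    },
}

title = extract_langlink_title(response, "de")
print(api_url("de"), title)
```

The returned title can then be used as the "page" parameter of an "action": "parse" request against the URL from api_url("de").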
See here for more information:
https://www.mediawiki.org/wiki/API:Langlinks