Extract a specific user's comments from a list using the Wikipedia API and Python 2.7
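I am using the Wikipedia API, through the wikitools package, to extract some data from Wikipedia. The output I get has the format shown below. Now I want to extract the timestamps and comments of the revisions that a specific user made across multiple pages. Say I only want the comments made by TechBot; I thought I could do something like this: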
for revision in res["query"]["pages"]["7940378"]["revisions"]:
    if revision["user"] == "TechBot":
        do.something()
The problem is ["7940378"]: it is a unique page ID that is different for every page, and I don't know how to get the page IDs. Is there another way to do this?
[{
  "query": {
    "pages": {
      "7940378": {
        "ns": 0,
        "pageid": 7940378,
        "revisions": [
          {
            "comment": "robot Modifying: [[az:T\u00fcrk Tarixi]]",
            "timestamp": "2009-01-03T19:47:11Z",
            "user": "TechBot"
          },
          {
            "comment": "",
            "timestamp": "2009-02-14T02:07:49Z",
            "anon": "",
            "user": "88.231.237.130"
          },
          {
            "comment": "fixing recent deletion by merging it with the next paragraph",
            "timestamp": "2009-04-03T14:49:27Z",
            "user": "Soap"
          },
          {
            "comment": "robot Modifying: [[az:T\u00fcrk tarixi]]",
            "timestamp": "2009-04-09T14:35:19Z",
            "user": "RibotBOT"
          },
          {
            "comment": "Repairing link to disambiguation page - [[Wikipedia:Disambiguation pages with links|You can help!]]",
            "timestamp": "2009-06-12T23:55:55Z",
            "user": "J04n"
          }
        ],
        "title": "History of the Turkic peoples"
      }
    }
  },
  "continue": {
    "rvcontinue": "20090807172715|306635892",
    "continue": "||"
  },
  "warnings": {
    "main": {
      "*": "Unrecognized parameter: 'user'"
    }
  }
}]
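Instead of using a single for loop, you can split it into two loops: the outer loop iterates over the pages and the inner loop iterates over each page's revisions. That way you never need to know a page ID in advance.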
for pageid, pagedetails in res["query"]["pages"].iteritems():
    for revision in pagedetails["revisions"]:
        if revision["user"] == "TechBot":
            do.something()
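To make the two-loop idea concrete, here is a minimal sketch that collects the timestamp and comment for one user across every page in a response. The helper name comments_by_user and the trimmed sample dict are hypothetical, made up for illustration; the sketch assumes res is the dict that contains the "query" key, as in the loops above. It uses .items() rather than .iteritems() so it should run on both Python 2.7 and Python 3, and it uses .get() because a page may come back without a "revisions" key.

```python
# Hypothetical helper (not part of wikitools): gather (title, timestamp, comment)
# for every revision by `user`, whatever the page IDs happen to be.
def comments_by_user(res, user):
    results = []
    # Iterate over all pages; the page ID key is never hard-coded.
    for pageid, page in res["query"]["pages"].items():
        # A page without revisions may have no "revisions" key at all.
        for revision in page.get("revisions", []):
            if revision.get("user") == user:
                results.append((
                    page.get("title"),
                    revision.get("timestamp"),
                    revision.get("comment", ""),
                ))
    return results


# Trimmed sample data, copied from the output shown in the question.
res = {
    "query": {
        "pages": {
            "7940378": {
                "pageid": 7940378,
                "title": "History of the Turkic peoples",
                "revisions": [
                    {
                        "comment": "robot Modifying: [[az:T\u00fcrk Tarixi]]",
                        "timestamp": "2009-01-03T19:47:11Z",
                        "user": "TechBot",
                    },
                    {
                        "comment": "fixing recent deletion by merging it with the next paragraph",
                        "timestamp": "2009-04-03T14:49:27Z",
                        "user": "Soap",
                    },
                ],
            }
        }
    }
}

matches = comments_by_user(res, "TechBot")
```

Note that the comparison is case-sensitive, so "Techbot" would not match "TechBot". Separately, the warning in the output suggests the 'user' request parameter was ignored by the API; as far as I know, the revisions module documents rvuser for server-side filtering, but only when a single page is queried, so filtering on the client as above is likely the simpler route for multiple pages.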