How to get the text of a specific section via the Wikipedia API
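I want to extract only a specific section from a Wikipedia page.
Example:
I want to extract the text of the "Parts" section of the Wikipedia article "House":
https://en.wikipedia.org/wiki/House
The resulting text would be: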
Many houses have several large rooms ..... sections of the home (including in more recent eras a garage).
I know how to get the whole text of an article, but how do I get the text of a specific section?
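Do you need the plain wikitext or the HTML produced by the parser?
The following examples give you the "Layout" section (section 3 of the House article; you can use any other section index as well).
When you want to retrieve the parsed HTML of a specific section, use the parse API:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=house&prop=text&section=3&disabletoc=1
Or, as an API request outside the sandbox:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=text&section=3&disabletoc=1
If you want the wikitext of a specific section instead, just use the wikitext prop instead of the text prop:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=wikitext&section=3&disabletoc=1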
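If you build these requests in code, a small helper keeps the HTML and wikitext variants in one place. This is only a sketch: the function name `build_parse_url` and the constant `API_ENDPOINT` are my own, and the parameters are just the ones used in the URLs above.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page, section=None, prop="text"):
    """Return an action=parse URL for `page`, optionally limited to one section.

    prop="text" asks for the parsed HTML, prop="wikitext" for the raw wikitext.
    """
    params = {
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": prop,
    }
    if section is not None:
        # Section index as reported by prop=sections.
        params["section"] = section
    params["disabletoc"] = 1
    return API_ENDPOINT + "?" + urlencode(params)


print(build_parse_url("house", section=3, prop="text"))
print(build_parse_url("house", section=3, prop="wikitext"))
```

Leaving `section` out gives a request for the whole page, which is also what the sections query needs.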
To find out which section has which index, you can query this information with the "sections" prop, without any section index:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=sections&disabletoc=1
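The reply to this request carries a "sections" list whose entries have, among others, "index", "number" and "line" fields. As a rough sketch of how to read it, the snippet below turns such a reply into a small outline. `list_sections` is a made-up helper name, and `sample_response` is an abbreviated, hand-written stand-in for a real reply, not live data.

```python
def list_sections(parse_response):
    """Return (index, number, heading) tuples from a prop=sections reply."""
    return [
        (s["index"], s["number"], s["line"])
        for s in parse_response["parse"]["sections"]
    ]


# Abbreviated sample shaped like the API's reply for the House article.
sample_response = {
    "parse": {
        "title": "House",
        "pageid": 13590,
        "sections": [
            {"toclevel": 1, "level": "2", "line": "Elements", "number": "2", "index": "2"},
            {"toclevel": 2, "level": "3", "line": "Layout", "number": "2.1", "index": "3"},
            {"toclevel": 2, "level": "3", "line": "Parts", "number": "2.2", "index": "4"},
        ],
    }
}

for index, number, heading in list_sections(sample_response):
    print(f"section={index}  {number}  {heading}")
```

The "index" value is the one to pass as the section parameter; "number" is only the table-of-contents numbering.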
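So, as a complete example of retrieving the text of the Layout section using only the API, you would:
- Retrieve the sections of the article:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=sections&disabletoc=1
Response: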
{
"parse": {
"title": "House",
"pageid": 13590,
"sections": [
{
"toclevel": 1,
"level": "2",
"line": "Etymology",
"number": "1",
"index": "1",
"fromtitle": "House",
"byteoffset": 3549,
"anchor": "Etymology"
},
{
"toclevel": 1,
"level": "2",
"line": "Elements",
"number": "2",
"index": "2",
"fromtitle": "House",
"byteoffset": 4960,
"anchor": "Elements"
},
{
"toclevel": 2,
"level": "3",
"line": "Layout",
"number": "2.1",
"index": "3",
"fromtitle": "House",
"byteoffset": 4976,
"anchor": "Layout"
},
{
"toclevel": 2,
"level": "3",
"line": "Parts",
"number": "2.2",
"index": "4",
"fromtitle": "House",
"byteoffset": 6432,
"anchor": "Parts"
},
{
"toclevel": 2,
"level": "3",
"line": "History of the interior",
"number": "2.3",
"index": "5",
"fromtitle": "House",
"byteoffset": 7539,
"anchor": "History_of_the_interior"
},
{
"toclevel": 3,
"level": "4",
"line": "Communal rooms",
"number": "2.3.1",
"index": "6",
"fromtitle": "House",
"byteoffset": 8786,
"anchor": "Communal_rooms"
},
{
"toclevel": 3,
"level": "4",
"line": "Interconnecting rooms",
"number": "2.3.2",
"index": "7",
"fromtitle": "House",
"byteoffset": 9736,
"anchor": "Interconnecting_rooms"
},
{
"toclevel": 3,
"level": "4",
"line": "Corridor",
"number": "2.3.3",
"index": "8",
"fromtitle": "House",
"byteoffset": 11126,
"anchor": "Corridor"
},
{
"toclevel": 3,
"level": "4",
"line": "Employment-free house",
"number": "2.3.4",
"index": "9",
"fromtitle": "House",
"byteoffset": 13092,
"anchor": "Employment-free_house"
},
{
"toclevel": 2,
"level": "3",
"line": "Work location, technology and doctors",
"number": "2.4",
"index": "10",
"fromtitle": "House",
"byteoffset": 15969,
"anchor": "Work_location,_technology_and_doctors"
},
{
"toclevel": 3,
"level": "4",
"line": "Technology and privacy",
"number": "2.4.1",
"index": "11",
"fromtitle": "House",
"byteoffset": 17291,
"anchor": "Technology_and_privacy"
},
{
"toclevel": 1,
"level": "2",
"line": "Construction",
"number": "3",
"index": "12",
"fromtitle": "House",
"byteoffset": 18782,
"anchor": "Construction"
},
{
"toclevel": 2,
"level": "3",
"line": "Energy efficiency",
"number": "3.1",
"index": "13",
"fromtitle": "House",
"byteoffset": 21899,
"anchor": "Energy_efficiency"
},
{
"toclevel": 2,
"level": "3",
"line": "Earthquake protection",
"number": "3.2",
"index": "14",
"fromtitle": "House",
"byteoffset": 23057,
"anchor": "Earthquake_protection"
},
{
"toclevel": 1,
"level": "2",
"line": "Found materials",
"number": "4",
"index": "15",
"fromtitle": "House",
"byteoffset": 25172,
"anchor": "Found_materials"
},
{
"toclevel": 1,
"level": "2",
"line": "Legal issues",
"number": "5",
"index": "16",
"fromtitle": "House",
"byteoffset": 26235,
"anchor": "Legal_issues"
},
{
"toclevel": 2,
"level": "3",
"line": "United Kingdom",
"number": "5.1",
"index": "17",
"fromtitle": "House",
"byteoffset": 26644,
"anchor": "United_Kingdom"
},
{
"toclevel": 1,
"level": "2",
"line": "Identifying houses",
"number": "6",
"index": "18",
"fromtitle": "House",
"byteoffset": 26922,
"anchor": "Identifying_houses"
},
{
"toclevel": 1,
"level": "2",
"line": "Animal houses",
"number": "7",
"index": "19",
"fromtitle": "House",
"byteoffset": 27397,
"anchor": "Animal_houses"
},
{
"toclevel": 1,
"level": "2",
"line": "Houses and symbolism",
"number": "8",
"index": "20",
"fromtitle": "House",
"byteoffset": 27826,
"anchor": "Houses_and_symbolism"
},
{
"toclevel": 1,
"level": "2",
"line": "See also",
"number": "9",
"index": "21",
"fromtitle": "House",
"byteoffset": 28620,
"anchor": "See_also"
},
{
"toclevel": 1,
"level": "2",
"line": "References",
"number": "10",
"index": "22",
"fromtitle": "House",
"byteoffset": 29690,
"anchor": "References"
},
{
"toclevel": 1,
"level": "2",
"line": "External links",
"number": "11",
"index": "23",
"fromtitle": "House",
"byteoffset": 29720,
"anchor": "External_links"
}
]
}
}
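- Iterate over the result, find the section you want, and take its index
- Use the index in the next API request to get the content of the section:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=house&prop=wikitext&section=3&disabletoc=1
Response: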
{
"parse": {
"title": "House",
"pageid": 13590,
"wikitext": {
"*": "=== Layout ===\n[[File:Gingerbread House Essex CT.jpg|thumb|Example of an early [[Victorian architecture|Victorian]] \"Gingerbread House\" in [[Connecticut]], United States, built in 1855]]\n\nIdeally, [[architect]]s of houses design [[room]]s to meet the needs of the people who will live in the house. [[Feng shui]], originally a [[China|Chinese]] method of moving houses according to such factors as rain and micro-climates, has recently expanded its scope to address the design of interior spaces, with a view to promoting harmonious effects on the people living inside the house, although no actual effect has ever been demonstrated. Feng shui can also mean the \"aura\" in or around a dwelling, making it comparable to the [[real estate|real-estate]] sales concept of \"indoor-outdoor flow\".\n\nThe [[square footage]] of a house in the United States reports the area of \"living space\", excluding the garage and other non-living spaces. The \"square metres\" figure of a house in Europe <!-- including Malta ? --> reports the area of the walls enclosing the home, and thus includes any attached garage and non-living spaces.<ref>{{Cite book|title=Land Management: Challenges and Strategies (First Edition)|last=Iyyer|first=Chaitanya|publisher=Global India Publications Pvt Ltd|year=2009|isbn=978-9380228488|location=|pages=}}</ref>{{Citation needed|date=February 2007}} The number of floors or levels making up the house can affect the square footage of a home."
}
}
}
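The three steps above can be sketched in Python roughly as follows. Treat it as an illustration rather than a finished client: `call_parse_api` and `get_section_wikitext` are names I made up, the User-Agent string is a placeholder you should replace, and the heading is matched by exact comparison against the "line" field. The `fetch` parameter exists only so the lookup logic can be tried without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def call_parse_api(params):
    """Send one action=parse request and return the decoded JSON reply."""
    query = urlencode({"action": "parse", "format": "json", **params})
    # Placeholder User-Agent (an assumption); use one that identifies your tool.
    request = Request(
        API_ENDPOINT + "?" + query,
        headers={"User-Agent": "section-example/0.1 (placeholder)"},
    )
    with urlopen(request) as response:
        return json.load(response)


def get_section_wikitext(page, heading, fetch=call_parse_api):
    """Look up `heading` in the page's section list, then fetch its wikitext."""
    # Step 1: retrieve the sections of the article.
    reply = fetch({"page": page, "prop": "sections"})
    # Step 2: iterate over the result and find the index of the wanted section.
    index = next(
        (s["index"] for s in reply["parse"]["sections"] if s["line"] == heading),
        None,
    )
    if index is None:
        raise LookupError(f"no section named {heading!r} in {page!r}")
    # Step 3: use the index in the next request to get the section content.
    reply = fetch({"page": page, "prop": "wikitext", "section": index})
    return reply["parse"]["wikitext"]["*"]
```

Calling `get_section_wikitext("House", "Layout")` performs the two requests shown above; passing a different heading such as "Parts" selects that section instead, and swapping "wikitext" for "text" in step 3 (and in the final lookup) would return the parsed HTML.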
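Background:
The concept of sections within a page is not (yet) integrated into revisions. A revision is "just" the content of the whole page plus additional metadata (for example, in multiple other slots), whereas sections are part of the content (which is just one slot of the revision). That is why you can only get the whole text when using the revisions query API. The page needs to be parsed to know what the sections are, because sections are a wikitext concept and therefore involve the parser.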