How to get the text of a specific section via the Wikipedia API

I only want to extract a specific section from a Wikipedia page:

Example: I want to extract the text of the "Parts" section from the Wikipedia article "House".

https://en.wikipedia.org/wiki/House

The resulting text would be:

Many houses have several large rooms  .....  sections of the home (including in more recent eras a garage). 

We can get the whole text of the article with a request like this:

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=house&rvprop=content&format=json
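For reference, the same whole-page request can be built and decoded in Python with only the standard library. This is a minimal sketch: the helper names and the User-Agent string are hypothetical, and it assumes the legacy format=json layout, in which the revision content sits under the "*" key of query.pages.<pageid>.revisions[0].

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
HEADERS = {"User-Agent": "section-text-example/0.1 (replace with your contact info)"}


def revisions_url(title: str) -> str:
    """Build the prop=revisions query that returns the whole page content."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str) -> dict:
    """GET the URL and decode the JSON body."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def whole_wikitext(response: dict) -> str:
    """Return the content of the latest revision (assumes the legacy "*" key)."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))  # one title was requested, so one page entry
    return page["revisions"][0]["*"]
```

Calling whole_wikitext(fetch_json(revisions_url("house"))) returns the full wikitext of the article, which is exactly the problem: every section comes back at once.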

But how do I get the text of a specific section?

Do you need the plain wikitext or the HTML produced by the parser?

The following examples give you the "Layout" section (section 3 of the House article; you can also use any other section index).

When you want to retrieve the parsed HTML of a specific section, use the parse API: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=house&prop=text&section=3&disabletoc=1 or, as a plain API request outside the sandbox: https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=text&section=3&disabletoc=1

If you want the wikitext of a specific section instead, just use the wikitext property instead of the text property: https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=wikitext&section=3&disabletoc=1
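The two parse requests differ only in the prop parameter, so a single helper can build either one. A minimal Python sketch, assuming (as in the sample response further down) that format=json puts the value under parse.<prop>["*"]; the function names and User-Agent string are hypothetical:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
HEADERS = {"User-Agent": "section-text-example/0.1 (replace with your contact info)"}


def parse_section_url(page: str, section: int, prop: str = "text") -> str:
    """Build an action=parse URL for one section; prop is "text" (HTML) or "wikitext"."""
    if prop not in ("text", "wikitext"):
        raise ValueError("prop must be 'text' or 'wikitext'")
    params = {
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": prop,
        "section": section,
        "disabletoc": 1,
    }
    return API + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str) -> dict:
    """GET the URL and decode the JSON body."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def section_content(response: dict, prop: str = "text") -> str:
    """Extract the HTML or wikitext of the requested section from the response."""
    return response["parse"][prop]["*"]
```

For example, section_content(fetch_json(parse_section_url("house", 3))) should return the HTML of the Layout section; pass prop="wikitext" to both helpers to get the markup instead.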

To find out which section has which index, query the "sections" property without any section index: https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=sections&disabletoc=1
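To look up an index programmatically, the sections response can be flattened into (index, number, heading) triples. A minimal sketch with hypothetical helper names; note that "index" (what the section parameter expects) is not the same as the displayed "number" (e.g. "Layout" is number 2.1 but index 3 in the response below).

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"


def sections_url(page: str) -> str:
    """Build the action=parse request that lists a page's sections."""
    params = {
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": "sections",
        "disabletoc": 1,
    }
    return API + "?" + urllib.parse.urlencode(params)


def section_table(response: dict) -> list[tuple[str, str, str]]:
    """Return (index, number, heading) for every entry of a prop=sections response."""
    return [(s["index"], s["number"], s["line"]) for s in response["parse"]["sections"]]
```

Printing section_table(...) for the House article makes it easy to see which index to pass as section in the next request.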

So, as a complete example of retrieving the text of the Layout section using only the API, you would:

  1. Retrieve the sections of the article: https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=sections&disabletoc=1

Response:

{
    "parse": {
        "title": "House",
        "pageid": 13590,
        "sections": [
            {
                "toclevel": 1,
                "level": "2",
                "line": "Etymology",
                "number": "1",
                "index": "1",
                "fromtitle": "House",
                "byteoffset": 3549,
                "anchor": "Etymology"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "Elements",
                "number": "2",
                "index": "2",
                "fromtitle": "House",
                "byteoffset": 4960,
                "anchor": "Elements"
            },
            {
                "toclevel": 2,
                "level": "3",
                "line": "Layout",
                "number": "2.1",
                "index": "3",
                "fromtitle": "House",
                "byteoffset": 4976,
                "anchor": "Layout"
            },
            {
                "toclevel": 2,
                "level": "3",
                "line": "Parts",
                "number": "2.2",
                "index": "4",
                "fromtitle": "House",
                "byteoffset": 6432,
                "anchor": "Parts"
            },
            {
                "toclevel": 2,
                "level": "3",
                "line": "History of the interior",
                "number": "2.3",
                "index": "5",
                "fromtitle": "House",
                "byteoffset": 7539,
                "anchor": "History_of_the_interior"
            },
            {
                "toclevel": 3,
                "level": "4",
                "line": "Communal rooms",
                "number": "2.3.1",
                "index": "6",
                "fromtitle": "House",
                "byteoffset": 8786,
                "anchor": "Communal_rooms"
            },
            {
                "toclevel": 3,
                "level": "4",
                "line": "Interconnecting rooms",
                "number": "2.3.2",
                "index": "7",
                "fromtitle": "House",
                "byteoffset": 9736,
                "anchor": "Interconnecting_rooms"
            },
            {
                "toclevel": 3,
                "level": "4",
                "line": "Corridor",
                "number": "2.3.3",
                "index": "8",
                "fromtitle": "House",
                "byteoffset": 11126,
                "anchor": "Corridor"
            },
            {
                "toclevel": 3,
                "level": "4",
                "line": "Employment-free house",
                "number": "2.3.4",
                "index": "9",
                "fromtitle": "House",
                "byteoffset": 13092,
                "anchor": "Employment-free_house"
            },
            {
                "toclevel": 2,
                "level": "3",
                "line": "Work location, technology and doctors",
                "number": "2.4",
                "index": "10",
                "fromtitle": "House",
                "byteoffset": 15969,
                "anchor": "Work_location,_technology_and_doctors"
            },
            {
                "toclevel": 3,
                "level": "4",
                "line": "Technology and privacy",
                "number": "2.4.1",
                "index": "11",
                "fromtitle": "House",
                "byteoffset": 17291,
                "anchor": "Technology_and_privacy"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "Construction",
                "number": "3",
                "index": "12",
                "fromtitle": "House",
                "byteoffset": 18782,
                "anchor": "Construction"
            },
            {
                "toclevel": 2,
                "level": "3",
                "line": "Energy efficiency",
                "number": "3.1",
                "index": "13",
                "fromtitle": "House",
                "byteoffset": 21899,
                "anchor": "Energy_efficiency"
            },
            {
                "toclevel": 2,
                "level": "3",
                "line": "Earthquake protection",
                "number": "3.2",
                "index": "14",
                "fromtitle": "House",
                "byteoffset": 23057,
                "anchor": "Earthquake_protection"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "Found materials",
                "number": "4",
                "index": "15",
                "fromtitle": "House",
                "byteoffset": 25172,
                "anchor": "Found_materials"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "Legal issues",
                "number": "5",
                "index": "16",
                "fromtitle": "House",
                "byteoffset": 26235,
                "anchor": "Legal_issues"
            },
            {
                "toclevel": 2,
                "level": "3",
                "line": "United Kingdom",
                "number": "5.1",
                "index": "17",
                "fromtitle": "House",
                "byteoffset": 26644,
                "anchor": "United_Kingdom"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "Identifying houses",
                "number": "6",
                "index": "18",
                "fromtitle": "House",
                "byteoffset": 26922,
                "anchor": "Identifying_houses"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "Animal houses",
                "number": "7",
                "index": "19",
                "fromtitle": "House",
                "byteoffset": 27397,
                "anchor": "Animal_houses"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "Houses and symbolism",
                "number": "8",
                "index": "20",
                "fromtitle": "House",
                "byteoffset": 27826,
                "anchor": "Houses_and_symbolism"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "See also",
                "number": "9",
                "index": "21",
                "fromtitle": "House",
                "byteoffset": 28620,
                "anchor": "See_also"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "References",
                "number": "10",
                "index": "22",
                "fromtitle": "House",
                "byteoffset": 29690,
                "anchor": "References"
            },
            {
                "toclevel": 1,
                "level": "2",
                "line": "External links",
                "number": "11",
                "index": "23",
                "fromtitle": "House",
                "byteoffset": 29720,
                "anchor": "External_links"
            }
        ]
    }
}
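  2. Iterate over the result, find the section you want and take its index (e.g. "Layout" has index 3, "Parts" has index 4).
  3. Use that index in the next API request to get the section content: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=house&prop=wikitext&section=3&disabletoc=1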

Response:

{
    "parse": {
        "title": "House",
        "pageid": 13590,
        "wikitext": {
            "*": "=== Layout ===\n[[File:Gingerbread House Essex CT.jpg|thumb|Example of an early [[Victorian architecture|Victorian]] \"Gingerbread House\" in [[Connecticut]], United States, built in 1855]]\n\nIdeally, [[architect]]s of houses design [[room]]s to meet the needs of the people who will live in the house. [[Feng shui]], originally a [[China|Chinese]] method of moving houses according to such factors as rain and micro-climates, has recently expanded its scope to address the design of interior spaces, with a view to promoting harmonious effects on the people living inside the house, although no actual effect has ever been demonstrated. Feng shui can also mean the \"aura\" in or around a dwelling, making it comparable to the [[real estate|real-estate]] sales concept of \"indoor-outdoor flow\".\n\nThe [[square footage]] of a house in the United States reports the area of \"living space\", excluding the garage and other non-living spaces. The \"square metres\" figure of a house in Europe <!-- including Malta ? --> reports the area of the walls enclosing the home, and thus includes any attached garage and non-living spaces.<ref>{{Cite book|title=Land Management: Challenges and Strategies (First Edition)|last=Iyyer|first=Chaitanya|publisher=Global India Publications Pvt Ltd|year=2009|isbn=978-9380228488|location=|pages=}}</ref>{{Citation needed|date=February 2007}} The number of floors or levels making up the house can affect the square footage of a home."
        }
    }
}
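The three steps above can be chained into one function. A minimal Python sketch with hypothetical helper names and User-Agent string; it matches the requested heading against the "line" field, which assumes a plain heading without markup, and takes the first match if a heading occurs more than once. The fetch parameter exists so the HTTP call can be swapped out, e.g. for testing.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

API = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
HEADERS = {"User-Agent": "section-text-example/0.1 (replace with your contact info)"}


def fetch_json(url: str) -> dict:
    """GET the URL and decode the JSON body."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def parse_url(page: str, **extra) -> str:
    """Build an action=parse URL; extra keyword arguments become query parameters."""
    params = {"action": "parse", "format": "json", "page": page, "disabletoc": 1, **extra}
    return API + "?" + urllib.parse.urlencode(params)


def section_wikitext(page: str, heading: str,
                     fetch: Callable[[str], dict] = fetch_json) -> str:
    """Return the wikitext of the first section whose heading equals `heading`."""
    # Step 1: list the sections of the page.
    sections = fetch(parse_url(page, prop="sections"))["parse"]["sections"]
    # Step 2: "line" is the heading text, "index" is what the section parameter expects.
    index = next((s["index"] for s in sections if s["line"] == heading), None)
    if index is None:
        raise KeyError(f"no section titled {heading!r} on {page!r}")
    # Step 3: fetch only that section's wikitext.
    return fetch(parse_url(page, prop="wikitext", section=index))["parse"]["wikitext"]["*"]
```

With the default fetch, section_wikitext("house", "Parts") should return the wikitext of the Parts section asked about in the question.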

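Background: the concept of sections within a page is not integrated into revisions (yet). A revision is "just" the content of the whole page plus additional metadata (e.g. in several other slots), while sections are part of the content (which is only one slot of the revision). That is why you can only get the whole text when using the revisions query API. The page has to be parsed to know what its sections are, because sections are a wikitext concept and therefore involve the parser.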