Get brief content from Wikipedia with a Java program

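OK, so I'm writing a Java program that needs to search the web and display data. As any sensible person would, I figure the best place to look up information is Wikipedia.

I looked around and found MediaWiki, but I don't know where to start. I'll explain what I need; any help is appreciated!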

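Example: User input: Who is Ed Sheeran?
(Leave the extraction part to me; I know how to do it.)

In the background, the program looks up the Wikipedia page for Ed Sheeran and extracts the first few sentences about him. It then prints that information back.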

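So once the program is finished, my output would be:

User input: Who is Ed Sheeran?
Output: Edward Christopher "Ed" Sheeran (born 17 February 1991) is an English singer-songwriter and occasional actor.

User input: Where is Bangalore?
Output: Bangalore /bæŋɡəˈlɔːr/, officially known as Bengaluru ([ˈbeŋɡəɭuːɾu]), is the capital of the Indian state of Karnataka.

Any help would be appreciated. Thanks!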

This query worked for me:

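import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

...

String subject = "Ed Sheeran";
// action=raw returns the page's wikitext; only spaces are handled here (as underscores)
URL url = new URL("https://en.wikipedia.org/w/index.php?action=raw&title=" + subject.replace(" ", "_"));
StringBuilder text = new StringBuilder();
// Read the response as UTF-8 rather than the platform default charset
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(url.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        line = line.trim();
        // Skip template/infobox rows and other lines that are not prose
        if (!line.startsWith("|")
                && !line.startsWith("{")
                && !line.startsWith("}")
                && !line.startsWith("<center>")
                && !line.startsWith("---")) {
            text.append(line);
        }
        // Stop once a bit more than 200 characters of prose have been collected
        if (text.length() > 200) {
            break;
        }
    }
}
System.out.println("text = " + text);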

Prints:

text = '''Edward Christopher''' "'''Ed'''" '''Sheeran''' (born 17 February 1991) is an English singer-songwriter and occasional actor. Born in [[Hebden Bridge]], West Yorkshire and raised in [[Framlingham]],

For other queries, you may need some trial and error to strip the leftover wiki markup from the content.
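As a starting point for that cleanup, here is a minimal sketch that removes the markup visible in the output above: bold/italic quote runs, `[[...]]` links, and `<ref>` footnotes. The class name `WikiMarkup` and the method `strip` are made up for illustration, and the regular expressions are only an approximation. They do not handle templates (`{{...}}`), nested links, or tables, so treat this as a rough filter rather than a wikitext parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of any library): rough cleanup of common wikitext markup.
final class WikiMarkup {
    // <ref ... /> and <ref ...>...</ref> footnotes
    private static final Pattern REF =
            Pattern.compile("<ref[^>]*/>|<ref[^>]*>.*?</ref>", Pattern.DOTALL);
    // [[Target|Label]] -> Label
    private static final Pattern PIPED_LINK =
            Pattern.compile("\\[\\[[^\\]|]*\\|([^\\]]*)\\]\\]");
    // [[Target]] -> Target
    private static final Pattern PLAIN_LINK =
            Pattern.compile("\\[\\[([^\\]|]*)\\]\\]");
    // '' (italic) and ''' (bold) markers; a single apostrophe is left alone
    private static final Pattern QUOTES = Pattern.compile("'{2,}");

    static String strip(String wikitext) {
        String s = REF.matcher(wikitext).replaceAll("");
        s = PIPED_LINK.matcher(s).replaceAll("$1");
        s = PLAIN_LINK.matcher(s).replaceAll("$1");
        s = QUOTES.matcher(s).replaceAll("");
        return s.trim();
    }

    public static void main(String[] args) {
        String raw = "'''Edward Christopher''' \"'''Ed'''\" '''Sheeran''' (born 17 February 1991) "
                + "is an English singer-songwriter and occasional actor. "
                + "Born in [[Hebden Bridge]], West Yorkshire";
        // prints: Edward Christopher "Ed" Sheeran (born 17 February 1991) is an English
        // singer-songwriter and occasional actor. Born in Hebden Bridge, West Yorkshire
        System.out.println(strip(raw));
    }
}
```

Calling `WikiMarkup.strip(text)` on the collected text before printing would be one way to use it.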

Update

Here is an alternative that uses a library to parse the JSON:
http://search.maven.org/#artifactdetails|org.json|json|20150729|jar

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.json.JSONObject;

...

String subject = "Ed Sheeran";
// prop=extracts with exintro, explaintext and exsentences=1 asks for the first plain-text sentence.
// Only spaces are escaped here; titles with other reserved characters need full URL encoding.
URL url = new URL("https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exsentences=1&exintro=&explaintext=&exsectionformat=plain&titles=" + subject.replace(" ", "%20"));
StringBuilder text = new StringBuilder();
// Read the whole JSON response as UTF-8
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(url.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        text.append(line.trim());
    }
}

System.out.println("text = " + text);
JSONObject json = new JSONObject(text.toString());
JSONObject query = json.getJSONObject("query");
// "pages" is keyed by page id, so iterate over the keys instead of assuming one
JSONObject pages = query.getJSONObject("pages");
for (String key : pages.keySet()) {
    System.out.println("key = " + key);
    JSONObject page = pages.getJSONObject(key);
    // getString throws a JSONException if the page has no "extract" (for example, a missing title)
    String extract = page.getString("extract");
    System.out.println("extract = " + extract);
}

Output (last line printed):

extract = Edward Christopher "Ed" Sheeran (born 17 February 1991) is an English singer-songwriter and occasional actor.
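
The request URL above only escapes spaces, so a title containing characters such as `&` would break the query string. Below is a small sketch of building the same URL with full percent-encoding, using only the standard library. The class name `ExtractUrl` and the method `build` are hypothetical names chosen for this example; the query parameters are the ones used in the snippet above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the extracts query URL for an arbitrary page title.
final class ExtractUrl {
    private static final String BASE =
            "https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json";

    static String build(String title, int sentences) {
        String encoded;
        try {
            // URLEncoder does form encoding (space -> '+'); turn those into %20.
            // A literal '+' in the title is already encoded as %2B at this point.
            encoded = URLEncoder.encode(title, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java runtime
            throw new IllegalStateException(e);
        }
        return BASE
                + "&exsentences=" + sentences
                + "&exintro=&explaintext=&exsectionformat=plain"
                + "&titles=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(build("Ed Sheeran", 1));
        System.out.println(build("AT&T", 1));
    }
}
```

The returned string can be passed to `new URL(...)` in place of the hand-built one.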