Get brief content from Wikipedia with a Java program
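OK, so I'm writing a Java program that needs to search the web and display the data it finds. As any smart person would agree, the best place to look up information is Wikipedia.
I looked around and found MediaWiki, but I have no idea where to start. I'll explain what I need, and all help is appreciated!
Example:
User input: Who is Ed Sheeran?
(Leave extracting the subject to me, I know how to do that.)
The program searches Wikipedia in the background for the Ed Sheeran page and pulls out the first few sentences about him, then prints them back.
So, once the program is done, my output would be:
User input: Who is Ed Sheeran?
Output: Edward Christopher "Ed" Sheeran (born 17 February 1991) is an English singer-songwriter and occasional actor.
User input: Where is Bangalore?
Output: Bangalore /bæŋɡəˈlɔːr/, officially known as Bengaluru ([ˈbeŋɡəɭuːɾu]), is the capital of the Indian state of Karnataka.
Any help would be greatly appreciated. Thanks!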
This query worked for me:
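import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
...
String subject = "Ed Sheeran";
// Wikipedia titles use underscores for spaces; URL-encode anything else.
String title = URLEncoder.encode(subject.replace(" ", "_"), "UTF-8");
URL url = new URL("https://en.wikipedia.org/w/index.php?action=raw&title=" + title);
StringBuilder text = new StringBuilder();
// Read the raw wikitext as UTF-8 instead of the platform default charset.
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(url.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        line = line.trim();
        // Skip lines that start with template, table, or layout markup.
        if (!line.startsWith("|")
                && !line.startsWith("{")
                && !line.startsWith("}")
                && !line.startsWith("<center>")
                && !line.startsWith("---")) {
            text.append(line);
        }
        // Stop once a little more than 200 characters have been collected.
        if (text.length() > 200) {
            break;
        }
    }
}
System.out.println("text = " + text);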
This prints:
text = '''Edward Christopher''' "'''Ed'''" '''Sheeran''' (born 17 February 1991) is an English singer-songwriter and occasional actor. Born in [[Hebden Bridge]], West Yorkshire and raised in [[Framlingham]],
For other queries, you may need some trial and error to strip the leftover wiki markup from the content.
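As one possible starting point for that cleanup, here is a minimal sketch that removes the bold/italic quote markers and unwraps `[[...]]` links seen in the output above. The class name `WikiMarkupCleaner` and its `clean` method are made up for illustration, and the regular expressions only cover these two constructs; templates, references, and nested markup are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Wikipedia or MediaWiki library.
public class WikiMarkupCleaner {
    // [[target|label]] becomes "label"
    private static final Pattern PIPED_LINK =
            Pattern.compile("\\[\\[[^\\]|]*\\|([^\\]]*)\\]\\]");
    // [[target]] becomes "target"
    private static final Pattern PLAIN_LINK =
            Pattern.compile("\\[\\[([^\\]|]*)\\]\\]");
    // Runs of two or more apostrophes mark bold or italic text
    private static final Pattern QUOTE_MARKERS = Pattern.compile("'{2,}");

    public static String clean(String wikitext) {
        String s = PIPED_LINK.matcher(wikitext).replaceAll("$1");
        s = PLAIN_LINK.matcher(s).replaceAll("$1");
        s = QUOTE_MARKERS.matcher(s).replaceAll("");
        return s.trim();
    }

    public static void main(String[] args) {
        String raw = "'''Edward Christopher''' \"'''Ed'''\" '''Sheeran''' "
                + "(born 17 February 1991) is an English singer-songwriter. "
                + "Born in [[Hebden Bridge]], West Yorkshire";
        System.out.println(clean(raw));
    }
}
```

Calling `WikiMarkupCleaner.clean(text.toString())` on the collected text before printing would be one way to wire it in.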
Update
Here is an alternative that asks the API for a plain-text extract and parses the JSON response with a library:
http://search.maven.org/#artifactdetails|org.json|json|20150729|jar
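import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import org.json.JSONObject;
...
String subject = "Ed Sheeran";
// URLEncoder writes spaces as "+", so turn them into "%20" afterwards.
String titles = URLEncoder.encode(subject, "UTF-8").replace("+", "%20");
// exsentences=1 asks for one sentence; explaintext returns plain text instead of HTML.
URL url = new URL("https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exsentences=1&exintro=&explaintext=&exsectionformat=plain&titles=" + titles);
StringBuilder text = new StringBuilder();
// Read the whole JSON response as UTF-8.
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(url.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        text.append(line.trim());
    }
}
System.out.println("text = " + text);
JSONObject json = new JSONObject(text.toString());
JSONObject query = json.getJSONObject("query");
// "pages" is keyed by page ID, so iterate over the keys rather than hard-coding one.
JSONObject pages = query.getJSONObject("pages");
for (String key : pages.keySet()) {
    System.out.println("key = " + key);
    JSONObject page = pages.getJSONObject(key);
    // optString avoids an exception when a page has no "extract" field.
    String extract = page.optString("extract", "");
    System.out.println("extract = " + extract);
}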
Output:
extract = Edward Christopher "Ed" Sheeran (born 17 February 1991) is an English singer-songwriter and occasional actor.
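The request URL in the update packs several parameters into one long string. As a rough sketch, a small helper can make the title and sentence count easier to vary. The names `ExtractUrlBuilder` and `build` are hypothetical; the query parameters are the same ones used in the URL above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles the extracts query URL shown above.
public class ExtractUrlBuilder {
    private static final String API = "https://en.wikipedia.org/w/api.php";

    // Percent-encode a value for use in a query string, with spaces as %20.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Build a URL requesting the first `sentences` sentences of the intro as plain text.
    public static String build(String title, int sentences) {
        return API
                + "?action=query&prop=extracts&format=json"
                + "&exsentences=" + sentences
                + "&exintro=&explaintext=&exsectionformat=plain"
                + "&titles=" + encode(title);
    }

    public static void main(String[] args) {
        System.out.println(build("Ed Sheeran", 1));
    }
}
```

Passing the result to `new URL(...)` would slot into the code above; titles with slashes or non-ASCII characters, such as "AC/DC", are percent-encoded rather than sent raw.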