Matrix search through a Java 2D ArrayList of Strings for duplicate instances, breadth first?
I have a corpus of data, a matrix, stored as a 2D ArrayList of Strings with dimensions [3][unknown, quite large], of the form:
deduct under |||the council \'s demerit points system |||further # points |||
lead |||him |||the council \'s demerit points system |||
want |||their licenses |||they |||
lie between |||# and # |||the general index |||
exceed |||# |||the general index |||
lie between |||# and # |||the roadside index |||
advise to avoid |||prolonged stay |||respiratory illnesses |||
be necessary to stay in |||these areas |||it |||
exceed |||# |||the roadside index |||
be necessary to stay in |||these areas |||it |||
hoist |||attention tv/radio announcers |||october # , # red flag |||
be item of |||interest |||the following |||
hoist at |||silverstrand beach |||the red flag |||
issue on |||behalf of the provisional regional council |||the following |||
publish by |||the provisional regional council |||the tang dynasty |||
present under |||# sections |||the artefacts |||
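*The separator "|||" is not actually part of the data; I only put it here to make the rows easier to read.
The matrix is built inside a Java program from an input file by the following code: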
List<List<String>> arr = new ArrayList<>();
Pattern p = Pattern.compile("'(.*?)'(?![a-zA-Z])");
// while the file is still reading
while ((line_0 = br_0.readLine()) != null)
{
    List<String> three = new ArrayList<>();
    Matcher m = p.matcher(line_0);
    while (m.find())
    {
        three.add( m.group(1) );
    }
    arr.add( three );
}
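For each entry, I would like to search the array, possibly in a breadth-first manner. If that entry occurs again elsewhere in the matrix, I want to record the position where it occurs and, of course, the entry that triggered the discovery (the searcher, if you will). How can I do this efficiently? I am dealing with a large amount of data.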
Obsolete: removed the text of the old answer. Keeping it because of the comments below.
Sentence: stores a triple verb(object, subject):
public class Sentence {
    private String verb;
    private String object;
    private String subject;

    public Sentence(String verb, String object, String subject ){
        this.verb = verb;
        this.object = object;
        this.subject = subject;
    }

    public String getVerb(){ return verb; }
    public String getObject(){ return object; }
    public String getSubject(){ return subject; }

    @Override
    public String toString(){
        return verb + "(" + object + ", " + subject + ")";
    }
}
Collecting and linking sentences:
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Ontology {
    private List<Sentence> sentences = new ArrayList<>();

    /*
     * The following maps store the relation of a string occurring
     * as a subject or object, respectively, to the list of Sentence
     * ordinals where it occurs.
     */
    private Map<String,List<Integer>> subject2index = new HashMap<>();
    private Map<String,List<Integer>> object2index = new HashMap<>();

    /*
     * This set contains strings that occur as both subject and
     * object. This is useful for determining strings acting as an
     * in-between connecting two relations.
     */
    private Set<String> joints = new HashSet<>();

    public void addSentence( Sentence s ){
        // add Sentence to the list of all Sentences
        sentences.add( s );

        // add the Subject of the Sentence to the map mapping strings
        // occurring as a subject to the ordinal of this Sentence
        List<Integer> subind = subject2index.get( s.getSubject() );
        if( subind == null ){
            subind = new ArrayList<>();
            subject2index.put( s.getSubject(), subind );
        }
        subind.add( sentences.size() - 1 );

        // add the Object of the Sentence to the map mapping strings
        // occurring as an object to the ordinal of this Sentence
        List<Integer> objind = object2index.get( s.getObject() );
        if( objind == null ){
            objind = new ArrayList<>();
            object2index.put( s.getObject(), objind );
        }
        objind.add( sentences.size() - 1 );

        // determine whether we've found a "joining" string
        if( subject2index.containsKey( s.getObject() ) ){
            joints.add( s.getObject() );
        }
        if( object2index.containsKey( s.getSubject() ) ){
            joints.add( s.getSubject() );
        }
    }

    public Collection<String> getJoints(){
        return joints;
    }

    public List<Integer> getSubjectIndices( String subject ){
        return subject2index.get( subject );
    }

    public List<Integer> getObjectIndices( String object ){
        return object2index.get( object );
    }

    public Sentence getSentence( int index ){
        return sentences.get( index );
    }
}
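On Java 8 or later, the "get, check for null, put" pattern used twice in addSentence can be written more compactly with Map.computeIfAbsent. The following is only a sketch of that idea; the class name OccurrenceIndex and its methods are hypothetical and not part of the Ontology class above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps a string to the ordinals of the sentences it occurs in.
class OccurrenceIndex {
    private final Map<String, List<Integer>> index = new HashMap<>();

    // Record that `key` occurs in the sentence with the given ordinal.
    // computeIfAbsent creates the list on first use and returns it either way.
    void add(String key, int ordinal) {
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(ordinal);
    }

    // Return the ordinals recorded for `key`, or an empty list if there are none.
    List<Integer> get(String key) {
        return index.getOrDefault(key, Collections.emptyList());
    }
}
```

Returning an empty list instead of null is a design choice of this sketch; the getters in Ontology return null for unknown strings.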
A small test:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) throws IOException {
    Ontology ontology = new Ontology();
    // the parentheses are literal, so they are escaped as \\( and \\) in the Java string
    Pattern p = Pattern.compile("'(.*?)'\\('(.*?)','(.*?)'\\)");
    // try-with-resources closes the reader once the file has been read
    try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
        String line;
        while ((line = br.readLine()) != null) {
            Matcher m = p.matcher(line);
            if( m.matches() ) {
                String verb = m.group(1);
                String object = m.group(2);
                String subject = m.group(3);
                ontology.addSentence( new Sentence( verb, object, subject ) );
            }
        }
    }
    // for each joint J, combine every x(a, J) with every y(J, b) into x(a, b)
    for( String joint: ontology.getJoints() ){
        for( Integer subind: ontology.getSubjectIndices( joint ) ){
            Sentence xaS = ontology.getSentence( subind );
            for( Integer obind: ontology.getObjectIndices( joint ) ){
                Sentence yOb = ontology.getSentence( obind );
                Sentence s = new Sentence( xaS.getVerb(),
                                           xaS.getObject(),
                                           yOb.getSubject() );
                System.out.println( s );
            }
        }
    }
}
Input:
'prevents'('scurvy','vitamin C')
'contains'('vitamin C','orange')
'contains'('vitamin C','sauerkraut')
'isa'('fruit','orange')
'improves'('health','fruit')
Output:
prevents(scurvy, orange)
prevents(scurvy, sauerkraut)
improves(health, orange)
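The same indexing idea can be applied directly to the List<List<String>> from the question, without going through Sentence objects. The sketch below is one possible way to do it, assuming each row is a list of strings as built by the question's parsing loop; the class DuplicateFinder and its methods are hypothetical names. It visits every cell once and records the positions of each string, so no repeated scan of the matrix is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: index every cell of the matrix by its string value.
class DuplicateFinder {

    // Map each string to the {row, column} positions where it occurs.
    static Map<String, List<int[]>> indexCells(List<List<String>> arr) {
        Map<String, List<int[]>> positions = new LinkedHashMap<>();
        for (int row = 0; row < arr.size(); row++) {
            List<String> cells = arr.get(row);
            for (int col = 0; col < cells.size(); col++) {
                positions.computeIfAbsent(cells.get(col), k -> new ArrayList<>())
                         .add(new int[] { row, col });
            }
        }
        return positions;
    }

    // Keep only the strings that occur in more than one cell.
    static Map<String, List<int[]>> duplicates(List<List<String>> arr) {
        Map<String, List<int[]>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<int[]>> e : indexCells(arr).entrySet()) {
            if (e.getValue().size() > 1) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

For a duplicated string, the first recorded position can be treated as the "searcher" and the remaining positions as the later occurrences. The work is proportional to the number of cells, at the cost of holding one map entry per distinct string in memory.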