Matrix search through a Java 2D ArrayList of Strings for duplicate instances, breadth first?

Question

I have a corpus of data, a matrix, a 2D ArrayList of Strings, [3][not sure, quite a lot], of the form:

deduct under |||the council \'s demerit points system |||further # points |||
lead |||him |||the council \'s demerit points system |||
want |||their licenses |||they |||
lie between |||# and # |||the general index |||
exceed |||# |||the general index |||
lie between |||# and # |||the roadside index |||
advise to avoid |||prolonged stay |||respiratory illnesses |||
be necessary to stay in |||these areas |||it |||
exceed |||# |||the roadside index |||
be necessary to stay in |||these areas |||it |||
hoist |||attention tv/radio announcers |||october # , # red flag |||
be item of |||interest |||the following |||
hoist at |||silverstrand beach |||the red flag |||
issue on |||behalf of the provisional regional council |||the following |||
publish by |||the provisional regional council |||the tang dynasty |||
present under |||# sections |||the artefacts |||

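*The delimiter "|||" is not actually part of the data; I only put it here for readability.

It is part of a Java program and is generated from an input file by the following code: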

List<List<String>> arr = new ArrayList<>();
Pattern p = Pattern.compile("'(.*?)'(?![a-zA-Z])");
// read the file line by line
while ((line_0 = br_0.readLine()) != null)
{
    // collect the quoted fields of this line
    List<String> three = new ArrayList<>();
    Matcher m = p.matcher(line_0);
    while (m.find())
    {
        three.add( m.group(1) );
    }
    arr.add( three );
}

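For each datum I want to search the array, possibly breadth-first; if that datum occurs again elsewhere in the matrix, I do not want to consider the position where it occurs, nor, of course, the catalyst that produced the discovery (the searcher, if you will). How can I do this efficiently? I am dealing with a large amount of data.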

Answer 1

Obsolete: the text of the old answer has been removed. The answer is kept because of the comments below.

Answer 2

Sentence: used to store a triple verb(object,subject):

public class Sentence {
    private String verb;
    private String object;
    private String subject;

    public Sentence(String verb, String object, String subject ){
        this.verb = verb;
        this.object = object;
        this.subject = subject;
    }

    public String getVerb(){ return verb; }
    public String getObject(){ return object; }
    public String getSubject(){ return subject; }

    @Override
    public String toString(){
        return verb + "(" + object + ", " + subject + ")";
    }
}
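The sample data in the question contains at least one row that appears twice ("be necessary to stay in", "these areas", "it"). If whole-triple duplicates matter, the class would need value-based `equals` and `hashCode`, which `Sentence` above does not define. The following is a minimal sketch under that assumption; `Triple` is a hypothetical stand-in for `Sentence`, not part of the answer's code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class mirroring Sentence, with value-based equality added.
final class Triple {
    final String verb;
    final String object;
    final String subject;

    Triple(String verb, String object, String subject) {
        this.verb = verb;
        this.object = object;
        this.subject = subject;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Triple)) return false;
        Triple t = (Triple) o;
        return Objects.equals(verb, t.verb)
            && Objects.equals(object, t.object)
            && Objects.equals(subject, t.subject);
    }

    @Override
    public int hashCode() {
        return Objects.hash(verb, object, subject);
    }

    public static void main(String[] args) {
        Set<Triple> seen = new HashSet<>();
        // Set.add returns false when an equal element is already present.
        System.out.println(seen.add(new Triple("be necessary to stay in", "these areas", "it"))); // true
        System.out.println(seen.add(new Triple("be necessary to stay in", "these areas", "it"))); // false
    }
}
```

With equality defined this way, a `HashSet` detects a repeated triple in expected constant time per row, which should matter for a large corpus.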

Collecting and linking sentences:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Ontology {
    private List<Sentence> sentences = new ArrayList<>();
    /*
     * The following maps store the relation of a string occurring
     * as a subject or object, respectively, to the list of Sentence
     * ordinals where they occur.
     */
    private Map<String,List<Integer>> subject2index = new HashMap<>();
    private Map<String,List<Integer>> object2index = new HashMap<>();
    /*
     * This set contains strings that occur as both
     * subject and object. This is useful for determining strings
     * acting as an in-between connecting two relations.
     */
    private Set<String> joints = new HashSet<>();

    public void addSentence( Sentence s ){
        // add Sentence to the list of all Sentences
        sentences.add( s );
        // add the Subject of the Sentence to the map mapping strings
        // occurring as a subject to the ordinal of this Sentence
        List<Integer> subind = subject2index.get( s.getSubject() );
        if( subind == null ){
            subind = new ArrayList<>();
            subject2index.put( s.getSubject(), subind );
        }
        subind.add( sentences.size() - 1 );
        // add the Object of the Sentence to the map mapping strings
        // occurring as an object to the ordinal of this Sentence
        List<Integer> objind = object2index.get( s.getObject() );
        if( objind == null ){
            objind = new ArrayList<>();
            object2index.put( s.getObject(), objind );
        }
        objind.add( sentences.size() - 1 );
        // determine whether we've found a "joining" string
        if( subject2index.containsKey( s.getObject() ) ){
            joints.add( s.getObject() );
        }
        if( object2index.containsKey( s.getSubject() ) ){
            joints.add( s.getSubject() );
        }
    }

    public Collection<String> getJoints(){
        return joints;
    }

    // returns null if the string never occurs as a subject
    public List<Integer> getSubjectIndices( String subject ){
        return subject2index.get( subject );
    }

    // returns null if the string never occurs as an object
    public List<Integer> getObjectIndices( String object ){
        return object2index.get( object );
    }

    public Sentence getSentence( int index ){
        return sentences.get( index );
    }
}
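The same string-to-ordinals idea can be applied directly to the question's `List<List<String>>`: one pass builds a map from each string to every place it occurs, so finding repeats needs no nested search. This is only a sketch under assumptions: `PositionIndex` and `build` are hypothetical names, positions are stored as `{row, column}` pairs, and `Map.computeIfAbsent` requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Ontology class above.
class PositionIndex {

    // Maps each distinct string to every {row, column} position where it occurs.
    static Map<String, List<int[]>> build(List<List<String>> rows) {
        Map<String, List<int[]>> index = new LinkedHashMap<>();
        for (int r = 0; r < rows.size(); r++) {
            List<String> row = rows.get(r);
            for (int c = 0; c < row.size(); c++) {
                // create the position list on first sight of the string, then append
                index.computeIfAbsent(row.get(c), k -> new ArrayList<>())
                     .add(new int[] { r, c });
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("lie between", "# and #", "the general index"),
            Arrays.asList("exceed", "#", "the general index"),
            Arrays.asList("lie between", "# and #", "the roadside index"));

        // Print only the strings that occur more than once, with their positions.
        build(rows).forEach((text, positions) -> {
            if (positions.size() > 1) {
                StringBuilder sb = new StringBuilder(text).append(" ->");
                for (int[] p : positions) {
                    sb.append(" [").append(p[0]).append("][").append(p[1]).append("]");
                }
                System.out.println(sb);
            }
        });
    }
}
```

Building the index is linear in the number of cells, and each later lookup is an expected constant-time hash access, which is the main reason to prefer it over rescanning the matrix for every datum.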

A small test:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OntologyTest {
    public static void main(String[] args) throws IOException {
        Ontology ontology = new Ontology();
        // matches lines of the form 'verb'('object','subject')
        Pattern p = Pattern.compile("'(.*?)'\\('(.*?)','(.*?)'\\)");
        // try-with-resources closes the reader when done
        try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = p.matcher(line);
                if( m.matches() ) {
                    String verb    = m.group(1);
                    String object  = m.group(2);
                    String subject = m.group(3);
                    ontology.addSentence( new Sentence( verb, object, subject ) );
                }
            }
        }

        // for every joint, combine each sentence where it is the subject
        // with each sentence where it is the object
        for( String joint: ontology.getJoints() ){
            for( Integer subind: ontology.getSubjectIndices( joint ) ){
                Sentence xaS = ontology.getSentence( subind );
                for( Integer obind: ontology.getObjectIndices( joint ) ){
                    Sentence yOb = ontology.getSentence( obind );
                    Sentence s = new Sentence( xaS.getVerb(),
                                               xaS.getObject(),
                                               yOb.getSubject() );
                    System.out.println( s );
                }
            }
        }
    }
}

Input:

'prevents'('scurvy','vitamin C')
'contains'('vitamin C','orange')
'contains'('vitamin C','sauerkraut')
'isa'('fruit','orange')
'improves'('health','fruit')

Output:

prevents(scurvy, orange)
prevents(scurvy, sauerkraut)
improves(health, orange)
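The test above joins relations across a single in-between string. Since the question mentions breadth-first search, a multi-hop variant might look like the sketch below. It is an assumption about what is wanted, not part of the answer's code: `ChainSearch`, `addRelation`, and `reachableFrom` are hypothetical names, verbs are ignored, and each relation verb(object, subject) is treated as an edge from subject to object.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical helper; the names are illustrative only.
class ChainSearch {
    // subject -> objects it is directly related to
    private final Map<String, List<String>> subjectToObjects = new HashMap<>();

    void addRelation(String object, String subject) {
        subjectToObjects.computeIfAbsent(subject, k -> new ArrayList<>()).add(object);
    }

    // Breadth-first search: every string reachable from start, nearest first.
    Set<String> reachableFrom(String start) {
        Set<String> visited = new LinkedHashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> next = subjectToObjects.getOrDefault(
                current, Collections.<String>emptyList());
            for (String n : next) {
                // add() returns false for strings already seen, which also guards against cycles
                if (visited.add(n)) {
                    queue.add(n);
                }
            }
        }
        visited.remove(start); // the start string itself is not a result
        return visited;
    }

    public static void main(String[] args) {
        ChainSearch search = new ChainSearch();
        search.addRelation("scurvy", "vitamin C");
        search.addRelation("vitamin C", "orange");
        search.addRelation("vitamin C", "sauerkraut");
        search.addRelation("fruit", "orange");
        search.addRelation("health", "fruit");
        // prints [vitamin C, fruit, scurvy, health]
        System.out.println(search.reachableFrom("orange"));
    }
}
```

The visited set keeps each string from being expanded more than once, so the search stays linear in the number of relations even when the data contains cycles.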
