Word frequency and line number in Java

Write a program using Java or C# that counts the frequency of each word in a text and outputs each word with its count and the line numbers where it appears. We define a word as a contiguous sequence of non-white-space characters. (hint: split()) Note: different capitalizations of the same character sequence should be considered the same word, e.g. Python and python, I and i. The input will be several lines, with an empty line terminating the text (using a text file for input is optional). Only alphabetic characters and white space will be present in the input. The output is formatted as follows:

- 1 python 1
- 3 is 1 2
- 1 a 1
- 1 but 1
- 1 cool 1 2
- 1 even 2
- 1 object 2
- 1 oriented 2
- 1 it 2
- 1 language 1 2
- 1 Java 1
- 1 purely 2
- 1 since 2

Here is my attempt:

public class Test {
    public static void main(String[] args) 
    {
        String text = "Python is a cool language but Java \n" +
        "is also cool since it is purely object oriented language ";
        String[] keys = text.split(" ");
        String[] uniqueKeys;
        int count = 0;
        System.out.println(text);
        uniqueKeys = getUniqueKeys(keys);
        int line2 = text.indexOf('\n');

        for(String key: uniqueKeys)
        {
            if(null == key)
            {
                break;
            }           
            for(String s : keys)
            {
                if(key.equals(s))
                {
                    count++;
                }               
            }

            System.out.println(count +" "+ key);
            count=0;
        }
    }

    private static String[] getUniqueKeys(String[] keys)
    {
        String[] uniqueKeys = new String[keys.length];

        uniqueKeys[0] = keys[0];
        int uniqueKeyIndex = 1;
        boolean keyAlreadyExists = false;

        for(int i=1; i<keys.length ; i++)
        {
            for(int j=0; j<=uniqueKeyIndex; j++)
            {
                if(keys[i].equals(uniqueKeys[j]))
                {
                    keyAlreadyExists = true;
                }
            }           

            if(!keyAlreadyExists)
            {
                uniqueKeys[uniqueKeyIndex] = keys[i];
                uniqueKeyIndex++;               
            }
            keyAlreadyExists = false;
        }       
        return uniqueKeys;
    }
}

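I don't know how to make the output also include the line numbers for each word. Thanks for your help. By the way, I am using Java.

This is much simpler if you model a word and its occurrence data as an object and use the Java Collections Framework.

For a word, you want to know how often it appears and where it appears, so a WordData class might look like this: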

import java.util.ArrayList;
import java.util.List;

public class WordData {
    public String theWord;                                  // the word itself
    public List<Integer> appearsWhere = new ArrayList<>();  // one entry per occurrence
}

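Note that I don't store a count, because it equals the number of places the word appears, i.e. appearsWhere.size(). I have also omitted getters/setters.

To keep track of multiple words you want a map, with the word as the key and its WordData as the value: Map<String, WordData> wordMap = new HashMap<>(). To find the data for a word, just call wordMap.get(String). If it returns null, the word is not in the map yet, so create a WordData and put it in the map. Otherwise, just add the place where the word appeared to the existing WordData.

So the whole program goes like this: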
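As a side note, the look-up-or-create step described above can also be written with Map.computeIfAbsent instead of an explicit null check. The following is only a minimal sketch: the class name LookupSketch and the helper record are made up for illustration, and WordData is repeated so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupSketch {
    // Same shape as the WordData class above, nested to keep the sketch in one file.
    static class WordData {
        String theWord;
        List<Integer> appearsWhere = new ArrayList<>();
    }

    // Hypothetical helper: records that `word` appears at `place`.
    static void record(Map<String, WordData> wordMap, String word, int place) {
        String key = word.toLowerCase();
        // computeIfAbsent creates and stores the entry only when the key is missing,
        // then returns the entry that is in the map either way.
        WordData wd = wordMap.computeIfAbsent(key, k -> {
            WordData fresh = new WordData();
            fresh.theWord = k;
            return fresh;
        });
        wd.appearsWhere.add(place);
    }

    public static void main(String[] args) {
        Map<String, WordData> wordMap = new HashMap<>();
        record(wordMap, "Python", 1);
        record(wordMap, "python", 2);
        System.out.println(wordMap.get("python").appearsWhere); // prints [1, 2]
    }
}
```

Both spellings end up in the same entry because the key is lower-cased before the lookup.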

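// `words` stands for the tokens of the current line, `currentPlace` for its line number
for (String word : words) {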
    word = word.toLowerCase();
    WordData wd = wordMap.get(word);
    if (wd == null) {
        wd = new WordData();
        wd.theWord = word;
        wordMap.put(word, wd);
    }
    wd.appearsWhere.add(currentPlace);
}
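The loop above leaves open where the words and currentPlace come from. One way to drive it, sketched here under the assumption that the text arrives line by line and ends at the first empty line (as the question states), is to read with a BufferedReader and count lines as you go. The names ReadLoopSketch and buildIndex are invented for this example, and a LinkedHashMap is used only so that words keep their first-seen order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReadLoopSketch {
    static class WordData {
        String theWord;
        List<Integer> appearsWhere = new ArrayList<>();
    }

    // Reads lines until an empty line or end of input and records
    // the 1-based line number of every word occurrence.
    static Map<String, WordData> buildIndex(BufferedReader in) throws IOException {
        Map<String, WordData> wordMap = new LinkedHashMap<>();
        int currentPlace = 0;
        String line;
        while ((line = in.readLine()) != null && !line.trim().isEmpty()) {
            currentPlace++;
            for (String word : line.trim().split("\\s+")) {
                word = word.toLowerCase();
                WordData wd = wordMap.get(word);
                if (wd == null) {
                    wd = new WordData();
                    wd.theWord = word;
                    wordMap.put(word, wd);
                }
                wd.appearsWhere.add(currentPlace);
            }
        }
        return wordMap;
    }

    public static void main(String[] args) throws IOException {
        String text = "Python is a cool language but Java\n"
                + "is also cool since it is purely object oriented language\n"
                + "\n"
                + "this line is never read\n";
        Map<String, WordData> wordMap = buildIndex(new BufferedReader(new StringReader(text)));
        System.out.println(wordMap.get("is").appearsWhere); // prints [1, 2, 2]
        System.out.println(wordMap.containsKey("never"));   // prints false
    }
}
```

To read from the console instead, the same method should work with new BufferedReader(new InputStreamReader(System.in)).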

To output all the data, just iterate over the map's values, with a nested loop over appearsWhere:

for (WordData wd : wordMap.values()) {
    // output count and word
    for (Integer where : wd.appearsWhere) {
        // output where
    }
}
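Filled in, the output loop could look like the sketch below. It assumes the "count word line-numbers" layout from the question and, since appearsWhere holds one entry per occurrence, it prints each line number only once. OutputSketch and format are illustrative names, not part of the answer's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class OutputSketch {
    static class WordData {
        String theWord;
        List<Integer> appearsWhere = new ArrayList<>();
    }

    // Formats one entry as "<count> <word> <line> <line> ...".
    static String format(WordData wd) {
        StringBuilder sb = new StringBuilder();
        sb.append(wd.appearsWhere.size()).append(' ').append(wd.theWord);
        // LinkedHashSet drops repeated line numbers but keeps their order.
        for (Integer where : new LinkedHashSet<>(wd.appearsWhere)) {
            sb.append(' ').append(where);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, WordData> wordMap = new LinkedHashMap<>();
        WordData is = new WordData();
        is.theWord = "is";
        is.appearsWhere.add(1); // once on line 1
        is.appearsWhere.add(2); // twice on line 2
        is.appearsWhere.add(2);
        wordMap.put(is.theWord, is);

        for (WordData wd : wordMap.values()) {
            System.out.println(format(wd)); // prints "3 is 1 2"
        }
    }
}
```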

With Map and List you don't have to worry about the details of managing data storage; these classes take care of that. You can focus on the very simple logic.

import java.util.HashMap;
import java.util.Set;

public class Countcharacters {

    // word -> number of occurrences
    static HashMap<String, Integer> countcharact = new HashMap<>();
    // word -> space-separated line numbers where the word appears
    static HashMap<String, String> linenumbertrack = new HashMap<>();
    // number of the line currently being processed (1-based)
    static int count = 1;

    static void countwords(String line) {
        // "\\s+" is the regex for one or more whitespace characters
        String[] input = line.split("\\s+");
        for (int i = 0; i < input.length; i++) {
            if (countcharact.containsKey(input[i])) {
                int j = countcharact.get(input[i]);
                String linenumber = linenumbertrack.get(input[i]);
                countcharact.put(input[i], j + 1);
                linenumbertrack.put(input[i], linenumber + " " + count);
            } else {
                countcharact.put(input[i], 1);
                linenumbertrack.put(input[i], String.valueOf(count));
            }
        }
        count++;
    }

    public static void main(String[] args) {
        String inp = "its am here in 1st line\ni am here in 2nd line";
        String[] line = inp.split("\n");
        for (int i = 0; i < line.length; i++) {
            Countcharacters.countwords(line[i]);
        }
        // print "<count> <word> <line numbers>" for every word
        Set<String> s = countcharact.keySet();
        for (String c : s) {
            System.out.println(countcharact.get(c) + " " + c + " " + linenumbertrack.get(c));
        }
    }
}

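The idea is to use 2 hashmaps. One stores each word and its number of occurrences, the other stores each word and the line numbers where it appears. Combine the two hashmaps to get the desired output.

Output of the above program: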

1 2nd 2
2 am 1 2
1 1st 1
2 line 1 2
2 here 1 2
1 its 1
2 in 1 2
1 i 2
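
For comparison, the same two-map idea can be written more compactly with Map.merge, which handles the "first time versus already present" cases in one call. This is only a sketch: MergeSketch and countWords are illustrative names, the maps are passed in rather than kept static, and the words are lower-cased, which the program above does not do.

```java
import java.util.HashMap;
import java.util.Map;

class MergeSketch {
    // Updates both maps for every word on one line of input.
    static void countWords(String line, int lineNo,
                           Map<String, Integer> counts, Map<String, String> lineNumbers) {
        for (String word : line.trim().toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank line splits into a single empty token
            }
            // Stores 1 for a new word, otherwise adds 1 to the stored count.
            counts.merge(word, 1, Integer::sum);
            // Stores the line number for a new word, otherwise appends it.
            lineNumbers.merge(word, String.valueOf(lineNo), (old, added) -> old + " " + added);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        Map<String, String> lineNumbers = new HashMap<>();
        String[] lines = "its am here in 1st line\ni am here in 2nd line".split("\n");
        for (int i = 0; i < lines.length; i++) {
            countWords(lines[i], i + 1, counts, lineNumbers);
        }
        System.out.println(counts.get("here") + " here " + lineNumbers.get("here")); // prints "2 here 1 2"
        System.out.println(counts.get("its") + " its " + lineNumbers.get("its"));    // prints "1 its 1"
    }
}
```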