Overly complicated directory iteration structure is hurting program continuity and comprehensibility

I am trying to read many files from a directory whose underlying substructure has the form /train, with /atheism, /politics, /science and /sports beneath it, each containing many files.

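I need to read in all the words from all the files to create a global "dictionary" in which every word from every file is represented once (I'm not too worried about stemming or any of that fancy stuff at this point!).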

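The problem is that whenever I try to think clearly about what I have to do, the complicated iteration structure I'm using confuses me. How can I simplify and tame this unwieldy beast?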

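import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class FileDictCreateur 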
{
    static String PATH = "/home/Workbench/SUTD/ISTD_50.570/assignments/data/train";

    //the global list of all words across all articles
    static Set<String> GLOBO_DICT = new HashSet<String>();

    public static void main(String[] args) throws IOException 
    {
        //each of the different categories
        String[] categories = { "/atheism", "/politics", "/science", "/sports"};

        //cycle through all categories once to populate the global dict
        for(int cycle = 0; cycle <= 3; cycle++)
        {
            String general_data_partition = PATH + categories[cycle];

            File directory = new File( general_data_partition );
            iterateDirectory( directory );
        }
    }

    private static void iterateDirectory(File directory) throws IOException 
    {
        for (File file : directory.listFiles()) 
        {
            if (file.isDirectory()) 
            {
                //recurse into the subdirectory itself, not its parent
                iterateDirectory(file);
            }     
            else 
            {
                System.out.println(file);

                //try-with-resources closes the reader even if reading fails
                try (BufferedReader br = new BufferedReader(new FileReader( file )))
                {
                    String line;
                    while ((line = br.readLine()) != null) 
                    {
                        String[] words = line.split(" ");//those are your words

                        //here is where I will populate that 
                        //globo dict

                    }
                }
            }
        }
    }
}

I'm pretty sure you need a user folder after /home. Also, you can use the File(String, String) constructor and a for-each loop. Putting it together, I think what you want is

static String PATH = "Workbench/SUTD/ISTD_50.570/assignments/data/train";

// the global list of all words across all articles
static Set<String> GLOBO_DICT = new HashSet<String>();

public static void main(String[] args) throws IOException {
    // each of the different categories
    String[] categories = { "/atheism", "/politics", "/science", "/sports" };
    File trainpath = new File(System.getProperty("user.home"), PATH);
    // cycle through all categories once to populate the global dict
    for (String cycle : categories) {
        File directory = new File(trainpath, cycle);
        iterateDirectory(directory);
    }
}
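
If Java 8 or later is available, the recursion can probably be dropped altogether. The following is only a sketch of that idea, not part of the code above: WordCollector and collectWords are hypothetical names, and it assumes the files are UTF-8 text and that splitting on whitespace is good enough for a first pass. Files.walk visits the whole tree under the given root, so the per-category loop is not needed either.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, named here for illustration only.
final class WordCollector {

    // Returns the distinct words found in every regular file below root.
    static Set<String> collectWords(Path root) throws IOException {
        List<Path> files;
        // Files.walk descends into subdirectories itself, so no hand-written recursion.
        // The stream holds open directory handles, hence the try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        Set<String> words = new HashSet<>();
        for (Path file : files) {
            // Assumption: UTF-8 input. Pass a different Charset if the data is encoded otherwise,
            // because readAllLines throws on bytes that are invalid in the chosen charset.
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                // Split on runs of whitespace; blank lines yield one empty token, skipped below.
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
        }
        return words;
    }
}
```

With this in place, main would reduce to something like GLOBO_DICT.addAll(WordCollector.collectWords(trainpath.toPath())), where trainpath is the File built from user.home as shown earlier.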