Why is the Apache Commons CSV parser appending unique data to the 2nd result set?
I have 2 CSV files (district1.csv, district2.csv) in a directory, and each one contains a schoolCode column.
I read both CSV files with the Apache Commons CSV library, collect the distinct values of the schoolCode column, and count them.
Here is my code:
public void getDistinctRecordCount() throws IOException {
    // Directory (a File) and reader are assumed to be fields declared elsewhere in the class (not shown).
    Set<String> uniqueSchools = new HashSet<>();
    int numOfSchools;
    String SchoolCode;
    //Filter to only read csv files.
    File[] files = Directory.listFiles(new FileExtensionFilter());
    for (File f : files) {
        CSVParser csvParser;
        CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim();
        reader = Files.newBufferedReader(Paths.get(Directory + "\\" + f.getName()), StandardCharsets.ISO_8859_1);
        csvParser = CSVParser.parse(reader, csvFormat);
        for (CSVRecord column : csvParser) {
            SchoolCode = column.get("School Code");
            uniqueSchools.add(SchoolCode);
        }
        Logger.info("The list of Schools for " + f.getName() + " are: " + uniqueSchools);
        numOfSchools = uniqueSchools.size();
        Logger.info("The total count of Schools for " + f.getName() + " are: " + numOfSchools);
        Logger.info("-----------------------");
    }
}
Here is my output:
[INFO ] [Logger] - The list of Schools for district1.csv are: [01-0003-002, 01-0003-001]
[INFO ] [Logger] - The total count of Schools for district1.csv are: 2
[INFO ] [Logger] - The list of Schools for district2.csv are: [01-0003-002, 01-0003-001, 01-0018-004, 01-0018-005, 01-0018-002, 01-0018-003, 01-0018-008, 01-0018-006]
[INFO ] [Logger] - The total count of Schools for district2.csv are: 8
Problem: the two values read from district1.csv are appended to the district2.csv result, so my count for district2.csv is off by 2 (the correct value should be 6). How are they being appended?
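uniqueSchools is created once, before the for loop, so every file's codes are added to the same set, and the codes from district1.csv are still in it when district2.csv is read. If you don't need a set of all schools, you can move the uniqueSchools declaration inside the loop or clear it at the start of each iteration: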
for (File f : files) {
    uniqueSchools.clear(); // start each file with an empty set
    // ... rest of the loop body unchanged
}
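The effect of the shared set, and of the clear() fix, can be checked without any CSV files. The following is a minimal sketch that reproduces only the Set handling from the question; Apache Commons CSV is not used. The file contents are a hypothetical in-memory stand-in built from the school codes in the log (with one duplicate row added to district1.csv to show deduplication), and the class and method names are illustrative, not part of the original code. List.of assumes Java 9 or later.

```java
import java.util.*;

// Hypothetical in-memory stand-in for the CSV files: file name -> "School Code" values.
// Only the Set handling from the question is reproduced; no CSV parsing happens here.
class SharedSetDemo {

    // Mirrors the question: one set created before the loop and reused for every file.
    public static Map<String, Integer> countsWithSharedSet(Map<String, List<String>> files) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Set<String> uniqueSchools = new HashSet<>();
        for (Map.Entry<String, List<String>> f : files.entrySet()) {
            uniqueSchools.addAll(f.getValue()); // earlier files' codes are still in the set
            counts.put(f.getKey(), uniqueSchools.size());
        }
        return counts;
    }

    // The fix from the answer: clear the set at the start of each iteration.
    public static Map<String, Integer> countsWithClear(Map<String, List<String>> files) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Set<String> uniqueSchools = new HashSet<>();
        for (Map.Entry<String, List<String>> f : files.entrySet()) {
            uniqueSchools.clear(); // drop the previous file's codes
            uniqueSchools.addAll(f.getValue());
            counts.put(f.getKey(), uniqueSchools.size());
        }
        return counts;
    }

    // Hypothetical sample rows; the duplicate in district1.csv is collapsed by the Set.
    public static Map<String, List<String>> sampleFiles() {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("district1.csv", List.of("01-0003-001", "01-0003-002", "01-0003-001"));
        files.put("district2.csv", List.of("01-0018-002", "01-0018-003", "01-0018-004",
                "01-0018-005", "01-0018-006", "01-0018-008"));
        return files;
    }

    public static void main(String[] args) {
        System.out.println(countsWithSharedSet(sampleFiles())); // {district1.csv=2, district2.csv=8}
        System.out.println(countsWithClear(sampleFiles()));     // {district1.csv=2, district2.csv=6}
    }
}
```

The shared-set version reproduces the 2 and 8 from the log, while clearing the set gives 2 and 6.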
You can also keep each file's schools in a Map<String, String>, or create a set for each file, log its count, and then addAll it to uniqueSchools:
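for (File f : files) {
    Set<String> currentSchools = new HashSet<>(); // a new set for each file
    // ... open the reader and create csvParser as before
    for (CSVRecord column : csvParser) {
        SchoolCode = column.get("School Code");
        currentSchools.add(SchoolCode);
    }
    Logger.info("The list of Schools for " + f.getName() + " are: " + currentSchools);
    numOfSchools = currentSchools.size();
    Logger.info("The total count of Schools for " + f.getName() + " are: " + numOfSchools);
    uniqueSchools.addAll(currentSchools); // uniqueSchools still collects every file's schools
}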
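A sketch of the per-file approach, assuming each file's set is kept in a Map<String, Set<String>> keyed by file name (a variation on the Map<String, String> mentioned above, chosen so that every file keeps its own complete set). As before, the input is a hypothetical in-memory stand-in for the parsed CSV rows, and the class and method names are illustrative only.

```java
import java.util.*;

// Sketch: one set per file stored in a Map, plus the union of all files.
// The input map is a hypothetical stand-in for the "School Code" values read from each CSV.
class PerFileSchools {

    public static Map<String, Set<String>> schoolsByFile(Map<String, List<String>> files) {
        Map<String, Set<String>> byFile = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> f : files.entrySet()) {
            // A fresh set per file, so nothing carries over from the previous file.
            Set<String> currentSchools = new TreeSet<>(f.getValue());
            byFile.put(f.getKey(), currentSchools);
        }
        return byFile;
    }

    public static Set<String> allSchools(Map<String, Set<String>> byFile) {
        Set<String> uniqueSchools = new TreeSet<>();
        for (Set<String> schools : byFile.values()) {
            uniqueSchools.addAll(schools); // union across all files
        }
        return uniqueSchools;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("district1.csv", List.of("01-0003-002", "01-0003-001"));
        files.put("district2.csv", List.of("01-0018-004", "01-0018-005", "01-0018-002",
                "01-0018-003", "01-0018-008", "01-0018-006"));

        Map<String, Set<String>> byFile = schoolsByFile(files);
        byFile.forEach((name, schools) ->
                System.out.println(name + ": " + schools.size() + " " + schools));
        System.out.println("all files: " + allSchools(byFile).size()); // all files: 8
    }
}
```

TreeSet is used only to get sorted, repeatable output. Each file's count stays correct even if a school code appears in more than one file, while allSchools removes those duplicates across files.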
- Consider starting variable names with a lowercase letter (camelCase), e.g. change SchoolCode to schoolCode and Logger to logger.