Java

Question

I have a CSV file containing data points:

student, year, subject, score1, score2, score3, ..., score100
Alex, 2010, Math, 23, 56, 43, ..., 89
Alex, 2011, Science, 45, 32, 45, ..., 65
Matt, 2009, Art, 34, 56, 75, ..., 43
Matt, 2010, Math, 43, 54, 54, ..., 32

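What is the best way to load a CSV file like this into a Map in Java? The data is used for a lookup service, which is why I chose a map data structure. The key is the tuple (student, year), which returns a list of subject + scores (SubjectScore.class). So the idea is: given a student's name and a year, get all subjects and scores.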

While searching, I did not find an elegant solution for reading a CSV file into a map of user-defined classes, e.g. Map<Tuple, List<SubjectScore>>

class Tuple {
  private String student;
  private int year;
}

class SubjectScore {
  private String subject;
  private int score1;
  private int score2;
  private int score3;
  // more fields here
  private int score100;
}

Additional details: the CSV file is large (~2 GB) but essentially static, so I decided to load it into memory.

Answer 1

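Below is a first example that can serve as a starting point. I have removed the dots from your sample input data and assumed a simplified example with 4 scores.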

student, year, subject, score1, score2, score3, score4
Alex, 2010, Math, 23, 56, 43, 89
Alex, 2011, Science, 45, 32, 45, 65
Matt, 2009, Art, 34, 56, 75, 43
Matt, 2010, Math, 43, 54, 54, 32
Alex, 2010, Art, 43, 54, 54, 32

I also assume that you have overridden the equals and hashCode methods in the Tuple class and implemented a suitable constructor:

class Tuple {
    private String student;
    private int year;

    public Tuple(String student, int year) {
        this.student = student;
        this.year = year;
    }

    @Override
    public int hashCode() {
        int hash = 7;
        hash = 79 * hash + Objects.hashCode(this.student);
        hash = 79 * hash + this.year;
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final Tuple other = (Tuple) obj;
        if (this.year != other.year) {
            return false;
        }
        return Objects.equals(this.student, other.student);
    }   

    @Override
    public String toString() {
        return "Tuple{" + "student=" + student + ", year=" + year + '}';
    }
}
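
To illustrate why the two overrides matter: a HashMap only finds an entry when the lookup key is equal to the stored key and produces the same hash code. The following is a minimal sketch, not the answer's exact code. `StudentYearKey` and `KeyLookupDemo` are hypothetical names standing in for `Tuple`, and `Objects.hash` is swapped in for the hand-written hash computation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the Tuple class above; the name is illustrative only.
final class StudentYearKey {
    private final String student;
    private final int year;

    StudentYearKey(String student, int year) {
        this.student = student;
        this.year = year;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof StudentYearKey)) return false;
        StudentYearKey other = (StudentYearKey) obj;
        return year == other.year && Objects.equals(student, other.student);
    }

    @Override
    public int hashCode() {
        // Combines both fields; equal keys always yield equal hash codes.
        return Objects.hash(student, year);
    }
}

class KeyLookupDemo {
    // Stores a value under one key instance and reads it back with another, equal instance.
    static String lookupWithFreshKey() {
        Map<StudentYearKey, String> map = new HashMap<>();
        map.put(new StudentYearKey("Alex", 2010), "Math");
        return map.get(new StudentYearKey("Alex", 2010));
    }

    public static void main(String[] args) {
        System.out.println(lookupWithFreshKey()); // prints "Math"
    }
}
```

Without the overrides, the second `new StudentYearKey("Alex", 2010)` would be a different object with identity-based equality, and `get` would return null. On Java 16 or later, a record with the same two components would generate `equals` and `hashCode` automatically.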

and a SubjectScore class with a suitable constructor:

class SubjectScore {

    private String subject;
    private int score1;
    private int score2;
    private int score3;
    // more fields here
    private int score4;

    public SubjectScore(String row) {
        String[] data = row.split(",");
        this.subject = data[0].trim(); // trim: the row still starts with the space that followed the comma
        this.score1 = Integer.parseInt(data[1].trim());
        this.score2 = Integer.parseInt(data[2].trim());
        this.score3 = Integer.parseInt(data[3].trim());
        this.score4 = Integer.parseInt(data[4].trim());
    }        
}

Then you can create the map you want as follows:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Example {

    public static void main(String[] args)  {
        Map<Tuple, List<SubjectScore>> map = new HashMap<>();
        try (Stream<String> content = Files.lines(Paths.get("path to your csv file"))) {
            map = content.skip(1).map(line -> lineToEntry(line)) //skip header and map each line to a map entry
                    .collect(Collectors.groupingBy(
                            Map.Entry::getKey, 
                            Collectors.mapping(Map.Entry::getValue, Collectors.toList()))
                    );
        } catch (IOException ex) {
            ex.printStackTrace();
        }

        map.forEach((k,v) -> {System.out.println(k + " : " + v);});
    }

    static Entry<Tuple, SubjectScore> lineToEntry(String line) {
        // Split each line at the first and second comma, producing an array with 3 elements:
        // the first element is the name and the second the year, used to create a Tuple object;
        // everything after the second comma is one element, used to create a SubjectScore object.
        String[] data = line.split(",", 3);
        Tuple t = new Tuple(data[0].trim(), Integer.parseInt(data[1].trim()));
        SubjectScore s = new SubjectScore(data[2]);
        return new AbstractMap.SimpleEntry<>(t, s);
    }
}
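
Since the question is about a lookup service, here is a rough sketch of how the resulting map might be queried. It reads the sample rows from an in-memory list instead of a file so that it runs on its own. `LookupDemo`, `Key`, `Row`, `load` and `subjectsFor` are hypothetical names introduced for illustration; `Key` and `Row` are simplified stand-ins for `Tuple` and `SubjectScore`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class LookupDemo {
    // Sample rows from the simplified example (header first).
    static final List<String> SAMPLE = Arrays.asList(
            "student, year, subject, score1, score2, score3, score4",
            "Alex, 2010, Math, 23, 56, 43, 89",
            "Alex, 2011, Science, 45, 32, 45, 65",
            "Matt, 2009, Art, 34, 56, 75, 43",
            "Matt, 2010, Math, 43, 54, 54, 32",
            "Alex, 2010, Art, 43, 54, 54, 32");

    // Simplified stand-in for Tuple (assumption for this sketch).
    static final class Key {
        final String student;
        final int year;
        Key(String student, int year) { this.student = student; this.year = year; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).year == year
                    && Objects.equals(((Key) o).student, student);
        }
        @Override public int hashCode() { return Objects.hash(student, year); }
    }

    // Simplified stand-in for SubjectScore (assumption for this sketch).
    static final class Row {
        final String subject;
        final List<Integer> scores;
        Row(String subject, List<Integer> scores) { this.subject = subject; this.scores = scores; }
    }

    // Columns 0 and 1 form the key; column 2 is the subject; the rest are scores.
    static Row toRow(String[] cols) {
        List<Integer> scores = Arrays.stream(cols, 3, cols.length)
                .map(Integer::valueOf)
                .collect(Collectors.toList());
        return new Row(cols[2], scores);
    }

    // Groups all rows by (student, year), skipping the header line.
    static Map<Key, List<Row>> load(List<String> lines) {
        return lines.stream()
                .skip(1)
                .map(line -> line.trim().split("\\s*,\\s*"))
                .collect(Collectors.groupingBy(
                        cols -> new Key(cols[0], Integer.parseInt(cols[1])),
                        Collectors.mapping(LookupDemo::toRow, Collectors.toList())));
    }

    // Returns the subject names for a student and year, or an empty list if there is no entry.
    static List<String> subjectsFor(Map<Key, List<Row>> map, String student, int year) {
        return map.getOrDefault(new Key(student, year), Collections.emptyList())
                .stream()
                .map(row -> row.subject)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Key, List<Row>> map = load(SAMPLE);
        System.out.println(subjectsFor(map, "Alex", 2010)); // prints "[Math, Art]"
        System.out.println(subjectsFor(map, "Zoe", 2010));  // prints "[]"
    }
}
```

Using `getOrDefault` with an empty list spares callers a null check when a student and year combination is not present.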

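I don't know whether you really need a separate field for each score in the SubjectScore class. If I were you, I would prefer a list of integers. To do that, just change the class to: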

class SubjectScore {

    private String subject;
    private List<Integer> scores;

    public SubjectScore(String row) {
        String[] data = row.split(",");
        this.subject = data[0].trim();
        this.scores = Arrays.stream(data, 1, data.length)
                .map(item -> Integer.parseInt(item.trim()))
                .collect(Collectors.toList());
    }
}
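
Because the question mentions a file of roughly 2 GB, memory use may matter: a List<Integer> stores every score as a boxed object, whereas an int[] stores the raw values. The following is only a sketch of that alternative, under the assumption that the scores never need to be resized after loading. `CompactSubjectScore` and its accessors are hypothetical names, not part of the answer.

```java
import java.util.Arrays;

// Hypothetical variant of SubjectScore that keeps the scores in a primitive array.
class CompactSubjectScore {
    private final String subject;
    private final int[] scores;

    CompactSubjectScore(String row) {
        String[] data = row.split(",");
        this.subject = data[0].trim();
        // mapToInt avoids boxing each score into an Integer object.
        this.scores = Arrays.stream(data, 1, data.length)
                .mapToInt(item -> Integer.parseInt(item.trim()))
                .toArray();
    }

    String getSubject() {
        return subject;
    }

    int[] getScores() {
        return scores.clone(); // defensive copy so callers cannot modify the stored data
    }

    @Override
    public String toString() {
        return subject + Arrays.toString(scores);
    }

    public static void main(String[] args) {
        System.out.println(new CompactSubjectScore(" Math, 23, 56, 43, 89")); // prints "Math[23, 56, 43, 89]"
    }
}
```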

Answer 2

I was wondering how to take the same approach but convert it into Map<String, Map<Integer, List<SubjectScore>>>.

I decided to add another answer because your data type requirements have changed. Assuming you still have the same SubjectScore class:

class SubjectScore {

    private String subject;
    private List<Integer> scores;

    public SubjectScore(String row) {
        String[] data = row.split(",");
        this.subject = data[0].trim();
        this.scores = Arrays.stream(data, 1, data.length)
                .map(item -> Integer.parseInt(item.trim()))
                .collect(Collectors.toList());
    }
}

The old-fashioned way, using if-else blocks to check whether the key-value pair already exists:

public static void main(String[] args) throws IOException {

    List<String> allLines = Files.readAllLines(Paths.get("path to your file"));

    Map<String,Map<String, List<SubjectScore>>> mapOldWay = new HashMap<>();

    for(String line : allLines.subList(1, allLines.size())){
        // split each line into 3 parts, i.e. 1st column, 2nd column and everything from the 3rd column on
        String[] data = line.split("\\s*,\\s*", 3);
        if(mapOldWay.containsKey(data[0])){
            if(mapOldWay.get(data[0]).containsKey(data[1])){
                mapOldWay.get(data[0]).get(data[1]).add(new SubjectScore(data[2]));
            }
            else{
                mapOldWay.get(data[0]).put(data[1], new ArrayList<>());
                mapOldWay.get(data[0]).get(data[1]).add(new SubjectScore(data[2]));
            }
        }
        else{
            mapOldWay.put(data[0], new HashMap<>());
            mapOldWay.get(data[0]).put(data[1], new ArrayList<>());
            mapOldWay.get(data[0]).get(data[1]).add(new SubjectScore(data[2]));
        }
    }

    printMap(mapOldWay);
}

public static void printMap(Map<String, Map<String, List<SubjectScore>>> map) {
    map.forEach((outerkey,outervalue) -> {
        System.out.println(outerkey);
        outervalue.forEach((innerkey,innervalue)-> {
            System.out.println("\t" + innerkey + " : " + innervalue);
        });
    });
}

The same logic, but shorter, using a Java 8 feature (Map#computeIfAbsent):

public static void main(String[] args) throws IOException {

    List<String> allLines = Files.readAllLines(Paths.get("path to your file"));

    Map<String,Map<String, List<SubjectScore>>> mapJ8Features = new HashMap<>();
    for(String line : allLines.subList(1, allLines.size())){
        String[] data = line.split("\\s*,\\s*", 3);
        mapJ8Features.computeIfAbsent(data[0], k -> new HashMap<>())
                .computeIfAbsent(data[1], k -> new ArrayList<>())
                .add(new SubjectScore(data[2]));
    }
}

Another approach, using streams and nested Collectors#groupingBy:

public static void main(String[] args) throws IOException {
    Map<String,Map<String, List<SubjectScore>>> mapStreams = new HashMap<>();        
    try (Stream<String> content = Files.lines(Paths.get("path to your file"))) {
        mapStreams = content.skip(1).map(line -> line.split("\\s*,\\s*", 3))
                .collect(Collectors.groupingBy(splited -> splited[0],
                         Collectors.groupingBy(splited -> splited[1], 
                         Collectors.mapping(splited -> new SubjectScore(splited[2]),Collectors.toList()))));
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}

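Note: I only just realized that you wanted the year represented as an Integer. I left it as a String. If you want to change that, replace data[1] or splited[1] with Integer.parseInt(data[1]) or Integer.parseInt(splited[1]), and declare the inner map's key type as Integer.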
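
As a rough sketch of that change, the computeIfAbsent variant could be written with Integer year keys as below. To keep the example self-contained it reads from an in-memory list and stores the raw remainder of each line as a String in place of a SubjectScore object. `NestedLookupDemo`, `load` and `find` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedLookupDemo {
    // Builds student -> year -> rows; the String value stands in for SubjectScore.
    static Map<String, Map<Integer, List<String>>> load(List<String> lines) {
        Map<String, Map<Integer, List<String>>> map = new HashMap<>();
        for (String line : lines.subList(1, lines.size())) { // skip the header
            String[] data = line.trim().split("\\s*,\\s*", 3);
            map.computeIfAbsent(data[0], k -> new HashMap<>())
               .computeIfAbsent(Integer.parseInt(data[1]), k -> new ArrayList<>())
               .add(data[2]);
        }
        return map;
    }

    // Null-safe lookup: returns an empty list when the student or the year is unknown.
    static List<String> find(Map<String, Map<Integer, List<String>>> map, String student, int year) {
        return map.getOrDefault(student, Collections.emptyMap())
                  .getOrDefault(year, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "student, year, subject, score1, score2, score3, score4",
                "Alex, 2010, Math, 23, 56, 43, 89",
                "Alex, 2011, Science, 45, 32, 45, 65",
                "Alex, 2010, Art, 43, 54, 54, 32");
        Map<String, Map<Integer, List<String>>> map = load(lines);
        System.out.println(find(map, "Alex", 2010).size()); // prints "2"
        System.out.println(find(map, "Zoe", 2010));         // prints "[]"
    }
}
```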

Java - How to load CSV in Map data structure with key and values as POJO - Map<ClassA, ClassB>

csv

pojo

jackson