Nested Scala case classes to/from CSV

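There are many nice libraries for writing/reading Scala case classes to/from CSV files. I'm looking for something that goes beyond that and can handle nested case classes. For example, here a Match has two Players: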

case class Player(name: String, ranking: Int)
case class Match(place: String, winner: Player, loser: Player)

val matches = List(
  Match("London", Player("Jane",7), Player("Fred",23)),
  Match("Rome", Player("Marco",19), Player("Giulia",3)),
  Match("Paris", Player("Isabelle",2), Player("Julien",5))
)

I would like to effortlessly (no boilerplate!) write/read matches to/from this CSV:

place,winner.name,winner.ranking,loser.name,loser.ranking
London,Jane,7,Fred,23
Rome,Marco,19,Giulia,3
Paris,Isabelle,2,Julien,5

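Note the automatic header row, which uses a dot "." to form the column names of nested fields, e.g. winner.ranking. I'd be very happy if someone could demonstrate a simple way to do this (say, using reflection or Shapeless).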

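[Motivation. During data analysis it is convenient to have a flat CSV for sorting, filtering, etc., even when the case classes are nested. It would also be nice to be able to load nested case classes back from such files.]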

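Since a case class is a Product, getting the values of the individual fields is relatively easy. Getting the names of the fields/columns does require Java reflection. The following function takes a list of case-class instances and returns a list of rows, each of which is a list of strings. It uses recursion to get the values and headers of child case-class instances.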

def toCsv(p: List[Product]): List[List[String]] = {
  def header(c: Class[_], prefix: String = ""): List[String] = {
    c.getDeclaredFields.toList.flatMap { field =>
      val name = prefix + field.getName
      if (classOf[Product].isAssignableFrom(field.getType)) header(field.getType, name + ".")
      else List(name)
    }
  }

  def flatten(p: Product): List[String] =
    p.productIterator.flatMap {
      case p: Product => flatten(p)
      case v: Any => List(v.toString)
    }.toList

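  // Derive the header from the runtime class of the first instance instead of hard-coding Match.
  p.headOption.fold(List.empty[List[String]])(first => header(first.getClass) :: p.map(flatten))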
}
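As a possible alternative to Java reflection for the header, here is a minimal sketch that assumes Scala 2.13 or later, where Product exposes productElementNames. The function name headerOf is hypothetical and not part of the answer; it derives the dotted column names from one instance rather than from a Class.

```scala
// Sketch only: assumes Scala 2.13+ (Product.productElementNames).
case class Player(name: String, ranking: Int)
case class Match(place: String, winner: Player, loser: Player)

// Hypothetical helper: builds dotted column names from an instance.
// Caveat: every Product-typed value (tuples, Some, non-empty List) is
// treated as a nested record, just like in flatten above.
def headerOf(p: Product, prefix: String = ""): List[String] =
  p.productElementNames.zip(p.productIterator).toList.flatMap {
    case (name, child: Product) => headerOf(child, prefix + name + ".")
    case (name, _)              => List(prefix + name)
  }

val sample = Match("London", Player("Jane", 7), Player("Fred", 23))
val columns = headerOf(sample)
```

Because the names come from the instance itself, this variant needs at least one row to produce a header, which is the trade-off against the Class-based version.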

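However, constructing case classes from CSV is a lot more involved: it requires reflection to get the types of the various fields, to create the values from the CSV strings, and to construct the case-class instances. For simplicity (not that the code is simple, just so it doesn't get even more complicated), I assume that the columns in the CSV appear in the same order as if the file had been produced by the toCsv(...) function above. The following function first creates a list of "instructions how to process a single CSV row" (these instructions are also used to verify that the column headers in the CSV match the case-class properties). The instructions are then used to recursively process one CSV row at a time.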

import scala.reflect.ClassTag

def fromCsv[T <: Product](csv: List[List[String]])(implicit tag: ClassTag[T]): List[T] = {
  trait Instruction {
    val name: String
    val header = true
  }
  case class BeginCaseClassField(name: String, clazz: Class[_]) extends Instruction {
    override val header = false
  }
  case class EndCaseClassField(name: String) extends Instruction {
    override val header = false
  }
  case class IntField(name: String) extends Instruction
  case class StringField(name: String) extends Instruction
  case class DoubleField(name: String) extends Instruction

  def scan(c: Class[_], prefix: String = ""): List[Instruction] = {
    c.getDeclaredFields.toList.flatMap { field =>
      val name = prefix + field.getName
      val fType = field.getType

      if (fType == classOf[Int]) List(IntField(name))
      else if (fType == classOf[Double]) List(DoubleField(name))
      else if (fType == classOf[String]) List(StringField(name))
      else if (classOf[Product].isAssignableFrom(fType)) BeginCaseClassField(name, fType) :: scan(fType, name + ".")
      else throw new IllegalArgumentException(s"Unsupported field type: $fType")
    } :+ EndCaseClassField(prefix)
  }

  def produce(instructions: List[Instruction], row: List[String], argAccumulator: List[Any]): (List[Instruction], List[String], List[Any]) = instructions match {
    case IntField(_) :: tail => produce(tail, row.drop(1), argAccumulator :+ row.head.toInt)
    case StringField(_) :: tail => produce(tail, row.drop(1), argAccumulator :+ row.head)
    case DoubleField(_) :: tail => produce(tail, row.drop(1), argAccumulator :+ row.head.toDouble)
    case BeginCaseClassField(_, clazz) :: tail =>
      val (instructionRemaining, rowRemaining, constructorArgs) = produce(tail, row, List.empty)
      val newCaseClass = clazz.getConstructors.head.newInstance(constructorArgs.map(_.asInstanceOf[AnyRef]): _*)
      produce(instructionRemaining, rowRemaining, argAccumulator :+ newCaseClass)
    case EndCaseClassField(_) :: tail => (tail, row, argAccumulator)
    case Nil if row.isEmpty => (Nil, Nil, argAccumulator)
    case Nil => throw new IllegalArgumentException("Not all values from CSV row were used")
  }

  val instructions = BeginCaseClassField(".", tag.runtimeClass) :: scan(tag.runtimeClass)
  assert(csv.head == instructions.filter(_.header).map(_.name), "CSV header doesn't match target case-class fields")

  csv.drop(1).map(row => produce(instructions, row, List.empty)._3.head.asInstanceOf[T])
}

I have tested this with the following:

case class Player(name: String, ranking: Int, price: Double)
case class Match(place: String, winner: Player, loser: Player)

val matches = List(
  Match("London", Player("Jane", 7, 12.5), Player("Fred", 23, 11.1)),
  Match("Rome", Player("Marco", 19, 13.54), Player("Giulia", 3, 41.8)),
  Match("Paris", Player("Isabelle", 2, 31.7), Player("Julien", 5, 16.8))
)
val csv = toCsv(matches)
val matchesFromCsv = fromCsv[Match](csv)

assert(matches == matchesFromCsv)
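Both functions work on rows as List[List[String]], so turning them into actual file text is a separate step. The following is a minimal sketch of that step, assuming RFC 4180 style quoting is good enough; escape, renderCsv and parseUnquotedCsv are hypothetical names that do not appear in the code above, and the reader deliberately handles only files without quoted cells.

```scala
// Hypothetical helpers, not part of the answer above.

// Quotes a cell only when it contains a comma, a double quote, or a line break.
def escape(cell: String): String =
  if (cell.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
    "\"" + cell.replace("\"", "\"\"") + "\""
  else cell

// Joins cells with commas and rows with newlines.
def renderCsv(rows: List[List[String]]): String =
  rows.map(_.map(escape).mkString(",")).mkString("\n")

// Naive reader: only valid when no cell is quoted (no embedded commas or newlines).
// The -1 limit keeps trailing empty cells.
def parseUnquotedCsv(text: String): List[List[String]] =
  text.split("\n").toList.map(_.split(",", -1).toList)

val rows = List(
  List("place", "winner.name", "winner.ranking"),
  List("London", "Jane", "7"),
  List("Rome, Italy", "Marco", "19")
)
val text = renderCsv(rows)
```

For anything beyond simple values, one of the existing CSV libraries is probably a safer choice for the text layer, with toCsv/fromCsv only doing the flattening and rebuilding.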

Obviously, if you want to use this in production, it should be optimized and hardened...
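One direction such hardening could take is to report bad cells instead of letting toInt or toDouble throw. This is only a sketch, assuming Scala 2.13 or later for toIntOption and toDoubleOption; parseIntCell and parseDoubleCell are hypothetical helpers, not something fromCsv currently uses.

```scala
// Hypothetical helpers: parse one cell and name the column on failure.
// Assumes Scala 2.13+ (StringOps.toIntOption / toDoubleOption).
def parseIntCell(column: String, cell: String): Either[String, Int] =
  cell.trim.toIntOption.toRight(s"Column '$column': '$cell' is not a valid Int")

def parseDoubleCell(column: String, cell: String): Either[String, Double] =
  cell.trim.toDoubleOption.toRight(s"Column '$column': '$cell' is not a valid Double")

val ok  = parseIntCell("winner.ranking", "7")
val bad = parseIntCell("winner.ranking", "seven")
```

Wiring this into produce would mean threading an Either through the recursion, which is left out here to keep the sketch short.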