Scala: Parsing Array of String to a case class

I created a case class like this:

case class StockPrice(quarter : Byte,
                      stock : String,
                      date : String,
                      open : Double,
                      high : Double,
                      low : Double,
                      close : Double,
                      volume : Double,
                      percent_change_price : Double,
                      percent_change_volume_over_last_wk : Double,
                      previous_weeks_volume : Double,
                      next_weeks_open : Double,
                      next_weeks_close : Double,
                      percent_change_next_weeks_price : Double,
                      days_to_next_dividend : Double,
                      percent_return_next_dividend : Double
                     )

I have an array with thousands of lines of strings like these:

1,AA,1/7/2011,$15.82,$16.72,$15.78,$16.42,239655616,3.79267,,,$16.71,$15.97,-4.42849,26,0.182704

1,AA,1/14/2011,$16.71,$16.71,$15.64,$15.97,242963398,-4.42849,1.380223028,239655616,$16.19,$15.79,-2.47066,19,0.187852

1,AA,1/21/2011,$16.19,$16.38,$15.60,$15.79,138428495,-2.47066,-43.02495926,242963398,$15.87,$16.13,1.63831,12,0.189994

1,AA,1/28/2011,$15.87,$16.63,$15.82,$16.13,151379173,1.63831,9.355500109,138428495,$16.18,$17.14,5.93325,5,0.185989

How can I parse the data in the array into that case class? Thanks for your help!

You can proceed as follows (I'm using a simplified example).

Given your case class and data (lines):

// Your case-class
case class MyCaseClass(
  fieldByte: Byte,
  fieldString: String,
  fieldDouble: Double
)

// input data
val lines: List[String] = List(
  "1,AA,.1",
  "2,BB,.2",
  "3,CC,.3"
)

Note: you can read lines from a text file as

import scala.io.Source
val lines = Source.fromFile("my_file.txt").getLines().toList
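The one-liner above never closes the file handle that Source.fromFile opens. A minimal sketch that closes it, assuming Scala 2.13 or later (where scala.util.Using exists); the helper name readLines is hypothetical:

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: read every line of a text file, then close the file.
// Using.resource releases the Source even if reading throws (Scala 2.13+).
def readLines(path: String): List[String] =
  Using.resource(Source.fromFile(path)) { source =>
    source.getLines().toList // materialise the lines before the Source is closed
  }
```

You would then write `val lines = readLines("my_file.txt")`. The `.toList` inside the block matters: `getLines()` is a lazy iterator and would fail if consumed after the file is closed.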

You can use a few utility methods for the mapping (cleaning and parsing):

// remove '$' symbols from string
def removeDollars(line: String): String = line.replaceAll("\\$", "")

// split string into tokens and
// convert into MyCaseClass object
def parseLine(line: String): MyCaseClass = {
  val tokens: Array[String] = line.split(",")
  MyCaseClass(
    fieldByte = tokens(0).toByte,
    fieldString = tokens(1),
    fieldDouble = tokens(2).toDouble
  )
}

Then use them to convert the strings into case class objects:

// conversion
val myCaseClassObjects: Seq[MyCaseClass] = lines.map(removeDollars).map(parseLine)
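parseLine throws on a malformed line (for example a NumberFormatException for a non-numeric field, or an ArrayIndexOutOfBoundsException when fields are missing). If you would rather skip bad lines, one option is to return an Option and use flatMap. This is a sketch assuming Scala 2.13+ (for toByteOption and toDoubleOption); parseLineOption is a hypothetical name, and the case class is repeated so the block compiles on its own:

```scala
// Same simplified case class as above
case class MyCaseClass(
  fieldByte: Byte,
  fieldString: String,
  fieldDouble: Double
)

// Hypothetical variant of parseLine: None instead of an exception for bad lines.
def parseLineOption(line: String): Option[MyCaseClass] =
  line.replaceAll("\\$", "").split(",") match {
    case Array(b, s, d) =>
      for {
        byte   <- b.toByteOption   // None if not a valid Byte
        double <- d.toDoubleOption // None if not a valid Double
      } yield MyCaseClass(byte, s, double)
    case _ => None // wrong number of fields
  }

// Malformed lines are silently dropped by flatMap
val parsed: List[MyCaseClass] =
  List("1,AA,1.5", "oops", "x,BB,2.5", "3,CC,$3.5").flatMap(parseLineOption)
```

The trade-off is that you lose the reason a line was rejected; an Either would keep it.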

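As a more advanced (and generic) approach, you can generate the mapping (parsing) function that converts tokens into the fields of your case class using something like reflection.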

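Here's one way to do it. I'd suggest splitting everything you do into lots of small, manageable functions; otherwise, if things start throwing exceptions, you'll get lost trying to figure out where it went wrong. Data setup: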

val array = Array("1,AA,1/7/2011,$15.82,$16.72,$15.78,$16.42,239655616,3.79267,,,$16.71,$15.97,-4.42849,26,0.182704",
  "1,AA,1/14/2011,$16.71,$16.71,$15.64,$15.97,242963398,-4.42849,1.380223028,239655616,$16.19,$15.79,-2.47066,19,0.187852",
  "1,AA,1/21/2011,$16.19,$16.38,$15.60,$15.79,138428495,-2.47066,-43.02495926,242963398,$15.87,$16.13,1.63831,12,0.189994",
  "1,AA,1/28/2011,$15.87,$16.63,$15.82,$16.13,151379173,1.63831,9.355500109,138428495,$16.18,$17.14,5.93325,5,0.185989")

case class StockPrice(quarter: Byte, stock: String, date: String, open: Double,
  high: Double, low: Double, close: Double, volume: Double, percent_change_price: Double,
  percent_change_volume_over_last_wk: Double, previous_weeks_volume: Double,
  next_weeks_open: Double, next_weeks_close: Double, percent_change_next_weeks_price: Double,
  days_to_next_dividend: Double, percent_return_next_dividend: Double
)

A function to convert the Array[String] into an Array[List[String]] and handle any empty fields (I'm assuming here that you want empty fields to be 0; change this as needed):

def splitArray(arr: Array[String]): Array[List[String]] = {
  arr.map(
    _.replaceAll("\\$", "")        // Remove $
      .split(",", -1)               // Split by , (keeping trailing empty fields)
      .map {
        case x if x.isEmpty => "0"  // If empty
        case y => y                 // If not empty
      }
      .toList
  )
}
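One subtlety worth knowing when splitting rows like these: String.split (the Java method) drops trailing empty strings unless you pass a negative limit, so a row whose last field is empty would yield fewer than 16 tokens. A small demonstration on a shortened, made-up row:

```scala
// By default, Java's String.split removes trailing empty strings...
val dropped = "1,AA,,".split(",")
// ...but keeps them when the limit argument is negative.
val kept = "1,AA,,".split(",", -1)

println(dropped.length) // 2
println(kept.length)    // 4
```

Empty fields in the middle of a row (like the `,,,` in the first data line) are kept either way; only trailing ones are affected.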

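A function to convert a List[String] into a StockPrice. Note that this will fail (with a scala.MatchError) if the List is not exactly 16 items long; I'll leave handling that to you. Also, the names are very non-descriptive, so you may want to change those too. It will also fail (with a NumberFormatException) if your data can't be converted by the relevant .toDouble, .toByte or whatever; you can handle that yourself as well: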

def toStockPrice: List[String] => StockPrice = {
  case a :: b :: c :: d :: e :: f :: g :: h :: i :: j :: k :: l :: m :: n :: o :: p :: Nil =>
    StockPrice(a.toByte, b, c, d.toDouble, e.toDouble, f.toDouble, g.toDouble, h.toDouble, i.toDouble, j.toDouble,
      k.toDouble, l.toDouble, m.toDouble, n.toDouble, o.toDouble, p.toDouble)
}
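Both failure modes mentioned above (wrong field count, unparseable number) can be turned into values instead of exceptions by matching on the length and wrapping the conversion in scala.util.Try. A sketch assuming Scala 2.12+ (for Try#toEither); toStockPriceEither is a hypothetical name, and StockPrice is repeated so the block compiles on its own:

```scala
import scala.util.Try

case class StockPrice(quarter: Byte, stock: String, date: String, open: Double,
  high: Double, low: Double, close: Double, volume: Double, percent_change_price: Double,
  percent_change_volume_over_last_wk: Double, previous_weeks_volume: Double,
  next_weeks_open: Double, next_weeks_close: Double, percent_change_next_weeks_price: Double,
  days_to_next_dividend: Double, percent_return_next_dividend: Double
)

// Hypothetical safer variant: Left(error message) instead of a thrown exception.
def toStockPriceEither(fields: List[String]): Either[String, StockPrice] = fields match {
  // quarter, stock, date, followed by exactly 13 numeric fields
  case a :: b :: c :: rest if rest.length == 13 =>
    // Try catches the NumberFormatException thrown by toByte / toDouble
    Try {
      val d = rest.map(_.toDouble)
      StockPrice(a.toByte, b, c, d(0), d(1), d(2), d(3), d(4), d(5), d(6),
        d(7), d(8), d(9), d(10), d(11), d(12))
    }.toEither.left.map(e => s"bad number: ${e.getMessage}")
  case other =>
    Left(s"expected 16 fields, got ${other.length}")
}
```

Returning Either lets the caller decide whether to drop, log, or collect the bad rows, rather than having the whole conversion abort on the first one.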

A nice function to bring it all together:

def makeCaseClass(arr: Array[String]): Seq[StockPrice] = {
  val splitArr: Array[List[String]] = splitArray(arr)
  splitArr.map(toStockPrice)
}

Output:

println(makeCaseClass(array))

//ArraySeq(
// StockPrice(1,AA,1/7/2011,15.82,16.72,15.78,16.42,2.39655616E8,3.79267,0.0,0.0,16.71,15.97,-4.42849,26.0,0.182704), 
// StockPrice(1,AA,1/14/2011,16.71,16.71,15.64,15.97,2.42963398E8,-4.42849,1.380223028,2.39655616E8,16.19,15.79,-2.47066,19.0,0.187852), 
// StockPrice(1,AA,1/21/2011,16.19,16.38,15.6,15.79,1.38428495E8,-2.47066,-43.02495926,2.42963398E8,15.87,16.13,1.63831,12.0,0.189994), 
// StockPrice(1,AA,1/28/2011,15.87,16.63,15.82,16.13,1.51379173E8,1.63831,9.355500109,1.38428495E8,16.18,17.14,5.93325,5.0,0.185989)
//)

Edit:

To explain the a :: b :: c ... bit: if you know the size of a list, this is a way to give names to its elements.

val ls = List(1, 2, 3)
val a :: b :: c :: Nil = List(1, 2, 3)
println(a == ls.head) // true
println(b == ls(1)) // true
println(c == ls(2)) // true

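Note that the Nil is important, because it says the list ends after c (the tail after c is Nil). Without it, c would equal List(3), because the rest of the list is always bound to the last name in the pattern.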
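To see the difference the trailing Nil makes, compare the two patterns below. A pattern ending in Nil also throws a scala.MatchError at runtime if the list has a different length; the names here are just for illustration:

```scala
import scala.util.Try

// Without Nil, the last name binds to the remaining tail of the list
val x :: y :: rest = List(1, 2, 3)
println(rest) // List(3)

// With Nil, the pattern only matches a list of exactly three elements
val p :: q :: r :: Nil = List(1, 2, 3)
println(r) // 3

// A length mismatch is a runtime MatchError, not a compile error
val failed = Try { val a :: b :: Nil = List(1, 2, 3); a + b }
println(failed.isFailure) // true
```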

You can use it in a pattern match, as I did, to do something with the result:

val ls = List(1, "b", true)
ls match {
  case a :: b :: c if c == true => println("this will not be printed")
  case a :: b :: c :: Nil if c == true => println(s"this will get printed because c == $c")
} // not exhaustive but you get the point
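The same positional matching also works directly on the Array returned by split, via the Array(...) extractor, which skips the .toList step. A minimal sketch on a shortened three-field row; the helper name describe is hypothetical:

```scala
// Array(...) matches only arrays of exactly that length, like a :: b :: c :: Nil
def describe(line: String): String = line.split(",") match {
  case Array(quarter, stock, date) => s"$stock, quarter $quarter, on $date"
  case other                       => s"unexpected field count: ${other.length}"
}

println(describe("1,AA,1/7/2011")) // AA, quarter 1, on 1/7/2011
println(describe("1,AA"))          // unexpected field count: 2
```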

You can also use it when you know what each element of the list is, like this:

val personCharacteristics = List("James", 26, "blue", 6, 85.4, "brown")
val name :: age :: eyeColour :: otherCharacteristics = personCharacteristics
println(s"Name: $name; Age: $age; Eye colour: $eyeColour")
// Name: James; Age: 26; Eye colour: blue

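Obviously these examples are pretty trivial and not exactly what you'll see as a professional Scala developer (at least I don't think so), but it's worth being aware of, as I still use the :: syntax at work sometimes.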