Scala: Parsing Array of String to a case class
I created a case class like this:
def case_class(): Unit = {
  case class StockPrice(quarter : Byte,
                        stock : String,
                        date : String,
                        open : Double,
                        high : Double,
                        low : Double,
                        close : Double,
                        volume : Double,
                        percent_change_price : Double,
                        percent_change_volume_over_last_wk : Double,
                        previous_weeks_volume : Double,
                        next_weeks_open : Double,
                        next_weeks_close : Double,
                        percent_change_next_weeks_price : Double,
                        days_to_next_dividend : Double,
                        percent_return_next_dividend : Double
                       )
}
And I have thousands of lines as an Array of String, like this:
1,AA,1/7/2011,$15.82,$16.72,$15.78,$16.42,239655616,3.79267,,,$16.71,$15.97,-4.42849,26,0.182704
1,AA,1/14/2011,$16.71,$16.71,$15.64,$15.97,242963398,-4.42849,1.380223028,239655616,$16.19,$15.79,-2.47066,19,0.187852
1,AA,1/21/2011,$16.19,$16.38,$15.60,$15.79,138428495,-2.47066,-43.02495926,242963398,$15.87,$16.13,1.63831,12,0.189994
1,AA,1/28/2011,$15.87,$16.63,$15.82,$16.13,151379173,1.63831,9.355500109,138428495,$16.18,$17.14,5.93325,5,0.185989
How can I parse the data from the Array into that case class?
Thank you for your help!
You can proceed as follows (I've used a simplified example).
Given your case class and data (lines):
// Your case class
case class MyCaseClass(
  fieldByte: Byte,
  fieldString: String,
  fieldDouble: Double
)
// input data
val lines: List[String] = List(
  "1,AA,.1",
  "2,BB,.2",
  "3,CC,.3"
)
Note: you can read the lines from a text file with
import scala.io.Source
val lines = Source.fromFile("my_file.txt").getLines().toList
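The one-liner above never closes the file. As a minimal sketch, assuming Scala 2.13 or later (where scala.util.Using is available), the read can be wrapped so the handle is released and a missing file shows up as a Failure rather than an exception. The object and method names here are made up for illustration:

```scala
import scala.io.Source
import scala.util.{Try, Using}

object ReadLines {
  // Reads every line of the file at `path` and closes the file afterwards.
  // Any error (for example, a missing file) is returned as a Failure.
  def readLines(path: String): Try[List[String]] =
    Using(Source.fromFile(path))(_.getLines().toList)
}
```

Calling ReadLines.readLines("my_file.txt") then gives either a Success holding the lines or a Failure holding the exception, which you can pattern match on.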
You can use a few utility methods for the mapping (cleaning and parsing):
// remove '$' symbols from the string
// ('$' is a regex metacharacter, so it must be escaped as "\\$")
def removeDollars(line: String): String = line.replaceAll("\\$", "")
// split the string into tokens and
// convert them into a MyCaseClass object
def parseLine(line: String): MyCaseClass = {
  val tokens: Array[String] = line.split(",")
  MyCaseClass(
    fieldByte = tokens(0).toByte,
    fieldString = tokens(1),
    fieldDouble = tokens(2).toDouble
  )
}
And then use them to convert the strings into case-class objects:
// conversion
val myCaseClassObjects: Seq[MyCaseClass] = lines.map(removeDollars).map(parseLine)
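As a more advanced (and more general) approach, you can generate the mapping (parsing) function that converts tokens into the fields of your case class using something like reflection.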
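Here is one way of doing it. I'd recommend splitting everything you do into lots of small, easy-to-manage functions; otherwise, if it all starts throwing exceptions, you'll get lost trying to figure out where something went wrong. Data setup:
val array = Array("1,AA,1/7/2011,$15.82,$16.72,$15.78,$16.42,239655616,3.79267,,,$16.71,$15.97,-4.42849,26,0.182704",
  "1,AA,1/14/2011,$16.71,$16.71,$15.64,$15.97,242963398,-4.42849,1.380223028,239655616,$16.19,$15.79,-2.47066,19,0.187852",
  "1,AA,1/21/2011,$16.19,$16.38,$15.60,$15.79,138428495,-2.47066,-43.02495926,242963398,$15.87,$16.13,1.63831,12,0.189994",
  "1,AA,1/28/2011,$15.87,$16.63,$15.82,$16.13,151379173,1.63831,9.355500109,138428495,$16.18,$17.14,5.93325,5,0.185989")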
case class StockPrice(quarter: Byte, stock: String, date: String, open: Double,
high: Double, low: Double, close: Double, volume: Double, percent_change_price: Double,
percent_change_volume_over_last_wk: Double, previous_weeks_volume: Double,
next_weeks_open: Double, next_weeks_close: Double, percent_change_next_weeks_price: Double,
days_to_next_dividend: Double, percent_return_next_dividend: Double
)
A function to convert the Array[String] into an Array[List[String]] and handle any empty fields (I've assumed here that you want empty fields to be 0; change this as necessary):
def splitArray(arr: Array[String]): Array[List[String]] = {
  arr.map(
    _.replaceAll("\\$", "") // Remove $ (escaped, because replaceAll takes a regex)
      .split(",") // Split by ,
      .map {
        case x if x.isEmpty => "0" // If empty
        case y => y // If not empty
      }
      .toList
  )
}
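One thing to watch with the splitting step: split(",") is Java's String.split, which silently drops trailing empty strings, so a row whose last columns are empty would come back with fewer than 16 fields. Passing a negative limit keeps them. A small sketch of the difference (the object and method names are only for illustration):

```scala
object SplitDemo {
  // Default behaviour: trailing empty fields are discarded.
  def fieldsDefault(line: String): List[String] =
    line.split(",").toList

  // A negative limit keeps every field, including trailing empty ones.
  def fieldsKeepEmpty(line: String): List[String] =
    line.split(",", -1).toList
}

// SplitDemo.fieldsDefault("1,AA,,")   gives List("1", "AA")
// SplitDemo.fieldsKeepEmpty("1,AA,,") gives List("1", "AA", "", "")
```

The sample rows only have empty fields in the middle, so they are unaffected, but if your real data can end in empty columns you may want the -1 variant.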
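A function to convert a List[String] into a StockPrice. Note that this will fail with a MatchError if the List isn't exactly 16 items long; I'll leave you to handle that. The names are also pretty non-descriptive, so you can change those too. It will also fail if a field can't be converted with the relevant .toDouble or .toByte or whatever; you can handle that yourself as well: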
def toStockPrice: List[String] => StockPrice = {
case a :: b :: c :: d :: e :: f :: g :: h :: i :: j :: k :: l :: m :: n :: o :: p :: Nil =>
StockPrice(a.toByte, b, c, d.toDouble, e.toDouble, f.toDouble, g.toDouble, h.toDouble, i.toDouble, j.toDouble,
k.toDouble, l.toDouble, m.toDouble, n.toDouble, o.toDouble, p.toDouble)
}
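If you'd rather not have a bad row throw, one option is to return an Either and describe what went wrong. This is only a sketch of the idea, not part of the solution above: to keep it short it uses a made-up three-field Quote class instead of the 16-field StockPrice, and the helper names are hypothetical.

```scala
import scala.util.Try

object SafeParse {
  // Hypothetical three-field record standing in for the 16-field StockPrice.
  final case class Quote(quarter: Byte, stock: String, open: Double)

  // Runs a conversion and turns any exception into Left(message).
  private def field[A](name: String, raw: String)(convert: String => A): Either[String, A] =
    Try(convert(raw)).toEither.left.map(_ => s"bad $name: '$raw'")

  // Left describes the first problem found; Right holds the parsed record.
  def toQuote(fields: List[String]): Either[String, Quote] = fields match {
    case q :: s :: o :: Nil =>
      for {
        quarter <- field("quarter", q)(_.toByte)
        open    <- field("open", o)(_.toDouble)
      } yield Quote(quarter, s, open)
    case other =>
      Left(s"expected 3 fields, got ${other.length}")
  }
}
```

The same shape should extend to all 16 fields; you would then collect the Lefts separately (for example with partitionMap on 2.13+) to see which rows were rejected and why.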
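A nice function to bring all of this together:
def makeCaseClass(arr: Array[String]): Seq[StockPrice] = {
  val splitArr: Array[List[String]] = splitArray(arr)
  splitArr.map(toStockPrice).toSeq // explicit toSeq; the implicit Array-to-Seq conversion is deprecated in 2.13
}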
Output:
println(makeCaseClass(array))
//ArraySeq(
// StockPrice(1,AA,1/7/2011,15.82,16.72,15.78,16.42,2.39655616E8,3.79267,0.0,0.0,16.71,15.97,-4.42849,26.0,0.182704),
// StockPrice(1,AA,1/14/2011,16.71,16.71,15.64,15.97,2.42963398E8,-4.42849,1.380223028,2.39655616E8,16.19,15.79,-2.47066,19.0,0.187852),
// StockPrice(1,AA,1/21/2011,16.19,16.38,15.6,15.79,1.38428495E8,-2.47066,-43.02495926,2.42963398E8,15.87,16.13,1.63831,12.0,0.189994),
// StockPrice(1,AA,1/28/2011,15.87,16.63,15.82,16.13,1.51379173E8,1.63831,9.355500109,1.38428495E8,16.18,17.14,5.93325,5.0,0.185989)
//)
Edit:
To explain the a :: b :: c ..... bit: this is a way of assigning names to the items in a List or Seq, given that you know the List's size.
val ls = List(1, 2, 3)
val a :: b :: c :: Nil = List(1, 2, 3)
println(a == ls.head) // true
println(b == ls(1)) // true
println(c == ls(2)) // true
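Note that the Nil is important because it marks the end of the list. Without it, c would be equal to List(3), because the rest of the list is always assigned to the last name in the pattern.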
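To make the difference concrete, here is a small sketch comparing the two patterns. The object and method names are just for illustration:

```scala
object TailBinding {
  // Without a trailing Nil, the last name binds to the rest of the list.
  def thirdWithoutNil(xs: List[Int]): Option[List[Int]] = xs match {
    case _ :: _ :: c => Some(c)
    case _           => None
  }

  // With Nil, the pattern matches only lists of exactly three elements,
  // and c is the third element itself.
  def thirdWithNil(xs: List[Int]): Option[Int] = xs match {
    case _ :: _ :: c :: Nil => Some(c)
    case _                  => None
  }
}

// TailBinding.thirdWithoutNil(List(1, 2, 3)) gives Some(List(3))
// TailBinding.thirdWithNil(List(1, 2, 3))    gives Some(3)
// TailBinding.thirdWithNil(List(1, 2, 3, 4)) gives None
```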
You can use this in pattern matching, as I have, in order to do something with the results:
val ls = List(1, "b", true)
ls match {
case a :: b :: c if c == true => println("this will not be printed")
case a :: b :: c :: Nil if c == true => println(s"this will get printed because c == $c")
} // not exhaustive but you get the point
You can also use it if you know what each element in the List is going to be, like this:
val personCharacteristics = List("James", 26, "blue", 6, 85.4, "brown")
val name :: age :: eyeColour :: otherCharacteristics = personCharacteristics
println(s"Name: $name; Age: $age; Eye colour: $eyeColour")
// Name: James; Age: 26; Eye colour: blue
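Obviously, these examples are pretty trivial and not exactly what you'd see as a professional Scala developer (at least I don't think so), but it's a handy thing to be aware of, as I do still use this :: syntax at work sometimes.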