ScalaTest: how to assert a lengthy exception message safely and cleanly without hardcoding it?
I have the following code, which SHA-hashes columns in a Spark DataFrame:
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, sha2}

    object hashing {
      // Replaces each named column with its SHA-256 hash.
      def process(hashFieldNames: List[String])(df: DataFrame): DataFrame =
        hashFieldNames.foldLeft(df) { case (acc, hashField) =>
          acc.withColumn(hashField, sha2(col(hashField), 256))
        }
    }
Now, in a separate file, I am testing my hashing.process with an AnyWordSpec test, like so:

    "The hashing.process" should {
      // some cases here that complete successfully

      "fail to hash a spark dataframe due to type mismatch" in {
        import spark.implicits._ // needed for toDF; spark is the SparkSession in scope

        val goodColumns = Seq("language", "usersCount", "ID", "personalData")
        val badDataSample =
          Seq(
            ("Java", "20000", 2, "happy"),
            ("Python", "100000", 3, "happy"),
            ("Scala", "3000", 1, "jolly")
          )
        val badDf =
          spark.sparkContext.parallelize(badDataSample).toDF(goodColumns: _*)

        // hashFieldNames is defined elsewhere in the spec and includes "ID"
        val thrown = intercept[org.apache.spark.sql.AnalysisException] {
          hashing.process(hashFieldNames)(badDf)
        }
        assert(thrown.getMessage === /* some lengthy error message that I do not want to copy-paste in its entirety */)
      }
    }
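Normally, as far as I understand, one would hardcode the entire error message to make sure it is exactly what we expect. However, the message is very long, so I wonder whether there is a better way.

Basically, I have two questions:

a.) Is it considered good practice to match only the beginning of the error message and then follow up with a regex? I am thinking of something like this: thrown.getMessage === "[cannot resolve sha2(ID, 256) due to data type mismatch: argument 1 requires binary type, however, ID is of int type.;" + regexpattern \;(.*))

b.) If a.) is considered a hacky approach, do you have any workable suggestions on how to do it properly?

Note: the code above may contain small errors, since I adapted it for this SO post. But you should get the idea.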
OK, answering my own question. For now I have solved it like this:

    "fail to hash a spark dataframe due to type mismatch" in {
      import spark.implicits._ // needed for toDF; spark is the SparkSession in scope

      val goodColumns = Seq("language", "usersCount", "ID", "personalData")
      val badDataSample =
        Seq(
          ("Java", "20000", 2, "happy"),
          ("Python", "100000", 3, "happy"),
          ("Scala", "3000", 1, "jolly")
        )
      val badDf =
        spark.sparkContext.parallelize(badDataSample).toDF(goodColumns: _*)

      // val expectedErrorMessageSubstring = "sha2(`ID`, 256)' due to data type mismatch: argument 1 requires binary type".r
      val thrownException = intercept[org.apache.spark.sql.AnalysisException] {
        hashing.process(hashFieldNames)(badDf)
      }
      // "should include regex" requires org.scalatest.matchers.should.Matchers to be mixed in
      thrownException.getMessage should include regex "type mismatch: argument 1 requires binary type"
    }
I am leaving this post open for potential suggestions and improvements. According to https://github.com/databricks/scala-style-guide#intercepting-exceptions the solution is still not ideal.
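One detail worth noting about the regex approach above: the argument to `include regex` is interpreted as a regular expression, so a fragment such as sha2(ID, 256) would have to be escaped, because its parentheses are otherwise read as a capturing group. The following is only a sketch of that point in plain Scala, without Spark or ScalaTest. The names `MessageFragments`, `stablePattern` and `mentions` are hypothetical, and the sample message is the one quoted in the question, whose exact wording may differ between Spark versions.

```scala
import java.util.regex.Pattern

object MessageFragments {
  // Message text as quoted in the question; treat the exact wording as an assumption.
  val sampleMessage: String =
    "cannot resolve sha2(ID, 256) due to data type mismatch: " +
      "argument 1 requires binary type, however, ID is of int type.;"

  // Literal pieces are quoted so that "(" and ")" are matched as characters,
  // while ".*" stays a real wildcard between them.
  val stablePattern: String =
    Pattern.quote("sha2(ID, 256)") + ".*" + Pattern.quote("requires binary type")

  // Unanchored search, similar in spirit to ScalaTest's "include regex".
  def mentions(message: String, regex: String): Boolean =
    regex.r.findFirstIn(message).isDefined
}

object MessageFragmentsDemo extends App {
  import MessageFragments._
  println(mentions(sampleMessage, stablePattern))   // true
  println(mentions(sampleMessage, "sha2(ID, 256)")) // false: the parentheses form a group
}
```

If no wildcard is needed, a plain substring check (in ScalaTest, `should include ("...")`) avoids the escaping problem altogether.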
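You should not assert on exception messages (unless they are shown to the user, or something downstream depends on them).

If throwing an exception is part of the contract, then you should throw an exception of a specific type with a given error code, and the test should assert on that. If not, then who cares what the message says?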
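A minimal sketch of what this could look like, under stated assumptions: `HashingError`, `HashingException` and `HashingValidation` are hypothetical names, the error codes are made up, and the schema is modelled as a plain Map from column name to type name so the example runs without Spark. It also assumes that only string and binary columns are acceptable input for hashing.

```scala
// Hypothetical error catalogue: each failure mode gets a stable code.
sealed trait HashingError { def code: String }
object HashingError {
  case object UnsupportedColumnType extends HashingError { val code = "HASH-001" }
  case object MissingColumn extends HashingError { val code = "HASH-002" }
}

// The exception carries the error value; the message is for humans only.
final case class HashingException(error: HashingError, detail: String)
    extends RuntimeException(s"[${error.code}] $detail")

object HashingValidation {
  // Assumption: only string and binary columns can be hashed.
  private val hashableTypes = Set("string", "binary")

  // Throws a HashingException for the first column that cannot be hashed.
  def validate(schema: Map[String, String], hashFieldNames: List[String]): Unit =
    hashFieldNames.foreach { name =>
      schema.get(name) match {
        case None =>
          throw HashingException(HashingError.MissingColumn, s"column '$name' not found")
        case Some(tpe) if !hashableTypes.contains(tpe) =>
          throw HashingException(HashingError.UnsupportedColumnType, s"column '$name' has type $tpe")
        case _ => ()
      }
    }
}
```

In a ScalaTest spec, `intercept[HashingException] { ... }` returns the thrown exception, so the test could compare its `error` field with `HashingError.UnsupportedColumnType` and never look at the message text.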