Defining the schema for many features in a data frame with Spark ML
I am following this tutorial: https://mapr.com/blog/churn-prediction-sparkml/
I realized that the CSV schema has to be written out by hand, like this:
import org.apache.spark.sql.types._

val schema = StructType(Array(
StructField("state", StringType, true),
StructField("len", IntegerType, true),
StructField("acode", StringType, true),
StructField("intlplan", StringType, true),
StructField("vplan", StringType, true),
StructField("numvmail", DoubleType, true),
StructField("tdmins", DoubleType, true),
StructField("tdcalls", DoubleType, true),
StructField("tdcharge", DoubleType, true),
StructField("temins", DoubleType, true),
StructField("tecalls", DoubleType, true),
StructField("techarge", DoubleType, true),
StructField("tnmins", DoubleType, true),
StructField("tncalls", DoubleType, true),
StructField("tncharge", DoubleType, true),
StructField("timins", DoubleType, true),
StructField("ticalls", DoubleType, true),
StructField("ticharge", DoubleType, true),
StructField("numcs", DoubleType, true),
StructField("churn", StringType, true)))
But I have a dataset with 335 features, so I don't want to write them all out... Is there an easy way to retrieve the column names and define the schema accordingly?
I found the solution here: https://dzone.com/articles/using-apache-spark-dataframes-for-processing-of-ta
It was easier than I thought.
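For anyone else landing here, the general idea can be sketched in two ways: either let Spark infer the column types directly, or build the `StructType` programmatically from the header row. This is a minimal sketch, not the exact code from the linked article; the path `data/churn.csv` and the assumption that every feature column is a `Double` are mine, not from the original.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

object SchemaDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-from-header")
      .master("local[*]")
      .getOrCreate()

    // Option 1: let Spark infer the column types from the data itself
    // (costs one extra pass over the file).
    val inferred = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/churn.csv") // hypothetical path
    inferred.printSchema()

    // Option 2: build the schema from the CSV header row, assuming
    // (hypothetically) that every column should be read as a Double.
    val header = scala.io.Source.fromFile("data/churn.csv")
      .getLines().next().split(",")
    val schema = StructType(header.map(name =>
      StructField(name, DoubleType, nullable = true)))

    val typed = spark.read
      .option("header", "true")
      .schema(schema)
      .csv("data/churn.csv")
    typed.printSchema()

    spark.stop()
  }
}
```

Option 1 is the least code; option 2 avoids the extra inference pass and gives you explicit control when some columns (like `state` or `churn` above) need a different type.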