Load CSVs - unable to pass file paths from dataframe
The code below works fine:
val Path = Seq (
"dbfs:/mnt/testdata/2019/02/Calls2019-02-03.tsv",
"dbfs:/mnt/testdata/2019/02/Calls2019-02-02.tsv"
)
val Calls = spark.read
.format("com.databricks.spark.csv")
.option("header", "true")
.option("delimiter", "\t")
.schema(schema)
.load(Path: _*)
However, I want to get the paths from a DataFrame, and the code below does not work.
val tsvPath =
Seq(
FinalFileList
.select($"Path")
.filter($"FileDate">MaxStartTime)
.collect.mkString(",")
.replaceAll("[\\[\\]]","")
)
val Calls = spark.read
.format("com.databricks.spark.csv")
.option("header", "true")
.option("delimiter", "\t")
.schema(schema)
.load(tsvPath: _*)
Error:
org.apache.spark.sql.AnalysisException: Path does not exist: dbfs:/mnt/testdata/2019/02/Calls2019-02-03.tsv,dbfs:/mnt/testdata/2019/02/Calls2019-02-02.tsv;
It looks like the path is being taken as one string, "/mnt/file1.tsv, /mnt/file2.tsv", instead of two separate strings, "/mnt/file1.tsv","/mnt/file2.tsv".
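I suspect your problem is here:
.collect.mkString(",")
.replaceAll("[\\[\\]]","")
.mkString joins all the collected rows into a single string, so the Seq ends up with one element that holds every path. One possible fix is to split the string again after the replace: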
.collect.mkString(",")
.replaceAll("[\\[\\]]","")
.split(",")
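To see the difference without a cluster, here is a minimal sketch in plain Scala (no Spark involved). The `rows` values are hypothetical stand-ins for what `Row.toString` produces, where each path is wrapped in square brackets. `PathSplitDemo` and its members are illustrative names, not part of the original code.

```scala
object PathSplitDemo {
  // Hypothetical stand-ins for Row.toString output: each path wrapped in [ ]
  val rows: Array[String] = Array(
    "[dbfs:/mnt/testdata/2019/02/Calls2019-02-03.tsv]",
    "[dbfs:/mnt/testdata/2019/02/Calls2019-02-02.tsv]"
  )

  // What the question builds: a Seq holding ONE comma-joined string
  val joined: Seq[String] =
    Seq(rows.mkString(",").replaceAll("[\\[\\]]", ""))

  // Splitting after the replace yields one element per path
  val split: Seq[String] =
    rows.mkString(",").replaceAll("[\\[\\]]", "").split(",").toSeq
}
```

`PathSplitDemo.joined` has a single element, which is why Spark reports one non-existent path, while `PathSplitDemo.split` has one element per file. Note that splitting on "," would break a path that itself contains a comma.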
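Another approach is to clean each element individually instead of joining them into one string:
.collect.map(_.toString.replaceAll("[\\[\\]]",""))
Either way, drop the outer Seq( ... ) wrapper, because both variants already return a collection of paths. Use whichever suits you better.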
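A third option avoids string cleanup altogether by reading the value out of each Row. The following is only a sketch under assumptions: `PathsFromDataFrame` and `pathsAfter` are hypothetical names, "Path" is assumed to be a non-null string column, and `maxStartTime` is typed as a String purely for illustration (the question does not show the type of MaxStartTime or FileDate).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object PathsFromDataFrame {
  // Returns one String per matching row, so the result can be passed as
  // varargs to DataFrameReader.load(paths: String*).
  // Assumption: "Path" is a non-null string column and "FileDate" can be
  // compared against maxStartTime (a String here, for illustration only).
  def pathsAfter(fileList: DataFrame, maxStartTime: String): Seq[String] =
    fileList
      .filter(col("FileDate") > maxStartTime)
      .select("Path")
      .collect()             // Array[Row], one Row per file
      .map(_.getString(0))   // read the value directly; no brackets to strip
      .toSeq
}
```

With the names from the question, the call would look something like `val tsvPath = PathsFromDataFrame.pathsAfter(FinalFileList, MaxStartTime)`, followed by the same `.load(tsvPath: _*)`, adjusting the parameter type to whatever MaxStartTime actually is.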