Custom file name to write dataframe in PySpark
I want to write the records of a dataframe to a file. The records are in JSON format, so I need the contents written under my own custom file name instead of part-0000-cfhbhgh.json.
I'm giving the answer in Scala, but the basic steps are the same in Python.
import org.apache.hadoop.fs.{FileSystem, Path}

// Get a handle on the filesystem backing the Spark session
val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Find the generated part file and rename it to the custom name
val file = fs.globStatus(new Path("data/jsonexample/part*"))(0).getPath().getName()
println("file name " + file)
fs.rename(
  new Path("data/jsonexample/" + file),
  new Path("data/jsonexample/tsuresh97_json_toberenamed.json"))
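Since the question asks about PySpark, here is a minimal sketch of the same rename step in Python. It reaches the Hadoop FileSystem API through the py4j gateway; note that the underscore-prefixed _jvm and _jsc attributes are PySpark internals rather than a public API, and the directory and target file name are just the ones from the Scala example:

# Minimal PySpark sketch, assuming an active SparkSession named `spark`
sc = spark.sparkContext
Path = sc._jvm.org.apache.hadoop.fs.Path
fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())

# Find the generated part file and rename it to the custom name
file = fs.globStatus(Path("data/jsonexample/part*"))[0].getPath().getName()
print("file name " + file)
fs.rename(Path("data/jsonexample/" + file),
          Path("data/jsonexample/tsuresh97_json_toberenamed.json"))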
Complete example:
import org.apache.spark.sql.SaveMode
import spark.implicits._

val df = Seq(
  (123, "ITA", 1475600500, 18.0),
  (123, "ITA", 1475600500, 18.0),
  (123, "ITA", 1475600516, 19.0)
).toDF("Value", "Country", "Timestamp", "Sum")

// Coalesce to a single partition so only one part file is produced
df.coalesce(1)
  .write
  .mode(SaveMode.Overwrite)
  .json("data/jsonexample/")
import org.apache.hadoop.fs.{FileSystem, Path}

val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val file = fs.globStatus(new Path("data/jsonexample/part*"))(0).getPath().getName()
println("file name " + file)
fs.rename(
  new Path("data/jsonexample/" + file),
  new Path("data/jsonexample/tsuresh97_json_toberenamed.json"))
Result:
JSON file content:
{"Value":123,"Country":"ITA","Timestamp":1475600500,"Sum":18.0}
{"Value":123,"Country":"ITA","Timestamp":1475600500,"Sum":18.0}
{"Value":123,"Country":"ITA","Timestamp":1475600516,"Sum":19.0}