Use of '\' in reading dataframe

Question

# File location and type
file_location = "/FileStore/tables/FileName.csv"
file_type = "csv"

#CSV options
infer_schema = "true"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

This is the generic code for reading data from a CSV file. In this code, what is the purpose of .option("inferSchema", infer_schema), and what does the "\" do?

Answer 1

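A backslash at the end of a line is treated as a line continuation: the next line is joined onto the current one and read as part of the same statement. In your case, these 5 lines are treated as a single line.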
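A minimal plain-Python sketch (no Spark required) of what the trailing backslash does. The string and variable names are illustrative only; the point is that the backslash-continued chain and the one-line chain are the same statement, just laid out like the spark.read.format(...).option(...) chain in the question.

```python
text = "  Hello, Spark  "

# Chained method calls split across physical lines with trailing backslashes,
# the same layout as spark.read.format(...).option(...).load(...) above.
chained = text \
    .strip() \
    .lower() \
    .replace(",", "")

# The identical chain written on a single physical line.
single = text.strip().lower().replace(",", "")

print(chained)            # hello spark
print(chained == single)  # True
```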

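As for why the quotes ("") are needed: anything you put inside quotes is a string. Option names such as "header" and "inferSchema" are keys that the reader recognizes as part of its syntax, so you need to write them exactly as they are.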
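A small plain-Python sketch of the point about quotes (the names here are illustrative, not part of the Spark API): a quoted "header" is just a string value that gets handed to .option, whereas an unquoted header is looked up as a variable and fails when no such variable exists.

```python
first_row_is_header = "true"

# Quoted: "header" is a string literal, i.e. the option key itself.
quoted_key = "header"

# Unquoted: Python treats header as a variable name, which is undefined here.
error_message = ""
try:
    unquoted_key = header  # noqa: F821  (deliberately undefined)
except NameError as err:
    error_message = str(err)

print(quoted_key, first_row_is_header)  # header true
print(error_message)                    # name 'header' is not defined
```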

This answer may help you further.

Answer 2

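A backslash '\' at the end of a line means the code after it is treated as part of the same line. This is mostly done for long statements that would not fit comfortably on a single line.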
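One detail worth knowing about backslash continuation in Python: the backslash must be the very last character on the line, and wrapping the expression in parentheses is an alternative that needs no backslash at all. The sketch below checks this with Python's built-in compile(); is_valid is a hypothetical helper written for this illustration.

```python
def is_valid(source: str) -> bool:
    """Return True if the source string compiles as Python code."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# Backslash as the last character: the two lines form one statement.
backslash_ok = "x = 1 + \\\n    2\n"

# A space after the backslash breaks the continuation (SyntaxError).
backslash_trailing_space = "x = 1 + \\ \n    2\n"

# Implicit continuation inside parentheses: no backslash needed.
parenthesized = "x = (1 +\n    2)\n"

print(is_valid(backslash_ok))              # True
print(is_valid(backslash_trailing_space))  # False
print(is_valid(parenthesized))             # True
```

Applied to the question's code, the same chain could be wrapped in parentheses, starting with df = (spark.read.format(file_type) and ending with .load(file_location)), and the trailing backslashes dropped.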

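inferSchema is used to infer the data types of the columns in the DataFrame. If we set inferSchema to true, Spark reads through all of the data while loading it in order to infer each column's data type; without it (and without an explicit schema), every column is read as a string.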
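To make the idea concrete, here is a simplified plain-Python illustration of schema inference. This is not Spark's actual algorithm (Spark's rules cover more types than this); infer_type and infer_schema are hypothetical helpers that scan every value in a column and pick the narrowest of int, float, or str that fits all of them.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int, float, str that parses every value."""
    def fits(cast):
        for v in values:
            try:
                cast(v)
            except ValueError:
                return False
        return True

    if fits(int):
        return int
    if fits(float):
        return float
    return str


def infer_schema(csv_text):
    """Scan all rows (a full pass over the data) and infer one type per column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)   # first row holds the column names
    rows = list(reader)     # read every remaining row
    columns = zip(*rows)    # transpose rows into columns
    return {name: infer_type(col) for name, col in zip(header, columns)}


data = "id,price,name\n1,9.5,apple\n2,3,banana\n"
schema = infer_schema(data)
print(schema)  # {'id': <class 'int'>, 'price': <class 'float'>, 'name': <class 'str'>}
```

The full pass over the rows is why inference has a cost on large files: every value has to be examined before a column's type is known.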

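The quoted strings ("") are used with the .option function, which adds different parameters when reading a file. Many parameters can be set with option, such as header, inferSchema, sep, schema, etc.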
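The chained .option calls follow a builder pattern: each call records one key/value setting and returns the reader, so the next call can be chained onto it. The sketch below imitates that pattern with a hypothetical ReaderSketch class (not part of PySpark) that only collects the settings instead of reading a file.

```python
class ReaderSketch:
    """Hypothetical stand-in for a DataFrameReader-style builder.

    Each method records a setting and returns self, which is what makes
    the chained (and backslash-continued) calling style possible.
    """

    def __init__(self):
        self._format = None
        self._options = {}

    def format(self, source):
        self._format = source
        return self

    def option(self, key, value):
        self._options[key] = value  # keys are plain strings such as "header"
        return self

    def load(self, path):
        # A real reader would read the file; this sketch just reports the settings.
        return {"path": path, "format": self._format, "options": dict(self._options)}


plan = ReaderSketch() \
    .format("csv") \
    .option("inferSchema", "true") \
    .option("header", "true") \
    .option("sep", ",") \
    .load("/FileStore/tables/FileName.csv")

print(plan["options"])  # {'inferSchema': 'true', 'header': 'true', 'sep': ','}
```

In real PySpark the same settings can also be passed as keyword arguments, for example spark.read.csv(file_location, header=True, inferSchema=True, sep=",").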

pyspark.sql.DataFrameReader.csv

You can refer to the link above for further help.


apache-spark-sql

pyspark

databricks

azure-databricks

aws-databricks