How to read a CSV file with a timestamp in its name in PySpark?
I have CSV files whose names contain a timestamp, and I have to read them with PySpark, but the timestamp is not known in advance.
How can I read them?
Example:
filename - projectno_without_data_20211030.csv
I have to read files in this format without knowing the timestamp - projectno_without_data_*.csv
I am using the code below -
df_read_file = sqlContext.read.format('com.databricks.spark.csv') \
    .option("delimiter", '|') \
    .options(header='true', quote='', escape='\"', inferSchema='false') \
    .load('/app/HTA/SrcFiles/inbound/metadata/projectno_without_data_*.csv')
Error -
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/sql/readwriter.py", line 178, in load
return self._df(self._jreader.load(path))
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 134, in deco
raise_from(converted)
File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: Path does not exist: file:/app/HTA/SrcFiles/inbound/metadata/projectno_without_data_*.csv;
As written, the chained calls are split across lines without parentheses or backslashes, which is a Python syntax error; wrapping the expression in parentheses fixes that:

df_read_file = (spark.read.format("com.databricks.spark.csv")
    .option("delimiter", '|').options(header="true")
    .load("/app/HTA/SrcFiles/inbound/metadata/projectno_without_data_*"))

Can you try this?
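Note that "Path does not exist" usually means the glob matched no files at the path Spark resolved (here `file:/...`, i.e. the driver's local filesystem rather than HDFS). Before invoking Spark, it can help to confirm the wildcard actually matches something locally. A minimal sketch, using a scratch directory with made-up filenames in place of the real `/app/HTA/SrcFiles/...` path from the question:

```python
import glob
import os
import tempfile

# Scratch directory standing in for the real input path; the two
# timestamped filenames below are invented for illustration.
tmpdir = tempfile.mkdtemp()
for stamp in ("20211030", "20211031"):
    open(os.path.join(tmpdir, f"projectno_without_data_{stamp}.csv"), "w").close()

# The same wildcard pattern Spark receives. If this list is empty,
# spark.read.load() on the pattern will fail with "Path does not exist".
matches = sorted(glob.glob(os.path.join(tmpdir, "projectno_without_data_*.csv")))
print(matches)
```

If the glob matches locally but Spark still fails, the files likely live on HDFS while Spark is resolving a `file:/` path (or vice versa), so prefixing the URI scheme explicitly (e.g. `hdfs://...` or `file://...`) in `.load()` is worth trying.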