AnalysisException: Can't extract value from place#14: need struct type but got double

I am trying to find the missing and null values in my DataFrame, but I get an exception. I have included only the first few fields of the schema below:

root
|-- created_at: string (nullable = true)
|-- id: long (nullable = true)
|-- id_str: string (nullable = true)
|-- text: string (nullable = true)
|-- display_text_range: string (nullable = true)
|-- source: string (nullable = true)
|-- truncated: boolean (nullable = true)
|-- in_reply_to_status_id: double (nullable = true)
|-- in_reply_to_status_id_str: string (nullable = true)
|-- in_reply_to_user_id: double (nullable = true)
|-- in_reply_to_user_id_str: string (nullable = true)
|-- in_reply_to_screen_name: string (nullable = true)
|-- geo: double (nullable = true)
|-- coordinates: double (nullable = true)
|-- place: double (nullable = true)
|-- contributors: string (nullable = true)

Here is the code that throws the exception; I am trying to count the missing and null values per column:

from pyspark.sql.functions import col, count, isnan, when

# Count the NaN and null entries in every column.
df_mis = df.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns])
df_mis.show()
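For reference, the expression itself is the standard per-column null/NaN count. A minimal sketch on a toy DataFrame (the column names score and label are hypothetical) runs without error, which suggests the problem lies in the real schema rather than in the pattern:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, isnan, when

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame with plain (dot-free) column names.
toy = spark.createDataFrame(
    [(1.0, None), (float("nan"), "x")],
    ["score", "label"],
)

# Same pattern as above: count NaNs and nulls per column.
toy.select(
    [count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in toy.columns]
).show()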

Here are the full details of the AnalysisException:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<ipython-input-20-6ccaacbbcc7f> in <module>()
----> 1 df_mis = df.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns])
      2 df_mis.show()

/content/spark-3.2.0-bin-hadoop3.2/python/pyspark/sql/dataframe.py in select(self, *cols)
   1683         [Row(name='Alice', age=12), Row(name='Bob', age=15)]
   1684         """
-> 1685         jdf = self._jdf.select(self._jcols(*cols))
   1686         return DataFrame(jdf, self.sql_ctx)
   1687 

/content/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1308         answer = self.gateway_client.send_command(command)
   1309         return_value = get_return_value(
-> 1310             answer, self.gateway_client, self.target_id, self.name)
   1311 
   1312         for temp_arg in temp_args:

/content/spark-3.2.0-bin-hadoop3.2/python/pyspark/sql/utils.py in deco(*a, **kw)
    115                 # Hide where the exception came from that shows a non-Pythonic
    116                 # JVM exception message.
--> 117                 raise converted from None
    118             else:
    119                 raise

AnalysisException: Can't extract value from place#14: need struct type but got double

I solved this by replacing the dots (".") in the column names with underscores. I found a related Stack Overflow post very helpful; quoting that post, "the error exists because the dot (.) is used to access struct fields". My shown schema only covers the first few fields; the offending dotted names (e.g. the flattened place.* fields) come later, so Spark resolved place (a double) and then tried to extract a struct field from it.
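For completeness, here is a minimal sketch of that fix (df is the original DataFrame from the question; df_clean and df_mis are names I made up), assuming the offending names only need their dots replaced:

from pyspark.sql.functions import col, count, isnan, when

# Rename every column, replacing "." with "_", so Spark no longer
# parses the dot as struct-field access.
df_clean = df.toDF(*[c.replace(".", "_") for c in df.columns])

# The original null/NaN count now works on the renamed columns.
df_mis = df_clean.select(
    [count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df_clean.columns]
)
df_mis.show()

An alternative that avoids renaming is to wrap each column name in backticks, e.g. col("`" + c + "`"), since backticks make Spark treat the whole name literally instead of interpreting the dot as struct access.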