Spark error class java.util.HashMap cannot be cast to class java.lang.String
I get the following error when I try to fetch data from Elasticsearch through Spark. The error message does not point to a specific location in the code.
body2 works in Elasticsearch's Dev Tools.
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark import SparkConf
from pyspark.sql import SparkSession

body2 = {
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "@timestamp": {
                            "lte": "2022-05-03T09:25:15.000-03:00",
                            "gte": "2022-05-04T09:25:15.000-03:00"
                        }
                    }
                },
                {
                    "match": {
                        "type.keyword": "TABLA"
                    }
                }
            ]
        }
    },
    "size": 10
}

es_read_conf = {
    "es.nodes": "10.45.15.93",
    "es.port": "9200",
    "es.query": body2,
    "es.nodes.wan.only": "true",
    "es.resource": "indice1/TABLA",
    "es.net.http.auth.user": "usuario1",
    "es.net.http.auth.pass": "rsl242442j"
}

es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=es_read_conf)
This is the error; I can't tell where the mistake in the code is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/context.py", line 859, in newAPIHadoopRDD
    jrdd = self._jvm.PythonRDD.newAPIHadoopRDD(self._jsc, inputFormatClass, keyClass,
  File "/opt/spark/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/opt/spark/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/spark/python/lib/py4j-0.10.9.3-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassCastException: class java.util.HashMap cannot be cast to class java.lang.String (java.util.HashMap and java.lang.String are in module java.base of loader 'bootstrap')
    at org.apache.spark.api.python.PythonHadoopUtil$.$anonfun$mapToConf(PythonHadoopUtil.scala:160)
    at org.apache.spark.api.python.PythonHadoopUtil$.$anonfun$mapToConf$adapted(PythonHadoopUtil.scala:160)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    ...
I've gone through the code but haven't found a solution.
Thanks, everyone.
As the error message indicates, the problem is that the query must be a string, not a dictionary:
body2="""{
"query": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"lte": "2022-05-03T09:25:15.000-03:00",
"gte": "2022-05-04T09:25:15.000-03:00"
}
}
},
{
"match": {
"type.keyword": "TABLA"
}
}
]
}
},
"size":10
}"""
You can find a reference for this here.
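Alternatively, you can keep body2 as a Python dict and serialize it when building the configuration. A minimal sketch, assuming body2 is the dict from the question; json.dumps is from the standard library, so no extra dependencies are needed:

import json

# body2 stays a regular Python dict, exactly as in the question.
es_read_conf = {
    "es.nodes": "10.45.15.93",
    "es.port": "9200",
    "es.query": json.dumps(body2),  # serialize to the JSON string that es.query expects
    "es.nodes.wan.only": "true",
    "es.resource": "indice1/TABLA",
    "es.net.http.auth.user": "usuario1",
    "es.net.http.auth.pass": "rsl242442j"
}

This avoids maintaining a hand-written JSON string and guarantees the query stays valid JSON if you later build it programmatically.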