PyFlink - Scala UDF - How to convert Scala Map in Table API?
I am trying to map the Map[String,String] output of my Scala UDF (scala.collection.immutable.Map) to a valid data type in the Table API, namely via a Java type (java.util.Map) as suggested here. However, I get the error below.
Any idea of the right way to do this? If so, is there a way to generalize the conversion to (nested) Scala objects of type Map[String,Any]?
Code
Scala UDF
class dummyMap() extends ScalarFunction {
  def eval() = {
    val whatevermap = Map("key1" -> "val1", "key2" -> "val2")
    whatevermap.asInstanceOf[java.util.Map[java.lang.String, java.lang.String]]
  }
}
Sink
my_sink_ddl = f"""
    create table mySink (
        output_of_dummyMap_udf MAP<STRING,STRING>
    ) with (
        ...
    )
"""
Error
Py4JJavaError: An error occurred while calling o430.execute.
: org.apache.flink.table.api.ValidationException: Field types of query result and registered TableSink `default_catalog`.`default_database`.`mySink` do not match.
Query result schema: [output_of_my_scala_udf: GenericType<java.util.Map>]
TableSink schema: [output_of_my_scala_udf: Map<String, String>]
Thanks!
Original answer from Wei Zhong. I'm just the reporter. Thanks Wei!
At the time of writing (Flink 1.11), two approaches work:
- Current: a DataTypeHint in the UDF definition + SQL for the UDF registration
- Outdated: overriding getResultType in the UDF definition + t_env.register_java_function for the UDF registration
Code
Scala UDF
package com.dummy

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.table.annotation.DataTypeHint
import org.apache.flink.table.api.Types
import org.apache.flink.table.functions.ScalarFunction
import org.apache.flink.types.Row

class dummyMap extends ScalarFunction {

  // If the UDF is registered via a SQL statement, this type hint is required.
  @DataTypeHint("ROW<s STRING,t STRING>")
  def eval(): Row = {
    Row.of(java.lang.String.valueOf("foo"), java.lang.String.valueOf("bar"))
  }

  // If the UDF is registered via the method 'register_java_function',
  // this method must be overridden instead.
  override def getResultType(signature: Array[Class[_]]): TypeInformation[_] = {
    // The return type must be described as a TypeInformation.
    Types.ROW(Array("s", "t"), Array[TypeInformation[_]](Types.STRING(), Types.STRING()))
  }
}
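Side note on the original Map[String,String] question: I have not tested this, but the same DataTypeHint mechanism should also accept a MAP type string, as long as the Scala map is explicitly converted to a java.util.Map (a plain asInstanceOf cast, as in the question, does not perform that conversion). A minimal sketch, with a hypothetical class name dummyScalaMap (the Python wiring for it is shown at the end):

package com.dummy

import org.apache.flink.table.annotation.DataTypeHint
import org.apache.flink.table.functions.ScalarFunction

import scala.collection.JavaConverters._

class dummyScalaMap extends ScalarFunction {

  // hypothetical, untested variant that returns the map directly;
  // .asJava converts the Scala Map into a java.util.Map, which asInstanceOf cannot do
  @DataTypeHint("MAP<STRING, STRING>")
  def eval(): java.util.Map[String, String] = {
    Map("key1" -> "val1", "key2" -> "val2").asJava
  }
}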
Python code
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

s_env = StreamExecutionEnvironment.get_execution_environment()
st_env = StreamTableEnvironment.create(s_env)

# load the scala udf jar file; the path should be modified to yours
# or you can also load the jar file via other approaches
st_env.get_config().get_configuration().set_string("pipeline.jars", "file:///Users/zhongwei/the-dummy-udf.jar")

# register the udf via a SQL statement (current approach)
st_env.execute_sql("CREATE FUNCTION dummyMap AS 'com.dummy.dummyMap' LANGUAGE SCALA")
# or register it via the method (outdated approach)
# st_env.register_java_function("dummyMap", "com.dummy.dummyMap")

# prepare source and sink
t = st_env.from_elements([(1, 'hi', 'hello'), (2, 'hi', 'hello')], ['a', 'b', 'c'])
st_env.execute_sql("""create table mySink (
    output_of_my_scala_udf ROW<s STRING,t STRING>
) with (
    'connector' = 'print'
)""")

# execute query
t.select("dummyMap()").execute_insert("mySink").get_job_client().get_job_execution_result().result()
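For completeness, wiring the hypothetical dummyScalaMap sketch from above into the same pipeline might look like this (untested; assumes the class is packaged in the same jar):

# untested sketch: register and use the hypothetical MAP-returning UDF
st_env.execute_sql("CREATE FUNCTION dummyScalaMap AS 'com.dummy.dummyScalaMap' LANGUAGE SCALA")

st_env.execute_sql("""create table myMapSink (
    output_of_my_scala_udf MAP<STRING,STRING>
) with (
    'connector' = 'print'
)""")

t.select("dummyScalaMap()").execute_insert("myMapSink").get_job_client().get_job_execution_result().result()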