Remove leading and trailing characters from a Hive column

Question

I have a Hive column with an unknown number of leading and trailing double quotes. The data itself also contains double quotes.

For example, the column looks like this:

I want the following output:

I have written PySpark code that removes the " characters, and it works, but I want a solution in HQL. I also tried regexp_replace, like this:

regexp_replace(test,'^"|^""|""$|"$', "")

But this is hard-coded for one or two quotes. Can someone suggest a generic solution?
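Hive's regexp_replace is documented as using Java regular expression syntax, so Java's String.replaceAll (which Scala strings use) should reproduce its matching behavior. The sketch below compares the question's hard-coded pattern with a quantified one on the sample value """56"7" (the value Answer 1 uses). The object and value names are illustrative only. Note that the hard-coded pattern removes just one leading quote: at position 0 the first alternative ^" already matches, so ^"" is never tried, and after that match ^ can no longer match.

```scala
object HardCodedVsGeneral {
  // The sample value """56"7" (three leading quotes, one inner, one trailing).
  val sample = "\"\"\"56\"7\""
  // The question's pattern: alternatives for exactly one or two quotes.
  val hardCoded = "^\"|^\"\"|\"\"$|\"$"
  // Quantified pattern: one or more quotes at the start, or at the end.
  val general = "^\"+|\"+$"

  def main(args: Array[String]): Unit = {
    println(sample.replaceAll(hardCoded, "")) // ""56"7  (two leading quotes remain)
    println(sample.replaceAll(general, ""))   // 56"7   (inner quote is kept)
  }
}
```

The + quantifier is what makes the pattern independent of how many quotes surround the value.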

Answer 1

Try this:

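// Assumes a shell such as spark-shell, where `spark` (a SparkSession) is predefined.
val df = spark.sql("select '\"\"\"56\"7\"' as test")
df.show(false)
/**
  * +--------+
  * |test    |
  * +--------+
  * |"""56"7"|
  * +--------+
  */
df.createOrReplaceTempView("table")
// ^"+ matches one or more quotes at the start, "+$ one or more at the end;
// quotes in the middle of the value are left untouched.
spark.sql("select test, regexp_replace(test, '^\"+|\"+$', '') as test_new from table")
  .show(false)

/**
  * +--------+--------+
  * |test    |test_new|
  * +--------+--------+
  * |"""56"7"|56"7    |
  * +--------+--------+
  */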
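The same idea generalizes to any character that has to be stripped from both ends. Below is a minimal sketch in plain Scala (no Spark needed); stripEnds is a hypothetical helper, not part of Spark or Hive. It uses java.util.regex.Pattern.quote so that regex metacharacters such as * or . are matched literally.

```scala
import java.util.regex.Pattern

object StripEnds {
  // Hypothetical helper: removes every leading and trailing occurrence of `ch`,
  // keeping occurrences in the middle of the string.
  def stripEnds(s: String, ch: Char): String = {
    // Pattern.quote wraps the character so it is treated literally.
    val q = Pattern.quote(ch.toString)
    // Same shape as the answer's pattern: one-or-more at the start OR at the end.
    // In an s-interpolated string, $$ produces a literal $ (the end anchor).
    s.replaceAll(s"^(?:$q)+|(?:$q)+$$", "")
  }

  def main(args: Array[String]): Unit = {
    println(stripEnds("\"\"\"56\"7\"", '"')) // 56"7
    println(stripEnds("**a*b**", '*'))       // a*b
    println(stripEnds("xxxx", 'x'))          // prints an empty line
  }
}
```

For the double-quote case in HQL itself, the answer's expression should work unchanged, e.g. select regexp_replace(test, '^"+|"+$', '') from your_table; where your_table is a placeholder for the actual table name.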

