TypeError: Column is not iterable
s = ["abcd:{'name':'john'}","defasdf:{'num':123}"]
df = spark.createDataFrame(s, "string").toDF("request")
display(df)
+--------------------+
| request|
+--------------------+
|abcd:{'name':'john'}|
| defasdf:{'num':123}|
+--------------------+
I want to get:
+--------------------+---------------+
| request| sub|
+--------------------+---------------+
|abcd:{'name':'john'}|{'name':'john'}|
| defasdf:{'num':123}| {'num':123}|
+--------------------+---------------+
I did write the following, but it throws an error:
TypeError: Column is not iterable
df = df.withColumn("sub",substring(col('request'),locate('{',col('request')),length(col('request'))-locate('{',col('request'))))
df.show()
Can someone help me?
You need to use the `substring` function inside a SQL expression so that you can pass columns for the `position` and `length` arguments. Also note that you need to add `+1` to the length to get the correct result:
import pyspark.sql.functions as F
df = df.withColumn(
"json",
F.expr("substring(request, locate('{',request), length(request) - locate('{', request) + 1)")
)
df.show()
#+--------------------+---------------+
#| request| json|
#+--------------------+---------------+
#|abcd:{'name':'john'}|{'name':'john'}|
#| defasdf:{'num':123}| {'num':123}|
#+--------------------+---------------+
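To see why the `+1` is needed, here is a plain-Python sketch of the same arithmetic (an illustration, not part of the answer above): SQL's `locate` and `substring` are 1-based, so the number of characters from the `{` through the end of the string is `length(request) - locate('{', request) + 1`.

```python
# Plain-Python sketch of the 1-based arithmetic that SQL's
# locate/substring use, showing why the +1 is needed.
s = "abcd:{'name':'john'}"

pos = s.index("{") + 1   # SQL locate is 1-based; '{' is at position 6 here
n = len(s) - pos + 1     # count of characters from '{' through the end

# Python slicing is 0-based, so convert back when slicing:
sub = s[pos - 1 : pos - 1 + n]
print(sub)               # {'name':'john'}
```

Without the `+1`, `n` would be one short and the closing `}` would be dropped.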
You could also consider using the `regexp_extract` function instead of `substring`, like this:
df = df.withColumn(
"json",
    F.regexp_extract("request", r"^.*:(\{.*\})$", 1)
)