IndexError in a large string

Question

I have a string containing 6000 IDs, each 7 characters long.

When I call c.execute("""DELETE from t1 WHERE "ID" in (%s)""", ids), I get the following error:

[2021-12-09, 10:15:07 -03] {spark_submit.py:523} INFO - c.execute("""DELETE from t1 WHERE "ID" in (%s)""", ids)

[2021-12-09, 10:15:07 -03] {spark_submit.py:523} INFO - IndexError: string index out of range

Is there a way around this string-size problem that does not require deleting the ids one at a time?

Update, as requested by @niko:

[2021-12-09, 12:31:14 -03] {spark_submit.py:523} INFO - c.execute("""DELETE from marketing.client WHERE "ID" in (%s)""", (ids,))
[2021-12-09, 12:31:14 -03] {spark_submit.py:523} INFO - psycopg2.errors.SyntaxError: syntax error at or near ")"
[2021-12-09, 12:31:14 -03] {spark_submit.py:523} INFO - LINE 1: DELETE from marketing.client WHERE "ID" in (())

Answer 1

With psycopg2, the second argument to the execute method should be a tuple containing all of the query's parameters, e.g.

c.execute("SELECT * from my_table where id = %s", (1,))

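When passing a list of ids, the ids must themselves be a tuple nested inside the parameter tuple. psycopg2 renders that inner tuple with its own parentheses, so the placeholder is written as in %s rather than in (%s), e.g.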

c.execute("SELECT * from my_table where id in %s", ((1, 2, 3),))
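
To see why the extra tuple matters: psycopg2 reads the second argument of execute as a sequence holding one value per %s placeholder. The plain-Python sketch below needs no database connection. The sample ID values are made up for illustration and are not taken from the question.

```python
# Hypothetical sample values; the real ids come from the DataFrame.
ids_string = "1234567,7654321"
ids_tuple = (1234567, 7654321)

# A bare string is a sequence of characters, so it looks like 15 parameters.
print(len(ids_string))      # 15

# A bare tuple of ids looks like two parameters for a single %s.
print(len(ids_tuple))       # 2

# Wrapping it in a one-element tuple gives exactly one parameter: the whole tuple.
params = (ids_tuple,)
print(len(params))          # 1
print(params[0])            # (1234567, 7654321)

# Indexing an empty string produces the same message as the question's log.
try:
    ""[0]
except IndexError as exc:
    print(exc)              # string index out of range
```

The last case hints, though the logs do not confirm it, that ids may have been an empty string when the DELETE ran, in which case the number of IDs would not be the cause of the error.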

Edit

In your case, try changing

ids = ",".join(str(x) for x in df.select("ID").rdd.flatMap(lambda x: x).collect())

to

# Make it a tuple of integers
ids = tuple(int(x) for x in df.select("ID").rdd.flatMap(lambda x: x).collect())

That is, since you are querying a BigInt column, ids should be a tuple of integers.

Then run

c.execute("""DELETE from t1 WHERE "ID" in %s""", (ids,))
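
One caveat: the updated log in the question shows the statement rendered as in (()), i.e. with an empty list of ids, which PostgreSQL rejects as a syntax error. Below is a minimal sketch of a guard for that case, assuming cursor is an open psycopg2 cursor. delete_ids is a hypothetical helper name, not part of psycopg2.

```python
def delete_ids(cursor, ids):
    """Delete rows of t1 whose "ID" is in ids; do nothing if ids is empty.

    Hypothetical helper: `cursor` is assumed to be a psycopg2 cursor.
    Returns True if a DELETE was sent, False otherwise.
    """
    # The column is a BigInt, so coerce every id to int.
    id_tuple = tuple(int(x) for x in ids)
    if not id_tuple:
        # An empty tuple would render as "IN ()", which is invalid SQL.
        return False
    # One placeholder, one parameter: the whole tuple of ids.
    cursor.execute('DELETE FROM t1 WHERE "ID" IN %s', (id_tuple,))
    return True
```

It could be called as delete_ids(c, df.select("ID").rdd.flatMap(lambda x: x).collect()), which reuses the collection step from the question.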

python

psycopg2