ProgrammingError when trying to skip duplicate data in Postgres SQL

PostgreSQL will not accept data that violates the primary key. To ignore duplicate data, I have the following code:

import pandas as pd
import psycopg2
import os
import matplotlib
from sqlalchemy import create_engine
from tqdm import tqdm_notebook
from pandas_datareader import data as web
import datetime
from dateutil.relativedelta import relativedelta


db_database =  os.environ.get('123')
engine = create_engine('postgresql://postgres:{}@localhost:5433/stockdata'.format(123))
def import_data(Symbol):

        df = web.DataReader(Symbol, 'yahoo',start=datetime.datetime.now()-relativedelta(days=3), end= datetime.datetime.now())
        insert_init = """INSERT INTO stockprices
                        (Symbol, Date, Volume, Open, Close, High, Low)
                        VALUES
                    """
        
        
        vals = ",".join(["""('{}','{}','{}','{}','{}','{}','{}')""".format(
            Symbol,
            Date,
            row.High,
            row.Low,
            row.Open,
            row.Close,
            row.Volume,
            ) for Date, row in df.iterrows()])
        
        
        insert_end ="""ON CONFLICT (Symbol, Date) DO UPDATE
                    SET 
                    Volume = EXCLUDED.Volume,
                    Open = EXCLUDED.Open,
                    Close = EXCLUDED.Close,
                    Low = EXCLUDED.Low,
                    High = EXCLUDED.High

                    """
        query = insert_init + vals + insert_end
        engine.execute(query)
                    
import_data('aapl')

I get this error:

ProgrammingError: (psycopg2.errors.UndefinedColumn) column "symbol" of relation "stockprices" does not exist
LINE 2:                         (Symbol,Date, Volume, Open, Close, H...
                                 ^

[SQL: INSERT INTO stockprices

What does this error mean? I removed all the double quotes, as suggested in the comments.


I used this code to create the table:

def create_price_table(symbol):

    print(symbol)
    df = web.DataReader(symbol, 'yahoo', start=datetime.datetime.now()-relativedelta(days=7), end= datetime.datetime.now())
    df['Symbol'] = symbol
    df.to_sql(name = "stockprices", con = engine, if_exists='append', index = True)
    return 'daily prices table created'


create_price_table('amzn')

This was also mentioned in the comments. I used this to check the table name:

SELECT table_name
  FROM information_schema.tables
 WHERE table_schema='public'
   AND table_type='BASE TABLE';


Edit 1:

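I changed the code as suggested in the comments, and the column names are now lowercase. Here is the code:

import pandas as pd
import psycopg2
import os
import matplotlib
from sqlalchemy import create_engine
from tqdm import tqdm_notebook
from pandas_datareader import data as web
import datetime
from dateutil.relativedelta import relativedelta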

db_database =  os.environ.get('123')
engine = create_engine('postgresql://postgres:{}@localhost:5433/stockdata'.format(123))
def create_price_table(symbol):

    print(symbol)
    df = web.DataReader(symbol, 'yahoo', start=datetime.datetime.now()-relativedelta(days=7), end= datetime.datetime.now())
    df['symbol'] = symbol
    df = df.rename(columns= {'Open':'open'})
    df = df.rename(columns= {'Close':'close'})
    df = df.rename(columns= {'High':'high'})
    df = df.rename(columns= {'Low':'low'})
    df = df.rename(columns= {'Volume':'volume'})
    df = df.rename(columns= {'Adj Close':'adj_close'})
    df.index.name ='date'
    df.to_sql(name = "stockprices", con = engine, if_exists='append', index = True)
    return 'daily prices table created'

# create_price_table('amzn')

def import_data(Symbol):
        df = web.DataReader(Symbol, 'yahoo', start=datetime.datetime.now()-relativedelta(days=3), end= datetime.datetime.now())
        insert_init = """INSERT INTO stockprices
                        (symbol, date, volume, open, close, high, low)
                        VALUES
                    """
        
        
        vals = ",".join(["""('{}','{}','{}','{}','{}','{}','{}')""".format(
            Symbol,
            Date,
            row.High,   
            row.Low,
            row.Open,
            row.Close,
            row.Volume,
            ) for Date, row in df.iterrows()])
        
        
        insert_end ="""ON CONFLICT (Symbol, Date) DO UPDATE
                    SET 
                    Volume = EXCLUDED.Volume,
                    Open = EXCLUDED.Open,
                    Close = EXCLUDED.Close,
                    Low = EXCLUDED.Low,
                    High = EXCLUDED.High

                    """
        query = insert_init + vals + insert_end
        engine.execute(query)
                    
import_data('aapl')

But this code produces a new error:

DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type bigint: "166.14999389648438"
LINE 4:                     ('aapl','2022-02-23 00:00:00','166.14999...
                                                          ^

Per my comment, you have two issues:

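  1. You are trying to insert a float value (166.14999389648438) into an integer field. The first thing to figure out is why there is a mismatch: do you really want the database field to be an integer? The second thing is that forcing a float into an integer will work if the value is entered as a float/numeric: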

select 166.14999389648438::bigint;
166

Though, as you can see, the fractional part is lost.

It will not work if the value is entered as a string:

ERROR:  invalid input syntax for type bigint: "166.14999389648438"

Which is what you are doing. This leads to the second issue below.
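The difference between a numeric value and a quoted string can be illustrated with a loose Python analogy. This is only a sketch of the idea, not of PostgreSQL's actual casting rules, which differ in detail (for example in how the fractional part is dropped); `to_int_strict` is a hypothetical helper.

```python
value = "166.14999389648438"


def to_int_strict(text):
    """Mimic casting a quoted string straight to an integer type."""
    try:
        return int(text)
    except ValueError:
        # The text is not valid integer syntax, much like the bigint error.
        return None


# Entered as a number: the conversion succeeds, the fraction is dropped.
as_number = int(float(value))

# Entered as a string: there is no implicit detour through a float.
as_string = to_int_strict(value)
```

Here `as_number` is an integer while `as_string` is `None`, which mirrors why the quoted value in the generated INSERT is rejected.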

  2. You are not using proper parameter passing as shown in the link. There, among other things, is this warning:

Warning

Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.

For the purposes of this question, the important part is that using parameter passing will result in proper type adaptation.
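A minimal sketch of what parameter passing could look like for the upsert in the question. It assumes `conn` is an open psycopg2 connection and that `stockprices` has a primary key or unique constraint on `(symbol, date)`; `build_params` and `upsert_prices` are hypothetical helper names, and the sample row is made up. Note also that the question's code lists the columns as volume, open, close, high, low but formats the values in the order High, Low, Open, Close, Volume, which is likely how a price ended up in the bigint `volume` column; the sketch keeps both orders aligned.

```python
from datetime import date

# The %s placeholders are filled in by the driver, never by str.format().
UPSERT_SQL = """
    INSERT INTO stockprices (symbol, date, volume, open, close, high, low)
    VALUES (%s, %s, %s, %s, %s, %s, %s)
    ON CONFLICT (symbol, date) DO UPDATE
    SET volume = EXCLUDED.volume,
        open = EXCLUDED.open,
        close = EXCLUDED.close,
        high = EXCLUDED.high,
        low = EXCLUDED.low
"""


def build_params(symbol, rows):
    """Return one tuple per row, in the same order as the column list."""
    return [
        (
            symbol,
            day,
            int(r["Volume"]),   # plain int, so it fits a bigint column
            float(r["Open"]),   # plain floats instead of NumPy scalars
            float(r["Close"]),
            float(r["High"]),
            float(r["Low"]),
        )
        for day, r in rows
    ]


def upsert_prices(conn, symbol, rows):
    """Run the upsert on an open psycopg2 connection (hypothetical helper)."""
    with conn.cursor() as cur:
        cur.executemany(UPSERT_SQL, build_params(symbol, rows))
    conn.commit()


# Made-up sample, shaped like the (index, row) pairs from df.iterrows().
sample = [
    (date(2022, 2, 23),
     {"High": 166.15, "Low": 159.75, "Open": 165.54,
      "Close": 160.07, "Volume": 90009200}),
]
params = build_params("aapl", sample)
```

With a real DataFrame this would be called as `upsert_prices(conn, 'aapl', df.iterrows())`. Because the values reach the driver as Python objects rather than as quoted text, each one is adapted to a matching SQL type.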