Import CSVs into different SQL tables

I have a directory full of CSVs that need to be imported into different tables in a SQL Server database. Fortunately, the CSV file names all begin with the string "Concat_AAAAA_XX...", where the AAAAA part is an alphanumeric string and XX is a two-digit integer. Together these act as the key for a specific table in SQL.

My question is: what is the most elegant way to write a Python script that takes the AAAAA and XX values from each file name and knows which table to import the data into?

CSV1 named: Concat_T101_14_20072021.csv
would need to be imported into Table A

CSV2 named: Concat_RB728_06_25072021.csv
would need to be imported into Table B

CSV3 named: Concat_T144_21_27072021.csv
would need to be imported into Table C

and so on...

I read that the ConfigParser package might help, but I don't understand how to apply it here. The reason ConfigParser was suggested is that I want the flexibility of editing a config file (e.g. "CONFIG.INI") rather than having to hard-code new entries into the Python script.
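
For context, the kind of CONFIG.INI-driven mapping I have in mind would be something like the sketch below. The [tables] section name and the entries are only illustrative, not an existing file:

import configparser

# CONFIG.INI might look like this (hypothetical):
# [tables]
# 14 = Table_A
# 06 = Table_B
# 21 = Table_C

config = configparser.ConfigParser()
config.read('CONFIG.INI')

# look up the target table for an XX value parsed from a file name
xx = '14'
tablename = config['tables'][xx]   # -> 'Table_A'

Adding a new table would then only mean adding a line to CONFIG.INI rather than editing the script.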

So far, my code only works for one standalone dataset, which can be found here.

Here is the code I am using:

import pypyodbc as odbc
import pandas as pd 
import os

os.chdir('SQL Loader')
df = pd.read_csv('Real-Time_Traffic_Incident_Reports.csv')

df['Published Date'] = pd.to_datetime(df['Published Date']).dt.strftime('%Y-%m-%d %H:%M:%S')
df['Status Date'] = pd.to_datetime(df['Status Date']).dt.strftime('%Y-%m-%d %H:%M:%S')

df.drop(df.query('Location.isnull() | Status.isnull()').index, inplace=True)

columns = ['Traffic Report ID', 'Published Date', 'Issue Reported', 'Location', 
            'Address', 'Status', 'Status Date']

df_data = df[columns]
records = df_data.values.tolist()

DRIVER = 'SQL Server'
SERVER_NAME = 'MY SERVER'
DATABASE_NAME = 'MYDATABASE'

def connection_string(driver, server_name, database_name):
    conn_string = f"""
        DRIVER={{{driver}}};
        SERVER={server_name};
        DATABASE={database_name};
        Trusted_Connection=yes;
    """
    return conn_string

try:
    conn = odbc.connect(connection_string(DRIVER, SERVER_NAME, DATABASE_NAME))
except odbc.DatabaseError as e:
    print('Database Error:')
    print(str(e))
except odbc.Error as e:
    print('Connection Error:')
    print(str(e))


sql_insert = '''
    INSERT INTO Austin_Traffic_Incident 
    VALUES (?, ?, ?, ?, ?, ?, ?, GETDATE())
'''

try:
    cursor = conn.cursor()
    cursor.executemany(sql_insert, records)
    cursor.commit()
except Exception as e:
    cursor.rollback()
    print(str(e))
finally:
    print('Task is complete.')
    cursor.close()
    conn.close()

You can use a dict as a translation table, for example:

import re
from glob import glob

translation_table = {
    '14': 'A', 
    '06': 'B',
    '21': 'C'
    }

# get all csv files from current directory
for filename in glob("*.csv"):

    # extract the two-digit file number with a regular expression
    # (can also be done easily with the split function)
    filenum = re.match(r"^Concat_[A-Za-z0-9]+_([0-9]{2})_[0-9]{8}\.csv$", filename).group(1)

    # use the translation table to get the table name
    tablename = translation_table[filenum]
    
    print(f"Data from file '{filename}' goes to table '{tablename}'")

I'd say there are several ways to do this kind of thing. You could use pure SQL, as I'll show below, or you could use Python. If you want a Python solution, just reply to the post and I'll provide the code. Some people don't like being offered solutions outside the specific technology they listed in the original post, so here is the SQL solution.

DECLARE @intFlag INT
SET @intFlag = 1
WHILE (@intFlag <=48)
BEGIN

PRINT @intFlag


declare @fullpath1 varchar(1000)
select @fullpath1 = '''\source\FTP1\' + convert(varchar, getdate()- @intFlag , 112) + '_SPGT.SPL'''
declare @cmd1 nvarchar(1000)
select @cmd1 = 'bulk insert [dbo].[table1] from ' + @fullpath1 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 5, ROWTERMINATOR=''0x0a'')'
exec (@cmd1)

-------------------------------------------

declare @fullpath2 varchar(1000)
select @fullpath2 = '''\source\FTP2\' + convert(varchar, getdate()-@intFlag, 112) + '_SPBMI_GL_PROP_USD_C.SPL'''
declare @cmd2 nvarchar(1000)
select @cmd2 = 'bulk insert [dbo].[table2] from ' + @fullpath2 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 5, ROWTERMINATOR=''0x0a'')'
exec (@cmd2)

-------------------------------------------

declare @fullpath3 varchar(1000)
select @fullpath3 = '''\source\FTP3\' + convert(varchar, getdate()-@intFlag, 112) + '_SPBMI_GL_PROP_USD_C_ADJ.SPC'''
declare @cmd3 nvarchar(1000)
select @cmd3 = 'bulk insert [dbo].[table3] from ' + @fullpath3 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 7, ROWTERMINATOR=''0x0a'')'
exec (@cmd3)

-------------------------------------------

declare @fullpath4 varchar(1000)
select @fullpath4 = '''\source\FTP4\' + convert(varchar, getdate()-@intFlag, 112) + '_SPGTINFRA_ADJ.SPC'''
declare @cmd4 nvarchar(1000)
select @cmd4 = 'bulk insert [dbo].[table4] from ' + @fullpath4 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 7, ROWTERMINATOR=''0x0a'')'
exec (@cmd4)

SET @intFlag = @intFlag + 1
    
END
GO

Here is the Python solution you asked for.

The Python solution is, of course, much simpler.

import pandas as pd
from glob import glob
from sqlalchemy import create_engine

# build a SQLAlchemy engine for SQL Server (pyodbc is used under the hood)
engine = create_engine(
    "mssql+pyodbc://server_name/db_name"
    "?driver=SQL+Server+Native+Client+11.0&trusted_connection=yes"
)

all_files = glob("*.csv")  # collect the CSVs however fits your setup

for f in all_files:
    # load each file into a dataframe...something like...
    df = pd.read_csv(f, delimiter='\t', skiprows=0, header=[0])
    # all_df[x].append(df) ... you may or may not need to append ...depends on your setup

    # table_name would come from your filename-to-table mapping
    df.to_sql(table_name, engine, if_exists='replace', index=True, chunksize=100000)