Import CSVs into different SQL tables
I have a directory full of CSVs that need to be imported into different tables in a SQL Server database. Luckily, the CSV filenames begin with the string "Concat_AAAAA_XX...", where the AAAAA part is an alphanumeric string, followed by XX, a two-digit integer. Together they act as the key for a specific table in SQL.
My question is: what is the most elegant way to create a Python script that takes the AAAAA and XX values from each filename and knows which table to import the data into?
CSV1, named Concat_T101_14_20072021.csv, would need to be imported into Table A
CSV2, named Concat_RB728_06_25072021.csv, would need to be imported into Table B
CSV3, named Concat_T144_21_27072021.csv, would need to be imported into Table C
and so on...
I read that the ConfigParser package might help, but I don't understand how to apply it here. The reason ConfigParser was suggested is that I want the flexibility of editing a config file (e.g. "CONFIG.INI") rather than having to hard-code new entries into the Python script.
So far my code only works with a single standalone dataset, which can be found here.
This is the code I used:
import pypyodbc as odbc
import pandas as pd
import os

os.chdir('SQL Loader')

df = pd.read_csv('Real-Time_Traffic_Incident_Reports.csv')
df['Published Date'] = pd.to_datetime(df['Published Date']).dt.strftime('%Y-%m-%d %H:%M:%S')
df['Status Date'] = pd.to_datetime(df['Status Date']).dt.strftime('%Y-%m-%d %H:%M:%S')
df.drop(df.query('Location.isnull() | Status.isnull()').index, inplace=True)

columns = ['Traffic Report ID', 'Published Date', 'Issue Reported', 'Location',
           'Address', 'Status', 'Status Date']
df_data = df[columns]
records = df_data.values.tolist()

DRIVER = 'SQL Server'
SERVER_NAME = 'MY SERVER'
DATABASE_NAME = 'MYDATABASE'

def connection_string(driver, server_name, database_name):
    conn_string = f"""
        DRIVER={{{driver}}};
        SERVER={server_name};
        DATABASE={database_name};
        Trusted_Connection=yes;
    """
    return conn_string

try:
    conn = odbc.connect(connection_string(DRIVER, SERVER_NAME, DATABASE_NAME))
except odbc.DatabaseError as e:
    print('Database Error:')
    print(str(e.value[1]))
except odbc.Error as e:
    print('Connection Error:')
    print(str(e.value[1]))

sql_insert = '''
    INSERT INTO Austin_Traffic_Incident
    VALUES (?, ?, ?, ?, ?, ?, ?, GETDATE())
'''

try:
    cursor = conn.cursor()
    cursor.executemany(sql_insert, records)
    cursor.commit()
except Exception as e:
    cursor.rollback()
    print(str(e))
finally:
    print('Task is complete.')
    cursor.close()
    conn.close()
You could use a dict as a translation table, something like this:
import re
from glob import glob

translation_table = {
    '14': 'A',
    '06': 'B',
    '21': 'C'
}

# get all csv files from the current directory
for filename in glob("*.csv"):
    # extract the two-digit file number with a regular expression
    # (can also be done easily with the split function)
    filenum = re.match(r"^Concat_[A-Za-z0-9]+_([0-9]{2})_[0-9]{8}\.csv$", filename).group(1)
    # use the translation table to get the table name
    tablename = translation_table[filenum]
    print(f"Data from file '{filename}' goes to table '{tablename}'")
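The question also mentions wanting to keep this mapping in a "CONFIG.INI" file rather than hard-coding it. Here is a minimal sketch of how configparser could supply the same dict; the [tables] section name and the INI layout are my own assumptions, not something defined in the post:

import configparser

# CONFIG.INI is assumed to look like:
# [tables]
# 14 = A
# 06 = B
# 21 = C
config = configparser.ConfigParser()
config.read('CONFIG.INI')

# build the same translation_table dict as above, so adding a new
# CSV-to-table mapping only requires editing CONFIG.INI
translation_table = dict(config['tables'])

With that in place, new file numbers never touch the Python script; you just add another line to the INI file.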
I'd say there are several ways to do this kind of thing. You can use pure SQL, as I'll illustrate below, or you can use Python. If you want a Python solution, just come back to the post and I'll provide the code. Some people don't like it when solutions are recommended outside the specific technologies they listed in the original post. So, here is the SQL solution.
DECLARE @intFlag INT
SET @intFlag = 1
WHILE (@intFlag <=48)
BEGIN
PRINT @intFlag
declare @fullpath1 varchar(1000)
select @fullpath1 = '''\source\FTP1\' + convert(varchar, getdate()- @intFlag , 112) + '_SPGT.SPL'''
declare @cmd1 nvarchar(1000)
select @cmd1 = 'bulk insert [dbo].[table1] from ' + @fullpath1 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 5, ROWTERMINATOR=''0x0a'')'
exec (@cmd1)
-------------------------------------------
declare @fullpath2 varchar(1000)
select @fullpath2 = '''\source\FTP2\' + convert(varchar, getdate()-@intFlag, 112) + '_SPBMI_GL_PROP_USD_C.SPL'''
declare @cmd2 nvarchar(1000)
select @cmd2 = 'bulk insert [dbo].[table2] from ' + @fullpath2 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 5, ROWTERMINATOR=''0x0a'')'
exec (@cmd2)
-------------------------------------------
declare @fullpath3 varchar(1000)
select @fullpath3 = '''\source\FTP3\' + convert(varchar, getdate()-@intFlag, 112) + '_SPBMI_GL_PROP_USD_C_ADJ.SPC'''
declare @cmd3 nvarchar(1000)
select @cmd3 = 'bulk insert [dbo].[table3] from ' + @fullpath3 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 7, ROWTERMINATOR=''0x0a'')'
exec (@cmd3)
-------------------------------------------
declare @fullpath4 varchar(1000)
select @fullpath4 = '''\source\FTP4\' + convert(varchar, getdate()-@intFlag, 112) + '_SPGTINFRA_ADJ.SPC'''
declare @cmd4 nvarchar(1000)
select @cmd4 = 'bulk insert [dbo].[table4] from ' + @fullpath4 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 7, ROWTERMINATOR=''0x0a'')'
exec (@cmd4)
SET @intFlag = @intFlag + 1
END
GO
Here is the Python solution you asked for.
The Python solution is, of course, much simpler.
import pandas as pd
from glob import glob
from sqlalchemy import create_engine  # pyodbc must also be installed

# DataFrame.to_sql expects a SQLAlchemy engine, not a raw connection string
engine = create_engine(
    "mssql+pyodbc://server_name/db_name"
    "?driver=SQL+Server+Native+Client+11.0&trusted_connection=yes"
)

all_files = glob("*.csv")
for f in all_files:
    # load each file into a dataframe...something like...
    df = pd.read_csv(f, delimiter='\t', skiprows=0, header=[0])
    # all_df[x].append(df) ... you may or may not need to append ...depends on your setup
    # table_name depends on your setup...
    df.to_sql(table_name, engine, if_exists='replace', index=True, chunksize=100000)
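To tie the filename parsing from the first answer back to the actual import, a minimal end-to-end sketch could look like the following. The table names in table_map, the append behaviour, and the connection string details are assumptions for illustration, not values from the original post:

import re
import pandas as pd
from glob import glob
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://server_name/db_name"
    "?driver=SQL+Server+Native+Client+11.0&trusted_connection=yes"
)

# two-digit XX value from the filename -> target SQL table (hypothetical names)
table_map = {'14': 'Table_A', '06': 'Table_B', '21': 'Table_C'}

for filename in glob("Concat_*.csv"):
    match = re.match(r"^Concat_[A-Za-z0-9]+_([0-9]{2})_[0-9]{8}\.csv$", filename)
    if not match:
        continue  # skip files that don't follow the naming convention
    table_name = table_map[match.group(1)]
    df = pd.read_csv(filename)
    df.to_sql(table_name, engine, if_exists='append', index=False)

The table_map dict here is exactly the piece that could be loaded from CONFIG.INI as shown earlier.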