select 来自 Oracle 的 XMLTYPE 解析并将记录插入另一个 Oracle 数据库
select XMLTYPE from Oracle Parse and insert records into another Oracle DB
我正在使用 Python 从 Oracle 中选择 XMLTYPE
字段并解析为单独的字段并插入到另一个 Oracle 数据库中。下面是我的代码。主标签内有大约 300 个标签,一些子标签有多个值。
以下是我的问题:
1.There 是 table 中的数百万条记录,而且速度非常非常慢。需要很长时间才能完成。
请帮助 python 在不影响性能的情况下高效地实现这一目标。
我的出身Table:
desc cust_test
Name Null Type
------ ---- ---------
RECID NUMBER
RECORD XMLTYPE()
数据:
100227
"<row xml:space="preserve" id="100227">
<c1>BRENT</c1>
<c2>BRENT2</c2>
<c3>Brent3</c3>
<c4>Brent4</c4>
<c5>CP</c5>
<c7>GL</c7>
<c9>US</c9>
<c23>1001</c23>
<c24>26</c24>
<c25>4</c25>
<c26>1000</c26>
<c27>2</c27>
<c28>US</c28>
<c29>1</c29>
<c30>GB</c30>
<c31>20210315</c31>
<c42>19581212</c42>
<c45>1</c45>
<c48>US0010001</c48>
<c52>NO</c52>
<c57>VALUED.CUSTOMER</c57>
<c58>11</c58>
<c60>MR</c60>
<c61>Brent61</c61>
<c63>MALE</c63>
<c64>19720915</c64>
<c68>+12345678</c68>
<c69>bc@gmail.com</c69>
<c132>YES</c132>
<c133>NULL</c133>
<c134>NULL</c134>
<c137>NULL</c137>
<c138>NULL</c138>
<c149>VALUED.CUSTOMER</c149>
<c150>11</c150>
<c151>11</c151>
<c179/>
<c179 m="6">NO</c179>
<c179 m="15">OPT-IN</c179>
<c179 m="16">20210315</c179>
<c179 m="176"/>
<c180>EB.US.ADD.RES.CHAN.AGR}Field ADDRESS/RESIDENCE changed. Still agree?</c180>
<c180 m="2">KYC/US*41 FROM 10 NOT RECEIVED</c180>
<c180 m="3">PWM/US*41 FROM 10 NOT RECEIVED</c180>
<c180 m="4">INTRO/US*41 FROM 10 NOT RECEIVED</c180>
<c182>2</c182>
<c183>17338_OFFICER__OFS_SEAT</c183>
<c184>2104271357</c184>
<c185>17338_OFFICER_OFS_SEAT</c185>
<c186>GB0010001</c186>
<c187>1</c187>
</row>"
Python代码:
#!/usr/bin/env python3
import os
import sys
import time
import csv
import cx_Oracle
import xml.etree.ElementTree as ET
import pandas as pd
con = cx_Oracle.connect('system/Manager@localhost:1521/cdb1')
start = time.time()
SQL = "SELECT RECID,RECORD FROM cust_test2"
#print(SQL)
cur = con.cursor()
cur.prefetchrows = 1000
cur.arraysize = 1000
f = open("D:\Various\myfile21.csv", "w")
writer = csv.writer(f, lineterminator="\n", quoting=csv.QUOTE_NONNUMERIC)
r = cur.execute(SQL)
#col_names = [row[0] for row in cur.description]
#writer.writerow(col_names)
res = cur.fetchall()
for row in res:
tree = ET.ElementTree(ET.fromstring(row[1]))
root = tree.getroot()
for child in root:
#print(child.tag, child.text, child.attrib)
if (child.tag == 'c1') :
s1 = pd.Series(child.text)
if (child.tag == 'c3') :
s3 = pd.Series(child.text)
if (child.tag == 'c23') :
s23 = pd.Series(child.text)
if (child.tag == 'c179') :
s179 = pd.Series(child.text)
.......
df = pd.DataFrame({"c1": s1,
"c3": s3,
"c23": s23,
"c179": s179
.......})
file_name = 'events.csv'
df.to_csv(file_name, sep='\t')
ResultSet_Py_List = []
ora_conn = cx_Oracle.connect('custom/custom@orcl')
ora_cursor = ora_conn.cursor()
my_file = open(file_name, 'r', newline='')
reader = csv.reader(my_file, dialect='excel', delimiter='\t')
row1 = next(reader)
for index, row in enumerate(reader):
print(row)
ResultSet_Py_List.append(row)
print(str(len(ResultSet_Py_List)) + ' Records from Source')
sql_insert = """
INSERT INTO cust_ins (c0,c1,c3,c23,c179,.....)
VALUES (:1,:2,:3,:4,:5,.....)
"""
ora_cursor.prepare(sql_insert)
ora_cursor.executemany(None, ResultSet_Py_List)
ora_conn.commit()
#writer.writerow(row)
f.close()
elapsed = (time.time() - start)
print(elapsed, "seconds")
考虑使用 Oracle 的 XMLTABLE
从 XMLTYPE
列中提取所需的值。 运行 这样的查询来自 Python 并在两个数据库连接之间直接将值从 cursor.fetchall
传递到 cursor.executemany
。这种方法避免了 XML 解析、Pandas 操作和 CSV write/read 步骤。另外,源 SQL 直接在服务器上查询 XML 数据。
SQL (完成所有c
节点,根据需要调整数据类型,另存为.sql)
SELECT x.c1, x.c2, ..., x.c187
FROM MY_TABLE,
XMLTABLE('/row'
PASSING XMLTYPE(MY_TABLE.RECORD)
COLUMNS c1 VARCHAR2(50) PATH 'c1',
c2 VARCHAR2(50) PATH 'c2',
c3 VARCHAR2(50) PATH 'c3',
...
c187 VARCHAR2(50) PATH 'c187'
) AS x
Python
import cx_Oracle
# READ SQL QUERY
with open('/path/to/myquery.sql', mode='r') as f:
src_sql = f.read()
# SOURCE CONNECTION
src_con = cx_Oracle.connect('system/Manager@localhost:1521/cdb1')
src_cur = src_con.cursor()
src_cur.prefetchrows = 1000
src_cur.arraysize = 1000
# DESTINATION CONNECTION
dst_con = cx_Oracle.connect('custom/custom@orcl')
dst_cur = dst_con.cursor()
# APPEND QUERY (MAKE SURE COLUMNS ALIGN TO SOURCE QUERY)
dst_sql = """INSERT INTO cust_ins (c0,c1,c3,c23,c179,.....)
VALUES (:1,:2,:3,:4,:5,.....)"""
# FETCH SOURCE DATA
src_cur.execute(src_sql)
src_res = src_cur.fetchall()
# APPEND TO DESTINATION TABLE
dst_cur.prepare(dst_sql)
dst_cur.executemany(None, src_res)
dst_con.commit()
# CLOSE CURSORS AND CONNECTIONS
src_cur.close(); src_con.close()
dst_cur.close(); dst_con.close()
我正在使用 Python 从 Oracle 中选择 XMLTYPE
字段并解析为单独的字段并插入到另一个 Oracle 数据库中。下面是我的代码。主标签内有大约 300 个标签,一些子标签有多个值。
以下是我的问题:
1.There 是 table 中的数百万条记录,而且速度非常非常慢。需要很长时间才能完成。
请帮助 python 在不影响性能的情况下高效地实现这一目标。
我的出身Table:
desc cust_test
Name Null Type
------ ---- ---------
RECID NUMBER
RECORD XMLTYPE()
数据:
100227
"<row xml:space="preserve" id="100227">
<c1>BRENT</c1>
<c2>BRENT2</c2>
<c3>Brent3</c3>
<c4>Brent4</c4>
<c5>CP</c5>
<c7>GL</c7>
<c9>US</c9>
<c23>1001</c23>
<c24>26</c24>
<c25>4</c25>
<c26>1000</c26>
<c27>2</c27>
<c28>US</c28>
<c29>1</c29>
<c30>GB</c30>
<c31>20210315</c31>
<c42>19581212</c42>
<c45>1</c45>
<c48>US0010001</c48>
<c52>NO</c52>
<c57>VALUED.CUSTOMER</c57>
<c58>11</c58>
<c60>MR</c60>
<c61>Brent61</c61>
<c63>MALE</c63>
<c64>19720915</c64>
<c68>+12345678</c68>
<c69>bc@gmail.com</c69>
<c132>YES</c132>
<c133>NULL</c133>
<c134>NULL</c134>
<c137>NULL</c137>
<c138>NULL</c138>
<c149>VALUED.CUSTOMER</c149>
<c150>11</c150>
<c151>11</c151>
<c179/>
<c179 m="6">NO</c179>
<c179 m="15">OPT-IN</c179>
<c179 m="16">20210315</c179>
<c179 m="176"/>
<c180>EB.US.ADD.RES.CHAN.AGR}Field ADDRESS/RESIDENCE changed. Still agree?</c180>
<c180 m="2">KYC/US*41 FROM 10 NOT RECEIVED</c180>
<c180 m="3">PWM/US*41 FROM 10 NOT RECEIVED</c180>
<c180 m="4">INTRO/US*41 FROM 10 NOT RECEIVED</c180>
<c182>2</c182>
<c183>17338_OFFICER__OFS_SEAT</c183>
<c184>2104271357</c184>
<c185>17338_OFFICER_OFS_SEAT</c185>
<c186>GB0010001</c186>
<c187>1</c187>
</row>"
Python代码:
#!/usr/bin/env python3
import os
import sys
import time
import csv
import cx_Oracle
import xml.etree.ElementTree as ET
import pandas as pd
con = cx_Oracle.connect('system/Manager@localhost:1521/cdb1')
start = time.time()
SQL = "SELECT RECID,RECORD FROM cust_test2"
#print(SQL)
cur = con.cursor()
cur.prefetchrows = 1000
cur.arraysize = 1000
f = open("D:\Various\myfile21.csv", "w")
writer = csv.writer(f, lineterminator="\n", quoting=csv.QUOTE_NONNUMERIC)
r = cur.execute(SQL)
#col_names = [row[0] for row in cur.description]
#writer.writerow(col_names)
res = cur.fetchall()
for row in res:
tree = ET.ElementTree(ET.fromstring(row[1]))
root = tree.getroot()
for child in root:
#print(child.tag, child.text, child.attrib)
if (child.tag == 'c1') :
s1 = pd.Series(child.text)
if (child.tag == 'c3') :
s3 = pd.Series(child.text)
if (child.tag == 'c23') :
s23 = pd.Series(child.text)
if (child.tag == 'c179') :
s179 = pd.Series(child.text)
.......
df = pd.DataFrame({"c1": s1,
"c3": s3,
"c23": s23,
"c179": s179
.......})
file_name = 'events.csv'
df.to_csv(file_name, sep='\t')
ResultSet_Py_List = []
ora_conn = cx_Oracle.connect('custom/custom@orcl')
ora_cursor = ora_conn.cursor()
my_file = open(file_name, 'r', newline='')
reader = csv.reader(my_file, dialect='excel', delimiter='\t')
row1 = next(reader)
for index, row in enumerate(reader):
print(row)
ResultSet_Py_List.append(row)
print(str(len(ResultSet_Py_List)) + ' Records from Source')
sql_insert = """
INSERT INTO cust_ins (c0,c1,c3,c23,c179,.....)
VALUES (:1,:2,:3,:4,:5,.....)
"""
ora_cursor.prepare(sql_insert)
ora_cursor.executemany(None, ResultSet_Py_List)
ora_conn.commit()
#writer.writerow(row)
f.close()
elapsed = (time.time() - start)
print(elapsed, "seconds")
考虑使用 Oracle 的 XMLTABLE
从 XMLTYPE
列中提取所需的值。 运行 这样的查询来自 Python 并在两个数据库连接之间直接将值从 cursor.fetchall
传递到 cursor.executemany
。这种方法避免了 XML 解析、Pandas 操作和 CSV write/read 步骤。另外,源 SQL 直接在服务器上查询 XML 数据。
SQL (完成所有c
节点,根据需要调整数据类型,另存为.sql)
SELECT x.c1, x.c2, ..., x.c187
FROM MY_TABLE,
XMLTABLE('/row'
PASSING XMLTYPE(MY_TABLE.RECORD)
COLUMNS c1 VARCHAR2(50) PATH 'c1',
c2 VARCHAR2(50) PATH 'c2',
c3 VARCHAR2(50) PATH 'c3',
...
c187 VARCHAR2(50) PATH 'c187'
) AS x
Python
import cx_Oracle
# READ SQL QUERY
with open('/path/to/myquery.sql', mode='r') as f:
src_sql = f.read()
# SOURCE CONNECTION
src_con = cx_Oracle.connect('system/Manager@localhost:1521/cdb1')
src_cur = src_con.cursor()
src_cur.prefetchrows = 1000
src_cur.arraysize = 1000
# DESTINATION CONNECTION
dst_con = cx_Oracle.connect('custom/custom@orcl')
dst_cur = dst_con.cursor()
# APPEND QUERY (MAKE SURE COLUMNS ALIGN TO SOURCE QUERY)
dst_sql = """INSERT INTO cust_ins (c0,c1,c3,c23,c179,.....)
VALUES (:1,:2,:3,:4,:5,.....)"""
# FETCH SOURCE DATA
src_cur.execute(src_sql)
src_res = src_cur.fetchall()
# APPEND TO DESTINATION TABLE
dst_cur.prepare(dst_sql)
dst_cur.executemany(None, src_res)
dst_con.commit()
# CLOSE CURSORS AND CONNECTIONS
src_cur.close(); src_con.close()
dst_cur.close(); dst_con.close()