select 来自 Oracle 的 XMLTYPE 解析并将记录插入另一个 Oracle 数据库

select XMLTYPE from Oracle Parse and insert records into another Oracle DB

我正在使用 Python 从 Oracle 中选择 XMLTYPE 字段并解析为单独的字段并插入到另一个 Oracle 数据库中。下面是我的代码。主标签内有大约 300 个标签,一些子标签有多个值。

以下是我的问题:

1.There 是 table 中的数百万条记录,而且速度非常非常慢。需要很长时间才能完成。

请帮助 python 在不影响性能的情况下高效地实现这一目标。

我的出身Table:

desc cust_test
Name   Null Type      
------ ---- --------- 
RECID       NUMBER    
RECORD      XMLTYPE()

数据:

100227  

"<row xml:space="preserve" id="100227">
    <c1>BRENT</c1>
    <c2>BRENT2</c2>
    <c3>Brent3</c3>
    <c4>Brent4</c4>
    <c5>CP</c5>
    <c7>GL</c7>
    <c9>US</c9>
    <c23>1001</c23>
    <c24>26</c24>
    <c25>4</c25>
    <c26>1000</c26>
    <c27>2</c27>
    <c28>US</c28>
    <c29>1</c29>
    <c30>GB</c30>
    <c31>20210315</c31>
    <c42>19581212</c42>
    <c45>1</c45>
    <c48>US0010001</c48>
    <c52>NO</c52>
    <c57>VALUED.CUSTOMER</c57>
    <c58>11</c58>
    <c60>MR</c60>
    <c61>Brent61</c61>
    <c63>MALE</c63>
    <c64>19720915</c64>
    <c68>+12345678</c68>
    <c69>bc@gmail.com</c69>
    <c132>YES</c132>
    <c133>NULL</c133>
    <c134>NULL</c134>
    <c137>NULL</c137>
    <c138>NULL</c138>
    <c149>VALUED.CUSTOMER</c149>
    <c150>11</c150>
    <c151>11</c151>
    <c179/>
    <c179 m="6">NO</c179>
    <c179 m="15">OPT-IN</c179>
    <c179 m="16">20210315</c179>
    <c179 m="176"/>
    <c180>EB.US.ADD.RES.CHAN.AGR}Field ADDRESS/RESIDENCE changed. Still agree?</c180>
    <c180 m="2">KYC/US*41 FROM 10 NOT RECEIVED</c180>
    <c180 m="3">PWM/US*41 FROM 10 NOT RECEIVED</c180>
    <c180 m="4">INTRO/US*41 FROM 10 NOT RECEIVED</c180>
    <c182>2</c182>
    <c183>17338_OFFICER__OFS_SEAT</c183>
    <c184>2104271357</c184>
    <c185>17338_OFFICER_OFS_SEAT</c185>
    <c186>GB0010001</c186>
    <c187>1</c187>
</row>"

Python代码:

#!/usr/bin/env python3
import os
import sys
import time
import csv
import cx_Oracle
import xml.etree.ElementTree as ET
import pandas as pd

con = cx_Oracle.connect('system/Manager@localhost:1521/cdb1')
start = time.time()

SQL = "SELECT RECID,RECORD FROM cust_test2"
#print(SQL)
cur = con.cursor()
cur.prefetchrows = 1000
cur.arraysize = 1000
f = open("D:\Various\myfile21.csv", "w")
writer = csv.writer(f, lineterminator="\n", quoting=csv.QUOTE_NONNUMERIC)
r = cur.execute(SQL)

#col_names = [row[0] for row in cur.description]

#writer.writerow(col_names)
res = cur.fetchall()
for row in res:

   tree = ET.ElementTree(ET.fromstring(row[1]))
   root = tree.getroot()
   for child in root:
      #print(child.tag, child.text, child.attrib)
      if (child.tag == 'c1') :
         s1 = pd.Series(child.text)
      if (child.tag == 'c3') :
         s3 = pd.Series(child.text)
      if (child.tag == 'c23') :
         s23 = pd.Series(child.text)
      if (child.tag == 'c179') :
         s179 = pd.Series(child.text)
          .......

   df = pd.DataFrame({"c1": s1,
                      "c3": s3,
                      "c23": s23,
                      "c179": s179
                      .......})

   file_name = 'events.csv'
   df.to_csv(file_name, sep='\t')

   ResultSet_Py_List = []
   ora_conn = cx_Oracle.connect('custom/custom@orcl')
   ora_cursor = ora_conn.cursor()
   my_file = open(file_name, 'r', newline='')

   reader = csv.reader(my_file, dialect='excel', delimiter='\t')
   row1 = next(reader)
   for index, row in enumerate(reader):
      print(row)
      ResultSet_Py_List.append(row)
   print(str(len(ResultSet_Py_List)) + ' Records from Source')

   sql_insert = """
      INSERT INTO cust_ins (c0,c1,c3,c23,c179,.....) 
      VALUES (:1,:2,:3,:4,:5,.....)
      """
   ora_cursor.prepare(sql_insert)
   ora_cursor.executemany(None, ResultSet_Py_List)
   ora_conn.commit()

   #writer.writerow(row)
f.close()

elapsed = (time.time() - start)
print(elapsed, "seconds")


考虑使用 Oracle 的 XMLTABLEXMLTYPE 列中提取所需的值。 运行 这样的查询来自 Python 并在两个数据库连接之间直接将值从 cursor.fetchall 传递到 cursor.executemany。这种方法避免了 XML 解析、Pandas 操作和 CSV write/read 步骤。另外,源 SQL 直接在服务器上查询 XML 数据。

SQL (完成所有c节点,根据需要调整数据类型,另存为.sql)

SELECT x.c1, x.c2, ..., x.c187
FROM MY_TABLE, 
     XMLTABLE('/row' 
              PASSING XMLTYPE(MY_TABLE.RECORD) 
              COLUMNS c1 VARCHAR2(50) PATH 'c1',
                      c2 VARCHAR2(50) PATH 'c2',
                      c3 VARCHAR2(50) PATH 'c3',
                      ...
                      c187 VARCHAR2(50) PATH 'c187'
             ) AS x

Python

import cx_Oracle

# READ SQL QUERY
with open('/path/to/myquery.sql', mode='r') as f:
    src_sql = f.read()

# SOURCE CONNECTION
src_con = cx_Oracle.connect('system/Manager@localhost:1521/cdb1')
src_cur = src_con.cursor() 
src_cur.prefetchrows = 1000
src_cur.arraysize = 1000

# DESTINATION CONNECTION
dst_con = cx_Oracle.connect('custom/custom@orcl') 
dst_cur = dst_con.cursor()

# APPEND QUERY (MAKE SURE COLUMNS ALIGN TO SOURCE QUERY)
dst_sql = """INSERT INTO cust_ins (c0,c1,c3,c23,c179,.....) 
             VALUES (:1,:2,:3,:4,:5,.....)""" 

# FETCH SOURCE DATA
src_cur.execute(src_sql)
src_res = src_cur.fetchall()

# APPEND TO DESTINATION TABLE
dst_cur.prepare(dst_sql)
dst_cur.executemany(None, src_res) 
dst_con.commit()

# CLOSE CURSORS AND CONNECTIONS
src_cur.close(); src_con.close()
dst_cur.close(); dst_con.close()