Understanding memory consumption in PonyORM with SQLite

Question

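I have the following code that loads a large CSV file (more than 3.5 million rows) into a SQLite database.

The program works, but it does not seem to release memory: while it runs, I can watch its memory usage grow in top until it exhausts all available memory on the server and the program is killed before all rows have been inserted.

My understanding was that the included db.commit() (executed every time we start loading a new month from the CSV) would free all the Candlestick instances created so far (which, I assume, are what makes memory grow), but it does not.

Why does this happen, and what can be changed in the code so that it runs without the memory leak?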

# -*- coding: utf-8 -*-

# Load CSV Data into SQLite Database

from decimal import *
from datetime import datetime
from pytz import timezone

from pony.orm import *
import csv

# Input parameters
csv_filename = 'dax-1m.csv'
csv_timeframe = '1m'
csv_delimiter = ';'
csv_quotechar = '"'
csv_timezone = timezone('America/New_York')
db_filename = 'dax.db'
db_timezone = timezone('Europe/Berlin')

# Open/Create database
db = Database()

# Data Model
class Candlestick(db.Entity):
    timeframe = Required(unicode)
    timestamp = Required(datetime)
    open = Required(Decimal, precision=12, scale=6)
    high = Required(Decimal, precision=12, scale=6)
    low = Required(Decimal, precision=12, scale=6)
    close = Required(Decimal, precision=12, scale=6)
    volume = Required(Decimal, precision=12, scale=6)

db.bind(provider='sqlite', filename=db_filename, create_db=True)
db.generate_mapping(create_tables=True)    

# Loader class
class Loader():
    def load(self):
        rowcount = 0;
        current_year = -1;
        current_month = -1;
        with open(csv_filename, newline='') as csvfile:
            r = csv.reader(csvfile, delimiter=csv_delimiter, quotechar=csv_quotechar)
            with db_session:
                for row in r:

                    _year = int(row[0][-4:])
                    _month = int(row[0][3:-5])
                    _day = int(row[0][:2])
                    _hour = int(row[1][:2])
                    _minute = int(row[1][3:5])
                    csv_dt = datetime(_year, _month, _day, _hour, _minute)
                    db_dt = csv_timezone.localize(csv_dt).astimezone(db_timezone)

                    Candlestick(
                        timeframe=db_timezone.zone, 
                        timestamp=db_dt,
                        open=row[2], 
                        high=row[3],
                        low=row[4], 
                        close=row[5],
                        volume=row[6]
                    )

                    rowcount+=1

                    if(_year != current_year or _month != current_month):
                        db.commit()
                        current_year = _year
                        current_month = _month
                        print('Loading data for ' + str(current_year) + ' ' + str(current_month) + ' ...')
                        print('Loaded ' + str(rowcount) + ' registers.')

ldr=Loader()
ldr.load();

Answer 1

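There is no memory leak here. Pony clears its cache when it leaves the db_session scope. You can read more about this here: https://docs.ponyorm.com/transactions.html#working-with-db-session.

In particular, this part:

When the session ends, it does the following:

Clears the Identity Map cache

In your code the whole loop runs inside a single db_session, so the cache keeps every created object until the loop finishes. You need to narrow the scope of db_session. Another option is to call commit() after creating N objects and then rollback() to clear the cache: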

with db_session(strict=True):
    for i, row in enumerate(r):
        # <do some work>, e.g. create a Candlestick from row
        if i % 10000 == 0:
            commit()  # save the changes
            rollback()  # clear the cache

See here for more on what happens on rollback(): https://docs.ponyorm.com/transactions.html#db-session-cache
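
The first option, narrowing the db_session scope, might look roughly like the sketch below: the rows are split into batches and each batch is processed inside its own session, so the cache is cleared every time a batch finishes. The helper names chunks and load_in_batches, the batch_size default, and the make_candlestick function in the usage comment are assumptions made for illustration; they are not part of Pony or of the question's code.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_batches(rows, create, session, batch_size=10000):
    """Call create(row) for every row, entering `session` once per batch.

    `session` is assumed to be a reusable context manager such as
    pony.orm.db_session. Leaving the `with` block ends the session,
    which commits the batch and clears the session cache.
    """
    total = 0
    for batch in chunks(rows, batch_size):
        with session:
            for row in batch:
                create(row)
        total += len(batch)
    return total


# Hypothetical usage with the question's code (not executed here), where
# make_candlestick(row) would build one Candlestick from a CSV row:
#   from pony.orm import db_session
#   load_in_batches(r, make_candlestick, db_session)
```

Passing the session in as a parameter keeps the batching logic independent of Pony, so it can be tried out with any context manager before pointing it at the real database.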


Tags: python, sqlite, orm, memory-leaks, ponyorm