逐行读取 XML 而不将整个文件加载到内存
Read XML line by line without loading whole file to memory
这是我的 XML:
的结构
<?xml version="1.0" encoding="utf-8"?>
<posts>
<row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667" Score="756" ViewCount="63468" Body="<p>I want to use a <code>Track-Bar</code> to change a <code>Form</code>'s opacity.</p>
<p>This is my code:</p>
<pre class="lang-cs prettyprint-override"><code>decimal trans = trackBar1.Value / 5000;
this.Opacity = trans;
</code></pre>
<p>When I build the application, it gives the following error:</p>
<blockquote>
<pre class="lang-none prettyprint-override"><code>Cannot implicitly convert type decimal to double
</code></pre>
</blockquote>
<p>I have tried using <code>trans</code> and <code>double</code>, but then the <code>Control</code> doesn't work. This code worked fine in a past VB.NET project.</p>
" OwnerUserId="8" LastEditorUserId="3072350" LastEditorDisplayName="Rich B" LastEditDate="2021-02-26T03:31:15.027" LastActivityDate="2021-11-15T21:15:29.713" Title="How to convert a Decimal to a Double in C#?" Tags="<c#><floating-point><type-conversion><double><decimal>" AnswerCount="12" CommentCount="4" FavoriteCount="59" CommunityOwnedDate="2012-10-31T16:42:47.213" ContentLicense="CC BY-SA 4.0" />
<row Id="6" PostTypeId="1" AcceptedAnswerId="31" CreationDate="2008-07-31T22:08:08.620" Score="313" ViewCount="22477" Body="<p>I have an absolutely positioned <code>div</code> containing several children, one of which is a relatively positioned <code>div</code>. When I use a <code>percentage-based width</code> on the child <code>div</code>, it collapses to <code>0 width</code> on IE7, but not on Firefox or Safari.</p>
<p>If I use <code>pixel width</code>, it works. If the parent is relatively positioned, the percentage width on the child works.</p>
<ol>
<li>Is there something I'm missing here?</li>
<li>Is there an easy fix for this besides the <code>pixel-based width</code> on the child?</li>
<li>Is there an area of the CSS specification that covers this?</li>
</ol>
" OwnerUserId="9" LastEditorUserId="9134576" LastEditorDisplayName="user14723686" LastEditDate="2021-01-29T18:46:45.963" LastActivityDate="2021-01-29T18:46:45.963" Title="Why did the width collapse in the percentage width child element in an absolutely positioned parent on Internet Explorer 7?" Tags="<html><css><internet-explorer-7>" AnswerCount="7" CommentCount="0" FavoriteCount="13" ContentLicense="CC BY-SA 4.0" />
</posts>
我可以一个一个地加载每个 row
而无需将整个 XML 文件加载到内存中吗?例如打印所有的标题
您可以尝试这样的操作:
while line:= file.readline():
提供 XML 文件的结构与示例中所示的完全相同,然后 BeautifulSoup 可用于解析相关行。像这样:
from bs4 import BeautifulSoup as BS
with open('my.xml') as xml:
for line in map(str.strip, xml):
if line.startswith('<row'):
soup = BS(line, 'lxml')
if row := soup.find('row'):
if title := row.get('title'):
print(title)
是的,你可以使用open(),它会return一个文件对象而不是将文件内容读入RAM。所以你想做这样的事情:
with open('file_name') as file:
for row in file:
print(row)
XML 中的“行”非常无关紧要;相关单位是元素、属性、开始标签、结束标签等。
流式解析器(通常称为 SAX 解析器,尽管严格来说 SAX 是 Java API)将递增地将文档交付给应用程序,不是一次一行,而是一次一个句法单位。
参见示例Python SAX Parser
这是我的 XML:
的结构<?xml version="1.0" encoding="utf-8"?>
<posts>
<row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667" Score="756" ViewCount="63468" Body="<p>I want to use a <code>Track-Bar</code> to change a <code>Form</code>'s opacity.</p>
<p>This is my code:</p>
<pre class="lang-cs prettyprint-override"><code>decimal trans = trackBar1.Value / 5000;
this.Opacity = trans;
</code></pre>
<p>When I build the application, it gives the following error:</p>
<blockquote>
<pre class="lang-none prettyprint-override"><code>Cannot implicitly convert type decimal to double
</code></pre>
</blockquote>
<p>I have tried using <code>trans</code> and <code>double</code>, but then the <code>Control</code> doesn't work. This code worked fine in a past VB.NET project.</p>
" OwnerUserId="8" LastEditorUserId="3072350" LastEditorDisplayName="Rich B" LastEditDate="2021-02-26T03:31:15.027" LastActivityDate="2021-11-15T21:15:29.713" Title="How to convert a Decimal to a Double in C#?" Tags="<c#><floating-point><type-conversion><double><decimal>" AnswerCount="12" CommentCount="4" FavoriteCount="59" CommunityOwnedDate="2012-10-31T16:42:47.213" ContentLicense="CC BY-SA 4.0" />
<row Id="6" PostTypeId="1" AcceptedAnswerId="31" CreationDate="2008-07-31T22:08:08.620" Score="313" ViewCount="22477" Body="<p>I have an absolutely positioned <code>div</code> containing several children, one of which is a relatively positioned <code>div</code>. When I use a <code>percentage-based width</code> on the child <code>div</code>, it collapses to <code>0 width</code> on IE7, but not on Firefox or Safari.</p>
<p>If I use <code>pixel width</code>, it works. If the parent is relatively positioned, the percentage width on the child works.</p>
<ol>
<li>Is there something I'm missing here?</li>
<li>Is there an easy fix for this besides the <code>pixel-based width</code> on the child?</li>
<li>Is there an area of the CSS specification that covers this?</li>
</ol>
" OwnerUserId="9" LastEditorUserId="9134576" LastEditorDisplayName="user14723686" LastEditDate="2021-01-29T18:46:45.963" LastActivityDate="2021-01-29T18:46:45.963" Title="Why did the width collapse in the percentage width child element in an absolutely positioned parent on Internet Explorer 7?" Tags="<html><css><internet-explorer-7>" AnswerCount="7" CommentCount="0" FavoriteCount="13" ContentLicense="CC BY-SA 4.0" />
</posts>
我可以一个一个地加载每个 row
而无需将整个 XML 文件加载到内存中吗?例如打印所有的标题
您可以尝试这样的操作:
while line:= file.readline():
提供 XML 文件的结构与示例中所示的完全相同,然后 BeautifulSoup 可用于解析相关行。像这样:
from bs4 import BeautifulSoup as BS
with open('my.xml') as xml:
for line in map(str.strip, xml):
if line.startswith('<row'):
soup = BS(line, 'lxml')
if row := soup.find('row'):
if title := row.get('title'):
print(title)
是的,你可以使用open(),它会return一个文件对象而不是将文件内容读入RAM。所以你想做这样的事情:
with open('file_name') as file:
for row in file:
print(row)
XML 中的“行”非常无关紧要;相关单位是元素、属性、开始标签、结束标签等。
流式解析器(通常称为 SAX 解析器,尽管严格来说 SAX 是 Java API)将递增地将文档交付给应用程序,不是一次一行,而是一次一个句法单位。
参见示例Python SAX Parser