Counting the total number of lines except the ones that start with a special character

Question

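I want to get the number of atoms from a text file. The file starts with a few header lines, and sometimes extra information lines are added that also start with a special character. A sample text file looks like this: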

% site-data vn=3.0
#                        pos
Ga        0.0000000   0.0000000   0.0000000
As        0.2500000   0.2500000   0.2500000

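My approach is to count the total number of lines and the number of lines that start with a special character, then subtract the second from the first. Here is my attempt: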

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        x = len(site.readlines())
        for line in site.readlines():
            if '#' in line or '%' in line:
                count +=1
    return x-count

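The problem with this function is that once x (the total number of lines) has been computed, the counter (the number of lines starting with a special character) comes back as 0. If I remove that line, it works. I could split this into two functions, but I believe it should work as written, and I'd like to know what I'm doing wrong.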

Answer 1

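The problem you're facing is that .readlines() consumes the entire file when it runs. If you call it again, you get nothing back (an empty list), because the file cursor is already at the end of the file.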
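A small demonstration of this behaviour, using io.StringIO as a stand-in for the opened site file (an assumption made only so the snippet runs without a file on disk). It also shows that seek(0) rewinds the cursor, in case you really do want to read the file twice:

```python
import io

# io.StringIO stands in for an open text file; it keeps a read cursor the same way.
site = io.StringIO("% site-data vn=3.0\n#    pos\nGa 0.0 0.0 0.0\nAs 0.25 0.25 0.25\n")

first = site.readlines()   # reads every line and leaves the cursor at the end
second = site.readlines()  # nothing left to read, so this is an empty list

site.seek(0)               # rewind the cursor to the start
third = site.readlines()   # the lines are available again

print(len(first), len(second), len(third))  # 4 0 4
```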

The solution is to assign site.readlines() to a variable first, then change the two lines that follow to use that variable. That way you only call it once.

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        lines = site.readlines()
        x = len(lines)
        for line in lines:
            if '#' in line or '%' in line:
                count +=1
    return x - count
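To check this approach end to end on the sample data from the question, here is a minimal sketch. It assumes the file path is passed in as a parameter (the path argument is hypothetical, replacing the global sitefile) and writes the sample to a temporary file:

```python
import os
import tempfile

# Sample content copied from the question.
SAMPLE = (
    "% site-data vn=3.0\n"
    "#                        pos\n"
    "Ga        0.0000000   0.0000000   0.0000000\n"
    "As        0.2500000   0.2500000   0.2500000\n"
)

def get_atom_number(path):
    """Answer 1's logic; 'path' is a hypothetical parameter replacing the global sitefile."""
    count = 0
    with open(path, 'r') as site:
        lines = site.readlines()      # read the file once and keep the list
        x = len(lines)
        for line in lines:            # reuse the list instead of re-reading the file
            if '#' in line or '%' in line:
                count += 1
    return x - count

# Write the sample to a temporary file so the function has something to open.
with tempfile.NamedTemporaryFile('w', suffix='.site', delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

try:
    result = get_atom_number(path)
finally:
    os.remove(path)

print(result)  # 2
```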

Answer 2

if '#' in line or '%' in line: checks whether the character appears anywhere in the line. Use startswith instead:

if line.startswith(('#', '%')):
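A quick comparison of the two tests. The trailing "# ..." comment on the last data line is hypothetical, added only to show where the substring test goes wrong:

```python
IGNORE = ('#', '%')   # str.startswith accepts a tuple of prefixes

lines = [
    "% site-data vn=3.0",
    "#                        pos",
    "Ga        0.0000000   0.0000000   0.0000000",
    "As        0.2500000   0.2500000   0.2500000  # hypothetical trailing comment",
]

# Substring test: flags any line containing '#' or '%' anywhere.
by_in = [line for line in lines if '#' in line or '%' in line]
# Prefix test: flags only lines whose first character is '#' or '%'.
by_prefix = [line for line in lines if line.startswith(IGNORE)]

print(len(by_in), len(by_prefix))  # 3 2
```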

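As for the counting approach, you can also increment the counter only for lines that do not start with those characters. That way you don't need to know the total number of lines in advance, and you don't need to consume all the lines up front: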

if not line.startswith(('#', '%')):
    count += 1

Then you can simply print or return the counter at the end.

Full code:

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        for line in site:  # iterate lazily; no need to read all lines up front
            if not line.startswith(('#', '%')):
                count +=1
    return count
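If the site file can also contain blank lines or header lines with leading spaces (an assumption; the sample in the question has neither), a slightly more defensive sketch strips leading whitespace first. The count_data_lines name is hypothetical, and it accepts any iterable of lines, so an open file works as well as the io.StringIO used here:

```python
import io
from typing import Iterable, Tuple

def count_data_lines(lines: Iterable[str], ignore: Tuple[str, ...] = ('#', '%')) -> int:
    """Count lines that are not blank and do not start (after leading spaces) with an ignore char."""
    count = 0
    for line in lines:                # a file object yields one line at a time
        stripped = line.lstrip()      # assumption: indented headers should still be skipped
        if stripped and not stripped.startswith(ignore):
            count += 1
    return count

sample = io.StringIO(
    "% site-data vn=3.0\n"
    "  #    pos\n"            # indented header line (assumed possible)
    "Ga 0.0 0.0 0.0\n"
    "\n"                       # blank line (assumed possible)
    "As 0.25 0.25 0.25\n"
)
print(count_data_lines(sample))  # 2
```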

Answer 3

Read the file line by line instead:

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        for line in site:  # iterate the file object; site.readline() would return only the first line as a str
            if '#' not in line and '%' not in line:
                count +=1
    return count
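Note the difference between readline() and iterating over the file: readline() returns a single line as a string, and looping over a string yields its characters. Iterating over the file object itself yields lines one at a time. A short sketch using io.StringIO as a stand-in file:

```python
import io

site = io.StringIO("% site-data vn=3.0\nGa 0.0 0.0 0.0\nAs 0.25 0.25 0.25\n")

one_line = site.readline()        # a single str: the first line only
chars = list(one_line)[:3]        # looping over a str yields characters
site.seek(0)                      # rewind before iterating the whole file
lines = [line for line in site]   # looping over the file object yields lines

print(repr(one_line))  # '% site-data vn=3.0\n'
print(chars)           # ['%', ' ', 's']
print(len(lines))      # 3
```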

As mozway's answer points out, startswith is a better solution, so the code can look like this:

from pathlib import Path
from typing import Union

IGNORE = ('#', '%')

def get_atom_number(filename: str = sitefile, ignore_chars: Union[str, tuple] = IGNORE) -> int:
    '''Count how many lines in filename do not start with ignore_chars.'''
    return len([1 for i in Path(filename).read_text().splitlines() if not i.startswith(ignore_chars)])
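Because the default filename=sitefile is evaluated when the function is defined, the version above raises NameError unless sitefile already exists at that point. A minimal variant, assuming the filename is always passed explicitly and using sum() over a generator instead of building a list:

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple, Union

IGNORE = ('#', '%')

def get_atom_number(filename: Union[str, Path],
                    ignore_chars: Union[str, Tuple[str, ...]] = IGNORE) -> int:
    """Count lines in filename that do not start with ignore_chars."""
    text = Path(filename).read_text()
    # sum() over a generator avoids building an intermediate list of 1s
    return sum(1 for line in text.splitlines() if not line.startswith(ignore_chars))

# Temporary file holding the question's sample, so the call has something to read.
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.write("% site-data vn=3.0\n#   pos\nGa 0.0 0.0 0.0\nAs 0.25 0.25 0.25\n")

try:
    result = get_atom_number(tmp.name)
finally:
    os.remove(tmp.name)

print(result)  # 2
```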

Answer 4

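The first call, site.readlines() on line 4 of your code, moves the file cursor to the end. So the second call to site.readlines() on line 5 just gets an empty list. You can try the code below, which stores the result of site.readlines() in the variable lines. I think it will solve your problem.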
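To watch the cursor move, you can compare tell() before and after the first readlines() call. This sketch uses io.StringIO as a stand-in for the site file; for a real text file the value returned by tell() is an opaque position, so treat the numbers here as illustration only:

```python
import io

# io.StringIO stands in for the opened site file.
site = io.StringIO("% site-data vn=3.0\n#   pos\nGa 0.0 0.0 0.0\nAs 0.25 0.25 0.25\n")

start = site.tell()            # cursor position before reading (0 for a fresh StringIO)
lines = site.readlines()       # first call: reads everything
moved = site.tell() > start    # True: the cursor advanced to the end
leftover = site.readlines()    # second call: [] because nothing is left

print(start, moved, len(lines), leftover)  # 0 True 4 []
```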

def get_atom_number():
    count = 0
    with open(sitefile,'r') as site:
        lines = site.readlines()
        x = len(lines)
        for line in lines:
            if '#' in line or '%' in line:
                count +=1
    return x - count
