Does Python have a standard PTS reader or parser?

Question

I have the following file:

version: 1
n_points:  68
{
55.866278 286.258077
54.784191 315.123248
62.148364 348.908294
83.264019 377.625584
102.690421 403.808995
125.495327 438.438668
140.698598 471.379089
158.435748 501.785631
184.471278 511.002579
225.857960 504.171628
264.555990 477.159805
298.168768 447.523374
332.502678 411.220089
350.641672 372.839985
355.004106 324.781552
349.265206 270.707703
338.314674 224.205227
33.431075 238.262266
42.204378 227.503948
53.939564 227.904931
68.298209 232.202002
82.271511 239.951519
129.480996 229.905585
157.960824 211.545631
189.465597 204.068108
220.288164 208.206246
249.905282 218.863196
110.089281 266.422557
108.368067 298.896910
105.018473 331.956957
102.889410 363.542719
101.713553 379.256535
114.636047 383.331785
129.543556 384.250352
140.033133 375.640569
152.523364 366.956846
60.326871 270.980865
67.198221 257.376350
92.335775 259.211865
102.394658 274.137548
86.227917 277.162353
68.397650 277.343621
165.340638 263.379230
173.385917 246.412765
198.024842 240.895985
223.488685 247.333206
207.218336 260.967007
184.619159 265.379884
122.903148 418.405102
114.539655 407.643816
123.642553 404.120397
136.821841 407.806210
149.926926 403.069590
196.680098 399.302500
221.946232 394.444167
203.262878 417.808844
164.318232 440.472370
145.915650 444.015386
136.436942 442.897031
125.273506 429.073840
124.666341 420.331816
130.710965 421.709666
141.438004 423.161457
155.870784 418.844649
213.410389 396.978046
155.870784 418.844649
141.438004 423.161457
130.710965 421.709666
}

The file extension is .pts.

Is there a standard reader for this kind of file?

The code I tried to read it with (downloaded from some GitHub repository) is

landmark = np.loadtxt(image_landmarks_path)

which fails with

{ValueError}could not convert string to float: 'version:'

That makes sense.

I can't change the file. Do I have to write my own parser, or is this some kind of standard format?

Answer 1

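It appears to be a 2D point-cloud file; I believe it is called the Landmark PTS format. The closest Python reference I can find is a 3D-morphable face model-fitting library issue, which references a sample file that matches yours. Most .pts point-cloud tools expect 3D files, so they may not work out of the box with this one.

So no, there doesn't appear to be a standard reader for this. The closest I came to a library that reads the format is this GitHub repository, but it has a drawback: it reads all the data into memory before manually parsing it into Python float values.

However, the format is very simple (as the referenced issue notes), so you can read the data with numpy.loadtxt() alone. The simple approach is to name all the non-data lines as comments: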

import numpy as np

def read_pts(filename):
    return np.loadtxt(filename, comments=("version:", "n_points:", "{", "}"))

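Or, if you're not sure about the validity of a batch of such files and want to make sure you only read valid ones, you can pre-process the file to read the header (validating the point count and version, and allowing for comments and image size info):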

from pathlib import Path
from typing import Union
import numpy as np

def read_pts(filename: Union[str, bytes, Path]) -> np.ndarray:
    """Read a .PTS landmarks file into a numpy array"""
    with open(filename, 'rb') as f:
        # process the PTS header for n_rows and version information
        rows = version = None
        for line in f:
            if line.startswith(b"//"):  # comment line, skip
                continue
            header, _, value = line.strip().partition(b':')
            if not value:
                if header != b'{':
                    raise ValueError("Not a valid pts file")
                if version != 1:
                    raise ValueError(f"Not a supported PTS version: {version}")
                break
            try:
                if header == b"n_points":
                    rows = int(value)
                elif header == b"version":
                    version = float(value)  # version: 1 or version: 1.0
                elif not header.startswith(b"image_size_"):
                    # returning the image_size_* data is left as an exercise
                    # for the reader.
                    raise ValueError
            except ValueError:
                raise ValueError("Not a valid pts file")

        # if there was no n_points line, make sure the closing } line
        # is not going to trip up the numpy reader by marking it as a comment
        points = np.loadtxt(f, max_rows=rows, comments="}")

    if rows is not None and len(points) < rows:
        raise ValueError(f"Failed to load all {rows} points")
    return points

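Short of providing a full test suite, this function is as production-ready as I can make it.

It uses the n_points: line to tell np.loadtxt() how many rows to read, and moves the file position forward to just past the { opener before handing the file over. It also exits with a ValueError if no version: 1 line is present, or if the header contains anything other than version: 1 and n_points: <int> (comment lines and image_size_* entries aside).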
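The header-then-loadtxt mechanism can also be used to hand the header values back to the caller. The following is only a sketch under stated assumptions: read_pts_with_header is a hypothetical name, the image_size_x key in the sample is made up to match the image_size_ prefix, and header values are kept as plain strings rather than validated.

```python
import io

import numpy as np


def read_pts_with_header(f):
    """Return (header, points) from a binary file object holding PTS data.

    Hypothetical variant: every "key: value" header line is kept as a
    string in a dict instead of being validated.
    """
    header = {}
    for raw in f:
        line = raw.strip()
        if not line or line.startswith(b"//"):  # blank or comment line
            continue
        if line == b"{":  # end of header; f now points at the first data row
            break
        key, sep, value = line.partition(b":")
        if not sep:
            raise ValueError("Not a valid pts file")
        header[key.decode().strip()] = value.decode().strip()
    else:
        raise ValueError("Not a valid pts file: no opening {")

    # n_points, when present, caps how many rows loadtxt consumes; the
    # closing } is marked as a comment in case n_points is missing.
    rows = int(header["n_points"]) if "n_points" in header else None
    points = np.loadtxt(f, max_rows=rows, comments="}", ndmin=2)
    return header, points


# Made-up sample; image_size_x is an assumed key name.
sample = b"""version: 1
n_points: 3
image_size_x: 640
{
1.5 2.5
3.5 4.5
5.5 6.5
}
"""

header, points = read_pts_with_header(io.BytesIO(sample))
print(header)
print(points.shape)
```

Because np.loadtxt() continues from wherever the file object currently is, the header loop only has to stop right after the { line. Passing ndmin=2 keeps the result two-dimensional even when the file holds a single point.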

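Both versions of read_pts produce a 68x2 matrix of float64 values for your file, but they should be able to handle points of any dimension.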
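As a quick check of that claim, the sketch below runs the comments-based variant on two small in-memory samples, one with two values per line and one with three. The sample data and the name read_pts_simple are made up for illustration; np.loadtxt() accepts file-like objects, so io.StringIO stands in for a real file here.

```python
import io

import numpy as np

# Markers for the non-data lines, treated as comments by np.loadtxt().
PTS_COMMENTS = ("version:", "n_points:", "{", "}")


def read_pts_simple(source):
    """Load the points, skipping header and brace lines as comments."""
    return np.loadtxt(source, comments=PTS_COMMENTS)


# Made-up sample with two values per line.
sample_2d = """version: 1
n_points:  3
{
55.866278 286.258077
54.784191 315.123248
62.148364 348.908294
}
"""

# Made-up sample with three values per line.
sample_3d = """version: 1
n_points:  2
{
1.0 2.0 3.0
4.0 5.0 6.0
}
"""

points_2d = read_pts_simple(io.StringIO(sample_2d))
points_3d = read_pts_simple(io.StringIO(sample_3d))
print(points_2d.shape, points_2d.dtype)
print(points_3d.shape, points_3d.dtype)
```

One caveat worth knowing: a file containing a single point comes back from np.loadtxt() as a 1-D array unless ndmin=2 is passed.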

Coming back to that EOS library reference: their demo code to read the data hand-parses the lines, also by reading all lines into memory first. I also found this Facebook Research PTS dataset loading code (for .pts files with 3 values per line), which is just as manual.
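Hand-parsing does not have to load the whole file first, though. As a minimal sketch (iter_pts_points is a hypothetical name, not taken from either project, and the header is skipped rather than validated), a generator can stream the points one line at a time:

```python
import io


def iter_pts_points(lines):
    """Yield one tuple of floats per data line from an iterable of text lines."""
    in_body = False
    for line in lines:
        line = line.strip()
        if not in_body:
            in_body = line == "{"  # everything before { is header
            continue
        if line == "}":  # closing brace ends the data block
            return
        if line:
            yield tuple(float(value) for value in line.split())


# Made-up sample; an open text file can be passed the same way.
sample = io.StringIO("""version: 1
n_points: 2
{
55.866278 286.258077
54.784191 315.123248
}
""")

points = list(iter_pts_points(sample))
print(points)
```

This keeps only one line in memory at a time, at the cost of doing the float conversion in Python rather than inside numpy.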
