Binary data written with tobytes() cannot be read by software on Windows

Question

I am trying to write some xyz point data to a .ply file using Python.

I am using this script, which basically writes a pandas DataFrame to binary format via a recarray and the numpy method tobytes():

import pandas as pd
import numpy as np

pc = pd.read_csv('points.txt')

with open('some_file.ply', 'w') as ply:

    ply.write("ply\n")
    ply.write('format binary_little_endian 1.0\n')
    ply.write("comment Author: Phil Wilkes\n")
    ply.write("obj_info generated with pcd2ply.py\n")
    ply.write("element vertex {}\n".format(len(pc)))
    ply.write("property float x\n")
    ply.write("property float y\n")
    ply.write("property float z\n")
    ply.write("end_header\n")

    pc[['x', 'y', 'z']] = pc[['x', 'y', 'z']].astype('f4')

    ply.write(pc[['x', 'y', 'z']].to_records(index=False).tobytes())

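This script works fine on my Mac, and software such as CloudCompare can read the output; however, when I run the same script on a Windows machine, CloudCompare can read the header information but garbles the binary content.

When I read the text-file version into CloudCompare and export it as a binary file, both the Linux and Windows versions can read it, but the file contents differ.

Here is the version produced by the above script, here is the version produced by CloudCompare on Windows, and here is the raw data.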

Answer 1

The difference between made_with_code.ply and made_with_windows.ply is that in the latter all values are rounded to 2 decimal places, as you can see with:

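import numpy as np

with open('windows.ply', 'rb') as f:
    # Read the header-less payload as records of four 32-bit floats.
    # np.rec is the public alias of np.core.records.
    print(np.rec.fromfile(f, formats='f4,f4,f4,f4'))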

This is run after extracting the data section with tail -c +274 made_with_windows.ply > windows.ply.
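
The offset 274 in the tail command is specific to that file: tail -c +274 starts output at byte 274, so it skips a 273-byte header. As a rough alternative that does not hard-code the offset, the sketch below locates the end_header line and decodes what follows with the standard-library struct module. The function name split_ply and the in-memory sample file are hypothetical, and the sketch assumes a little-endian file whose vertices hold only float properties and whose header uses plain \n line endings.

```python
import struct

def split_ply(raw, floats_per_vertex):
    """Split raw PLY bytes into (header_text, list_of_vertex_tuples).

    Assumes binary_little_endian data, float32 properties only, and a
    header terminated by an end_header line with a plain newline.
    """
    marker = b"end_header\n"
    end = raw.index(marker) + len(marker)   # index of the first payload byte
    header = raw[:end].decode("ascii")
    payload = raw[end:]
    record = struct.Struct("<" + "f" * floats_per_vertex)
    # iter_unpack raises struct.error if the payload is not a whole
    # number of records, which doubles as a basic corruption check.
    vertices = list(record.iter_unpack(payload))
    return header, vertices

# Hypothetical two-vertex file built in memory for illustration.
sample = (b"ply\nformat binary_little_endian 1.0\n"
          b"element vertex 2\n"
          b"property float x\nproperty float y\nproperty float z\n"
          b"end_header\n"
          + struct.pack("<6f", 1.0, 2.0, 3.0, 4.5, 5.5, 6.5))

header, vertices = split_ply(sample, floats_per_vertex=3)
print(vertices)
```

For the file discussed here, the call would use floats_per_vertex=4, matching the 'f4,f4,f4,f4' format above.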

The following code produces a file whose data section is identical to that of made_with_windows.ply:

import pandas as pd
import numpy as np

pc = pd.read_csv('points.txt')

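with open('made_with_code_new.ply', 'wb') as ply:
    # The file is opened in binary mode, so the header is written as bytes.
    ply.write(b"ply\n")
    ply.write(b'format binary_little_endian 1.0\n')
    ply.write(b"comment Author: Phil Wilkes\n")
    ply.write(b"obj_info generated with pcd2ply.py\n")
    ply.write("element vertex {}\n".format(len(pc)).encode('ascii'))
    ply.write(b"property float x\n")
    ply.write(b"property float y\n")
    ply.write(b"property float z\n")
    ply.write(b"end_header\n")

    # Round to 2 decimal places before casting to 32-bit floats.
    pc[['x', 'y', 'z', 'n']] = pc[['x', 'y', 'z', 'n']].round(2).astype('f4')

    # Four columns are written although the header above declares only x, y, z;
    # only the data section is meant to match made_with_windows.ply.
    ply.write(pc[['x', 'y', 'z', 'n']].to_records(index=False).tobytes())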

Answer 2

It turns out I needed to specify the line ending to use when opening the file:

open(output_name, 'w', newline='\n')
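
The reason the newline argument matters: in text mode with the default setting, Python translates every '\n' written into the platform's line separator, which is '\r\n' on Windows, so the bytes on disk differ from what the script asked for. The sketch below illustrates the effect on a shortened header. It passes newline='\r\n' explicitly to imitate the Windows default on any platform, and the helper name written_bytes and the temporary files are placeholders.

```python
import os
import tempfile

header = "ply\nformat binary_little_endian 1.0\nend_header\n"

def written_bytes(newline):
    """Write `header` in text mode with the given newline setting
    and return the exact bytes that ended up on disk."""
    fd, path = tempfile.mkstemp(suffix=".ply")
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write(header)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

unix_style = written_bytes("\n")       # '\n' is written unchanged
windows_style = written_bytes("\r\n")  # imitates the Windows default

print(len(unix_style), len(windows_style))
print(windows_style)
```

The second file is one byte longer per header line. If the original script ran under Python 2, the binary payload went through the same text-mode handle, so any 0x0A byte inside the float data would presumably have been expanded in the same way, which would be consistent with the garbled points.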

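After rewriting for Python 3, the file has to be written in two passes, once for the header and once for the binary component, so the new function looks like this: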

import pandas as pd
import numpy as np

pc = pd.read_csv('points.txt')
output_name = 'some_file.ply'  # example output path; not defined in the original snippet

with open(output_name, 'w', newline='\n') as ply:

    ply.write("ply\n")
    ply.write('format binary_little_endian 1.0\n')
    ply.write("comment Author: Phil Wilkes\n")
    ply.write("obj_info generated with pcd2ply.py\n")
    ply.write("element vertex {}\n".format(len(pc)))
    ply.write("property float x\n")
    ply.write("property float y\n")
    ply.write("property float z\n")
    ply.write("end_header\n")

with open(output_name, 'ab') as ply:
    pc[['x', 'y', 'z']] = pc[['x', 'y', 'z']].astype('f4')
    # Append the raw float32 records after the text header.
    ply.write(pc[['x', 'y', 'z']].to_records(index=False).tobytes())
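
As an alternative to opening the file twice, the whole file can be written through one binary-mode handle by encoding the header to bytes first; binary mode never translates line endings, so no newline argument is needed. This is a sketch rather than the approach used in the answer above: the function name write_ply is hypothetical, the sample points are made up, and it assumes the points arrive as an (N, 3) array-like of x, y, z values.

```python
import numpy as np

def write_ply(path, xyz):
    """Write an (N, 3) array of x, y, z coordinates as a
    binary little-endian PLY file in a single pass."""
    # '<f4' forces little-endian float32 regardless of the host CPU.
    xyz = np.ascontiguousarray(xyz, dtype="<f4")
    if xyz.ndim != 2 or xyz.shape[1] != 3:
        raise ValueError("expected an array of shape (N, 3)")
    header = (
        "ply\n"
        "format binary_little_endian 1.0\n"
        "element vertex {}\n"
        "property float x\n"
        "property float y\n"
        "property float z\n"
        "end_header\n"
    ).format(len(xyz))
    with open(path, "wb") as ply:
        ply.write(header.encode("ascii"))  # header as raw bytes
        ply.write(xyz.tobytes())           # 12 bytes per vertex

# Example with made-up points; the output file name is a placeholder.
points = np.array([[0.0, 0.5, 1.0], [1.5, 2.0, 2.5]])
write_ply("example_single_pass.ply", points)
```

With the DataFrame from the answer, the call would be write_ply(output_name, pc[['x', 'y', 'z']].to_numpy()).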


Tags: python, binary, numpy, point-clouds, pandas