More Pythonic way to write a DataFrame to a file

Question

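I have a large DataFrame of coordinates whose contents I have to write to an input file for another program. Right now I have a nested, fairly convoluted for loop with many .write() statements, and writing the file takes a long time. I could not find or think of a smarter way.

DataFrame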

type	Uiso	x	y	z
H	0.0320	0.257510	0.254363	0.021930
H	0.0330	0.255228	0.163941	0.038431
H	0.0330	0.255228	0.163941	0.038431
C	0.0278	0.122879	0.207314	0.027545
H	0.0320	0.534974	0.254363	0.021930

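Input

First line: the atom type.
Second line: the number of atoms of this type in the structure, the atomic number (H: 1, C: 6, etc.), then the occupancy (always 1), then Uiso.
Remaining lines: the x y z coordinates.

Repeat for each atom type in the DataFrame.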

  H
    4 1  1.000 0.0320
      0.257510  0.254363  0.021930
      0.255228  0.163941  0.038431
      0.255228  0.163941  0.038431
      0.534974  0.254363  0.021930
  C
    1 6  1.000 0.0278
      0.122879  0.207314  0.027545

Code:

f = open(file_name + ".inp", "w")

for type in structure["type"].unique():
    for Uiso in structure[structure["type"] == type]["Uiso"].unique():
        f.write("  " + type + "\n")
        f.write("    " + str(len(list(structure[(structure["type"] == type) & (structure["Uiso"] == Uiso)][["x", "y", "z"]].iterrows()))) + " ")
        f.write(str(atomic_numbers[type]))
        f.write(str("  1.000 "))
        f.write(str(Uiso) + "\n")
        for coord in structure[(structure["type"] == type) & (structure["Uiso"] == Uiso)].itertuples():
            f.write("      "
                 + str(coord.x) + " "
                 + str(coord.y) + " "
                 + str(coord.z) + "\n")
f.close()

Question

Is there a way to reduce the number of loops? Perhaps by writing a whole sorted part of the DataFrame at once?

Answer 1

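There are many ways to approach this, such as building a template string and filling it in with format(), but your case is probably compact enough to do on the fly, so to speak.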
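For comparison, the template-string route could be sketched roughly as follows. This is only an illustration under stated assumptions: BLOCK_TEMPLATE, format_block and the atomic_numbers lookup are hypothetical names, the indentation is guessed from the example input file in the question, and Uiso is simply taken from the first row of each group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "type": ["H", "H", "H", "C", "H"],
    "Uiso": [0.0320, 0.0330, 0.0330, 0.0278, 0.0320],
    "x": [0.257510, 0.255228, 0.255228, 0.122879, 0.534974],
    "y": [0.254363, 0.163941, 0.163941, 0.207314, 0.254363],
    "z": [0.021930, 0.038431, 0.038431, 0.027545, 0.021930],
})

# Assumed lookup table; the question uses a dict with this name.
atomic_numbers = {"H": 1, "C": 6}

# Hypothetical template for one atom-type block.
BLOCK_TEMPLATE = "  {atom}\n    {count} {number}  1.000 {uiso:.4f}\n{coords}\n"


def format_block(atom, number, group):
    """Render one atom-type block as a single string."""
    coords = "\n".join(
        "      {:.6f}  {:.6f}  {:.6f}".format(x, y, z)
        for x, y, z in group[["x", "y", "z"]].itertuples(index=False)
    )
    return BLOCK_TEMPLATE.format(
        atom=atom,
        count=len(group),
        number=number,
        uiso=group["Uiso"].iloc[0],  # assumption: first Uiso of the group
        coords=coords,
    )


# Build the whole file content in memory; it can then be written in one call.
text = "".join(
    format_block(atom, atomic_numbers[atom], group)
    for atom, group in df.groupby("type")
)
```

Building the full string first means the file itself needs only a single f.write(text) inside a with block.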

For example, writing it on the fly like this does more or less what you want for a DataFrame df:

import numpy as np

with open('data.inp', 'wt') as f:
    for atom, group in df.groupby('type'):
        
        # Write the header info using an f-string. The atomic number is
        # hard-coded as 1 here, and Uiso is taken from the group's first row.
        f.write(f"{atom}\n  {len(group)} 1  1.000 {group['Uiso'].iloc[0]:.4f}\n")
        
        # Write the data table using savetxt for fixed-width columns.
        data = group.loc[:, ['x', 'y', 'z']]
        np.savetxt(f, data, fmt='%10.6f', newline='\n')

This produces:

C
  1 1  1.000 0.0278
  0.122879   0.207314   0.027545
H
  4 1  1.000 0.0320
  0.257510   0.254363   0.021930
  0.255228   0.163941   0.038431
  0.255228   0.163941   0.038431
  0.534974   0.254363   0.021930

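A few notes:

Always use a context manager (a with block) for file reading and writing. It is the safest approach.
I tried using DataFrame.to_csv(), but I don't think it can do fixed-width columns, whereas np.savetxt() can.
I guessed or made up the information on the second line of each record; you may need to change some of it.
You can change the sort order by wrapping the groupby in sorted() with an appropriate key (for example, a table lookup of atomic numbers or something similar).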
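The last note could be sketched like this. It is a minimal example, assuming an atomic_numbers dict like the one used in the question; the io.StringIO buffer is only a stand-in for the open file so that the snippet runs on its own.

```python
import io

import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "type": ["H", "H", "H", "C", "H"],
    "Uiso": [0.0320, 0.0330, 0.0330, 0.0278, 0.0320],
    "x": [0.257510, 0.255228, 0.255228, 0.122879, 0.534974],
    "y": [0.254363, 0.163941, 0.163941, 0.207314, 0.254363],
    "z": [0.021930, 0.038431, 0.038431, 0.027545, 0.021930],
})

# Assumed lookup table; the question uses a dict with this name.
atomic_numbers = {"H": 1, "C": 6}

# Iterating a groupby yields (key, group) pairs, so sorted() can reorder
# them by any function of the key, here the atomic number.
groups = sorted(df.groupby("type"), key=lambda item: atomic_numbers[item[0]])

buffer = io.StringIO()  # stand-in for the file object from open(...)
for atom, group in groups:
    # Header line now uses the looked-up atomic number instead of a fixed 1.
    buffer.write(
        f"{atom}\n  {len(group)} {atomic_numbers[atom]}"
        f"  1.000 {group['Uiso'].iloc[0]:.4f}\n"
    )
    np.savetxt(buffer, group[["x", "y", "z"]].to_numpy(), fmt="%10.6f")

output = buffer.getvalue()
```

With this key, hydrogen comes before carbon, as in the example input file from the question, rather than in the alphabetical order that groupby produces by default.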
