ASC files not preserving empty columns when added to df (Python)

Question

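I have a whole bunch of ASC files that I'm trying to extract data from. The problem I'm running into is that some of the columns have empty rows where there is no data. When I load these files into a df, it fills the first columns with all the data and just adds NaNs at the end, like this: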

a | b | c

1 | 2 | nan

when I want it to be:

a | b | c

1 | nan | 2

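(I can't for the life of me figure out how to make a table here.) Where there is no data, I want it to preserve the space. Part of my code says the delimiter is any run of two or more whitespace characters, so that I can keep headers that contain a single space. I think this is what is causing the problem, but I'm not sure how to fix it. I tried using astropy.io to open the files and determine the delimiter, but I get an error that the number of columns doesn't match the data columns. Here is an image of what my files generally look like, so you can see the lack of a character delimiter and the empty columns.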

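import pandas as pd

# The header line starts with one of these prefixes
starting_words = ['Core no.', 'Core No.', 'Core', 'Core  no.']
data = []
file_paths = []

for file in filepaths:  # filepaths: iterable of pathlib.Path objects
    with open(file) as f:
        for i, l in enumerate(f):
            if l.startswith(tuple(starting_words)):
                # Skip the comment lines above the header; split on runs of 2+ whitespace characters
                df = pd.read_csv(file, sep=r'\s{2,}', engine='python', skiprows=i)
                file_paths.append(file.stem + file.suffix)
                df.insert(0, 'Filepath', file)
                data.append(df)
                break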

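This is the script I use to open the files and keep the header words together. I never got the astropy approach to run: I either get the column mismatch error or it can't determine the file format. Also, this code has the skiprows part because the files all have random comments at the top, which I don't want in my dataframe.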
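To illustrate the suspected cause, here is a minimal sketch that uses `re.split` as a stand-in for the `\s{2,}` separator (it is not pandas itself, and the two rows are made-up examples in the style of these files). A blank field is only more whitespace, so it merges into the neighbouring gap and the row comes out one field short:

```python
import re

# Two illustrative rows: the first has a value in every column,
# the second has a blank where the fourth column should be.
row_full = "       3  5566.0    1696.517   508      37.0              4.0"
row_gap = "       1  5516.0    1681.277            40.0              1.0"


def split_on_wide_gaps(line):
    """Split a line on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())


fields_full = split_on_wide_gaps(row_full)
fields_gap = split_on_wide_gaps(row_gap)

print(len(fields_full), fields_full)  # 6 ['3', '5566.0', '1696.517', '508', '37.0', '4.0']
print(len(fields_gap), fields_gap)    # 5 ['1', '5516.0', '1681.277', '40.0', '1.0']
```

With only five fields for six headers, the values slide left and the missing one ends up as a NaN at the end of the row.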

Answer 1

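Your data looks fine. You can try pandas read_fwf to read a file of fixed-width formatted lines. If read_fwf's inference isn't good enough for you, you can use the colspecs parameter to manually describe the extents of the fixed-width fields of each line.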
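As a rough sketch of the colspecs route: the layout below is hypothetical (the `WIDTHS` list and the generated rows are assumptions made for illustration, not measured from the real files), so the character ranges would need to be adjusted to match the actual ASC files. Each entry in `colspecs` is a half-open `(start, end)` range of character positions.

```python
import io

import pandas as pd

# Hypothetical layout: every field is right-aligned inside a fixed-width slot.
WIDTHS = [8, 8, 10, 6, 6, 6]
ROWS = [
    ("1", "5516.0", "1681.277", "", "40.0", "1.0"),
    ("3", "5566.0", "1696.517", "508", "37.0", "4.0"),
    ("", "5571.0", "1698.041", "105", "33.0", "8.0"),
]
text = "\n".join(
    "".join(value.rjust(width) for value, width in zip(row, WIDTHS))
    for row in ROWS
)

# Turn the slot widths into half-open (start, end) character ranges.
colspecs = []
start = 0
for width in WIDTHS:
    colspecs.append((start, start + width))
    start += width

names = ["Core no.", "Depth ft", "Depth m", "Perm mD", "Porosity %", "Saturations Oil %"]
df = pd.read_fwf(io.StringIO(text), colspecs=colspecs, header=None, names=names)
print(df)
```

Because each value is cut out by position rather than by separator, a blank slot stays in its own column and is read as NaN.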

Sample

Core no.   Depth       Depth  Perm  Porosity  Saturations Oil
              ft           m    mD         %                %

       1  5516.0    1681.277            40.0              1.0
       2  5527.0    1684.630            39.0              16.0
       3  5566.0    1696.517   508      37.0              4.0
          5571.0    1698.041   105      33.0              8.0
       6  5693.0    1735.226            44.0              16.0
          5702.0    1737.970   4320     35.0              31.0
       9  5686.0    1733.093   2420     33.0              26.0

import pandas as pd

# Skip the two header lines and assign the column names manually
df = pd.read_fwf('sample.txt', skiprows=2, header=None)
df.columns = ['Core no.', 'Depth ft', 'Depth m', 'Perm mD', 'Porosity%', 'Saturations Oil%']
print(df)

Output from df

   Core no.  Depth ft   Depth m  Perm mD  Porosity%  Saturations Oil%
0       1.0    5516.0  1681.277      NaN       40.0               1.0
1       2.0    5527.0  1684.630      NaN       39.0              16.0
2       3.0    5566.0  1696.517    508.0       37.0               4.0
3       NaN    5571.0  1698.041    105.0       33.0               8.0
4       6.0    5693.0  1735.226      NaN       44.0              16.0
5       NaN    5702.0  1737.970   4320.0       35.0              31.0
6       9.0    5686.0  1733.093   2420.0       33.0              26.0
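The same idea could be combined with the header search from the question, since those files have comment lines above the table. This is only a sketch under assumptions: `find_header_row` and `read_core_table` are hypothetical helper names, the table is assumed to have two header lines (names and units), and the demo file is generated here rather than taken from real data.

```python
import tempfile
from pathlib import Path

import pandas as pd

HEADER_PREFIXES = ("Core no.", "Core No.", "Core")
NAMES = ["Core no.", "Depth ft", "Depth m", "Perm mD", "Porosity %", "Saturations Oil %"]


def find_header_row(path):
    """Return the 0-based index of the first line that starts with a header prefix."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if line.startswith(HEADER_PREFIXES):
                return index
    return None


def read_core_table(path, names, header_lines=2):
    """Skip the comments and header lines, then let read_fwf infer the columns."""
    header_row = find_header_row(path)
    if header_row is None:
        raise ValueError(f"no header line found in {path}")
    df = pd.read_fwf(path, skiprows=header_row + header_lines, header=None, names=names)
    df.insert(0, "Filepath", Path(path).name)
    return df


# Demo on a generated file: two comment lines, two header lines, three data rows.
widths = [8, 8, 10, 6, 6, 6]
rows = [
    ("1", "5516.0", "1681.277", "", "40.0", "1.0"),
    ("", "5571.0", "1698.041", "105", "33.0", "8.0"),
    ("9", "5686.0", "1733.093", "2420", "33.0", "26.0"),
]
lines = [
    "# random comment",
    "another comment line",
    "Core no.   Depth   Depth  Perm  Porosity  Saturations Oil",
    "              ft       m    mD         %                %",
]
lines += ["".join(v.rjust(w) for v, w in zip(row, widths)) for row in rows]

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.asc"
    sample.write_text("\n".join(lines) + "\n")
    df = read_core_table(sample, NAMES)

print(df)
```

If the inferred column count does not match the names for a given file, passing explicit colspecs to read_fwf is the fallback.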

