Import series-like data file into pandas

Question

Here is an example of the data file:

 =====
 name          aaa
 place         paaa
 date          Thu Oct 1 12:02:03 2015
 load_status   198
 add_name      naaa
 [---blank line---]
 =====
 name          bbb
 place         pbbb
 date          Thu Oct 3 21:20:36 2015
 load_status   2000.327
 add_name      nbbb
 [---blank line---]

A single file may contain hundreds of such records.

I would like to end up with a pandas object like this:

   name | place | date                    | load_status | add_name
   ---------------------------------------------------------------
   aaa  | paaa  | Thu Oct 1 12:02:03 2015 | 198         | naaa
   bbb  | pbbb  | Thu Oct 3 21:20:36 2015 | 2000.327    | nbbb

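Every record has the same number of fields, so all records have a 'name', a 'place', and so on.

I could transpose the file with "bash+grep+awk" and then read it as CSV, but that is not practical for users who only have Python and Windows. Transposing the file in Python and then reading it as CSV seems like overkill, since I would expect pandas to be able to handle this case somehow.

I thought of Series+dtypes and read_table, but couldn't get them to work for me.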

Answer 1

A simple loop in Python will do it. You will have to do some cleanup and some checking afterwards, but this should get you started.

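import pandas as pd

input_fn = 'data.txt'  # placeholder: path to your data file

records = []
this_record = {}
with open(input_fn, 'r') as f:
    for line in f:
        line = line.strip()
        if line == '':
            # A blank line ends the current record.
            if this_record:
                records.append(this_record)
                this_record = {}
            continue
        elif line.startswith('='):
            # Separator line between records; nothing to store.
            continue
        # First token is the field name; the rest of the line is its value.
        fields = line.split()
        this_record[fields[0]] = ' '.join(fields[1:])

# Keep the last record if the file does not end with a blank line.
if this_record:
    records.append(this_record)

df = pd.DataFrame.from_records(records)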
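
The cleanup and checking mentioned above might look something like the following. This is only a minimal sketch: the `SAMPLE` text and the `parse_records` helper are made up for illustration (they mirror the layout in the question), and the date format string assumes an English locale and that every date looks like the two shown.

```python
import io
import pandas as pd

# Hypothetical sample mirroring the layout shown in the question.
SAMPLE = """\
=====
name          aaa
place         paaa
date          Thu Oct 1 12:02:03 2015
load_status   198
add_name      naaa

=====
name          bbb
place         pbbb
date          Thu Oct 3 21:20:36 2015
load_status   2000.327
add_name      nbbb

"""

def parse_records(lines):
    """Collect 'key value' lines into one dict per record (illustrative helper)."""
    records, current = [], {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('='):
            # A blank or separator line closes the record in progress.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first run of whitespace: field name, then the value.
        parts = stripped.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ''
    if current:
        records.append(current)
    return records

df = pd.DataFrame.from_records(parse_records(io.StringIO(SAMPLE)))

# Cleanup: every column is a string at this point, so convert the typed ones.
df['load_status'] = pd.to_numeric(df['load_status'])
df['date'] = pd.to_datetime(df['date'], format='%a %b %d %H:%M:%S %Y')

# Check: every record should have supplied a value for every field.
assert not df.isna().any().any()
```

To read a real file instead of the sample, pass an open file object to `parse_records` in place of the `io.StringIO` wrapper.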
