How to convert a SAP .txt extraction into a .csv file

Question

I have a .txt file like the example below. I would like to convert it into a .csv table, but I have not had much success.

Mack3                                            Line Item Journal                                        Time 14:22:33     Date  03.10.2015
Panteni    Ledger 1L                                                                                    TGEPIO00/CANTINAOAS Page      20.001
--------------------------------------------------------------------------------------------------------------------------------------------
|    Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account   |User Name   |LCurr|      Amount in LC|Tx|Assignment        |S|
|------------------------------------------------------------------------------------------------------------------------------------------|
|    07.01.2014|07.02.2014|4919005298| 36|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             0,85 |  |20140107          | |
|    07.01.2014|07.02.2014|4919065298| 29|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             2,53 |  |20140107          | |
|    07.01.2014|07.02.2014|4919235298| 30|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |            30,00 |  |20140107          | |
|    07.01.2014|07.02.2014|4119005298| 32|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             1,00 |  |20140107          | |
|    07.01.2014|07.02.2014|9019005298| 34|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |            11,10 |  |20140107          | |
|------------------------------------------------------------------------------------------------------------------------------------------|

The file in question has the structure of an SAP report. While experimenting with Python and reading other posts, I found this code:

    with open('file.txt', 'rb') as f_input:
        for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1].isalpha(), f_input):
            header = [cols.strip() for cols in next(csv.reader(StringIO(line), delimiter='|', skipinitialspace=True))][1:-1]
            break
    with open('file.txt', 'rb') as f_input, open(str(ii + 1) + 'output.csv', 'wb') as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerow(header)
        for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] != '-' and not x[1].isalpha(), f_input):
            csv_input = csv.reader(StringIO(line), delimiter='|', skipinitialspace=True)
            csv_output.writerow(csv_input)

Unfortunately, it does not work in my case. In fact, it creates empty .csv files, and it does not seem to read csv_input properly.

Is there any possible solution?

Answer 1

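Your input file can be treated as CSV once we filter out a few lines, namely those that do not start with a pipe symbol '|' followed by a space ' '. That leaves us with this: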

|    Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account   |User Name   |LCurr|      Amount in LC|Tx|Assignment        |S|
|    07.01.2014|07.02.2014|4919005298| 36|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             0,85 |  |20140107          | |
|    07.01.2014|07.02.2014|4919065298| 29|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             2,53 |  |20140107          | |
|    07.01.2014|07.02.2014|4919235298| 30|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |            30,00 |  |20140107          | |
|    07.01.2014|07.02.2014|4119005298| 32|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |             1,00 |  |20140107          | |
|    07.01.2014|07.02.2014|9019005298| 34|07.01.2019|    |81|  |  |60532640  |tARFooWMOND |EUR  |            11,10 |  |20140107          | |

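Your output is mostly empty because the x[1].isalpha() check is never true for this data. In every header and data row, the character at position 1 is a space, never a letter.

There is no need to open the input file more than once. We can read, filter, and write the output in a single pass: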
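To make the point above concrete, here is a minimal sketch that applies both filter conditions to a few shortened lines. The `sample` list is made-up illustration data modelled on the report, not the real file.

```python
# Shortened, hypothetical lines in the same shape as the SAP report.
sample = [
    "Mack3      Line Item Journal",
    "-----------------------------",
    "|    Pstng Date|Entry Date|DocumentNo|",
    "|---------------------------------|",
    "|    07.01.2014|07.02.2014|4919005298|",
]

# Header filter from the question: '|' followed by a letter.
header_matches = [x for x in sample
                  if len(x) > 2 and x[0] == '|' and x[1].isalpha()]

# Filter used in this answer: '|' followed by a space.
kept = [x for x in sample
        if len(x) > 2 and x[0] == '|' and x[1] == ' ']

print(header_matches)  # no line matches, so the header is never found
print(kept)            # the header row and the data row
```

The first filter matches nothing because the header row is padded with spaces after the leading pipe, while the second keeps exactly the header and data rows.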

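import csv

ii = 0  # index used in the output file name, as in the question

# Open both files once; newline='' leaves line-ending handling to the csv module.
with open('file.txt', 'r', encoding='utf8', newline='') as f_input, \
     open(str(ii + 1) + 'output.csv', 'w', encoding='utf8', newline='') as f_output:

    # Keep only lines that start with '|' followed by a space (header and data rows).
    input_lines = filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] == ' ', f_input)

    csv_input = csv.reader(input_lines, delimiter='|')
    csv_output = csv.writer(f_output)

    for row in csv_input:
        # Drop the empty first and last columns and trim the padding of each value.
        csv_output.writerow(col.strip() for col in row[1:-1])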

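Notes:

You should not use binary mode when reading text files. Use the r and w modes, respectively, and declare the file encoding explicitly. Choose the encoding that matches your file.
To use the csv module, open the files with newline='' (this lets the csv module pick the correct line endings).
You can use a \ at the end of a line to wrap multiple files in a with statement.
StringIO is completely unnecessary.
I did not use skipinitialspace=True because some columns also have spaces at the end. Instead, I call .strip() on each value manually when writing the row.
[1:-1] is needed to drop the superfluous empty columns (before the first | and after the last | in the input).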
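The last two notes can be checked on a single row. The sketch below parses one shortened, made-up data line (fewer columns than the real report) and prints what csv.reader returns before and after the clean-up.

```python
import csv

# Hypothetical shortened data row, padded like the SAP report.
line = "|    07.01.2014|07.02.2014| 36|    |EUR  |             0,85 | |"

# Raw parse: the leading and trailing '|' produce empty first and last fields.
raw = next(csv.reader([line], delimiter='|'))
print(raw)

# skipinitialspace=True removes only the spaces right after a delimiter,
# so trailing padding such as 'EUR  ' is left in place.
skipped = next(csv.reader([line], delimiter='|', skipinitialspace=True))
print(skipped)

# Approach from this answer: slice off the outer fields and strip each value.
cleaned = [col.strip() for col in raw[1:-1]]
print(cleaned)
```

Slicing with [1:-1] removes the two empty outer fields, and .strip() handles padding on both sides, which skipinitialspace=True alone does not.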

The output is as follows:

Pstng Date,Entry Date,DocumentNo,Itm,Doc..Date,BusA,PK,SG,Sl,Account,User Name,LCurr,Amount in LC,Tx,Assignment,S
07.01.2014,07.02.2014,4919005298,36,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"0,85",,20140107,
07.01.2014,07.02.2014,4919065298,29,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"2,53",,20140107,
07.01.2014,07.02.2014,4919235298,30,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"30,00",,20140107,
07.01.2014,07.02.2014,4119005298,32,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"1,00",,20140107,
07.01.2014,07.02.2014,9019005298,34,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"11,10",,20140107,
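The amounts appear in double quotes because they contain a comma, which is also the default delimiter of csv.writer. A small sketch of that behaviour, writing to an in-memory buffer instead of a file. The semicolon variant is only an illustration of an alternative and is not part of the answer above.

```python
import csv
import io

row = ["EUR", "0,85", ""]

# Default writer: comma delimiter, so the value containing a comma is quoted.
buf_comma = io.StringIO()
csv.writer(buf_comma).writerow(row)
print(repr(buf_comma.getvalue()))

# Assumed alternative: with ';' as the delimiter no quoting is needed.
buf_semicolon = io.StringIO()
csv.writer(buf_semicolon, delimiter=';').writerow(row)
print(repr(buf_semicolon.getvalue()))
```

Any reader that follows the usual CSV quoting rules, including Python's own csv.reader, returns the quoted field as the plain string 0,85.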

sap

python-3.x