txt to csv with python pandas

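I have a txt file that I want to read and export as CSV, but I'm running into a problem.

The data in the txt file looks like this, with the columns delimited by |: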

Sales Organization|Distribution Channel|Sold-To #|Sold-To Name  |Ship-To #|Ship-To Name             |Mark-For #|Mark-For Name|Z1 : Sales Rep|Z1 : Sales Rep (Name)|Order Number|Sales Doc Type|Order Reason|PO Number|PO Type|Header Department|Delivery Block (H)|Billing Block (H)|Doc Date  |RDD (H)   |Cancel Date (H)|RDD (L)   |Cancel date (L)|Division|Plant|Material  |Sales Doc Item|Size  |Schedule Line|Size Confirm Date|Item Category|Rej.Reason (SL)|Order Qty (SL)|Confirmed Qty (SL)|Unconfirmed Qty (SL)|Cancelled Qty (SL)|Open Qty (SL)|Reserved Qty (SL)|Fixed Qty (SL)|% Allocation (SL)|Delivered Qty (SL)|PGI Qty (SL)|Invoiced Qty (SL)|Net Unit Price|Confirmed Net Value (SL)|Dollars Shipped (SL)|Currency|% Shipped/Allocated (SL)|Delivery Block (SL)|Sales UOM|Credit Limit Status Text              |EAN/UPC      |Customer Material|
|   EU01              |10                  |10026276 | EU SARL|20056417 |Fulfillmemt Poland|          |             |              |                     |1805338693  |ZOR           |ZST         |86LRD5JM |EDI    |                 |                  |                 |14.02.2022|14.02.2022|03.03.2022     |14.02.2022|03.03.2022     |20      |3045 |35524-0004|           410|36  32|            1|14.02.2022       |ZTAN         |               |        1,000 |            1,000 |              0,000 |            0,000 |       0,000 |           0,000 |        0,000 |           0,000 |            1,000 |      0,000 |           0,000 |       41,600 |                 41,600 |              0,000 |EUR     |                100,000 |                   |EA       |Credit check was executed, document OK|5400898540995|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |              |                     |1805338693  |ZOR           |ZST         |86LRD5JM |EDI    |                 |                  |                 |14.02.2022|14.02.2022|03.03.2022     |14.02.2022|03.03.2022     |20      |3045 |35524-0004|           410|33  34|            2|14.02.2022       |ZTAN         |               |        1,000 |            1,000 |              0,000 |            0,000 |       0,000 |           0,000 |        0,000 |           0,000 |            1,000 |      0,000 |           0,000 |       41,600 |                 41,600 |              0,000 |EUR     |                100,000 |                   |EA       |Credit check was executed, document OK|5400898540926|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |              |                     |1805338693  |ZOR           |ZST         |86LRD5JM |EDI    |                 |                  |                 |14.02.2022|14.02.2022|03.03.2022     |14.02.2022|03.03.2022     |20      |3045 |35524-0004|           410|32  32|            3|14.02.2022       |ZTAN         |P6             |        2,000 |            0,000 |              0,000 |            2,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       41,600 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400898508124|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |              |                     |1805338693  |ZOR           |ZST         |86LRD5JM |EDI    |                 |                  |                 |14.02.2022|14.02.2022|03.03.2022     |14.02.2022|03.03.2022     |20      |3045 |85862-0041|           530|29  - |            1|14.02.2022       |ZTAN         |P6             |        1,000 |            0,000 |              0,000 |            1,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       21,100 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400970111273|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |72646         |John, Smith|1805339436  |ZOR           |ZST         |4QRNXHPH |EDI    |                 |                  |                 |14.02.2022|14.02.2022|04.03.2022     |14.02.2022|04.03.2022     |10      |3045 |00501-3199|            10|36  34|            1|14.02.2022       |ZTAN         |X9             |       17,000 |            0,000 |              0,000 |           17,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       47,800 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400970332180|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |72646         |John, Smith   |1805339436  |ZOR           |ZST         |4QRNXHPH |EDI    |                 |                  |                 |14.02.2022|14.02.2022|04.03.2022     |14.02.2022|04.03.2022     |10      |3045 |04511-4432|            20|40  32|            1|14.02.2022       |ZTAN         |J2             |        2,000 |            0,000 |              0,000 |            2,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       41,300 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400898076951|                 |
|   EU01              |10                  |10026276 |EU SARL|20056417 | Fulfillmemt Poland|          |             |72646         |John, Smith   |1805339436  |ZOR           |ZST         |4QRNXHPH |EDI    |                 |                  |                 |14.02.2022|14.02.2022|04.03.2022     |14.02.2022|04.03.2022     |10      |3045 |04511-5115|            30|36  32|            1|14.02.2022       |ZTAN         |P6             |        5,000 |            0,000 |              0,000 |            5,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       47,800 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400970262012|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |72646         |John, Smith   |1805339436  |ZOR           |ZST         |4QRNXHPH |EDI    |                 |                  |                 |14.02.2022|14.02.2022|04.03.2022     |14.02.2022|04.03.2022     |10      |3045 |04511-5155|            40|28  30|            1|14.02.2022       |ZTAN         |X9             |        1,000 |            0,000 |              0,000 |            1,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       56,500 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400970254963|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |72646         

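I want each column to appear separately in the CSV file. Right now, as you can see, the columns are separated by |. For example, Sales Organization should be a header and EU01 should be its value, and so on. This is what I have so far: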

df =pd.read_csv('1.txt', sep='delimiter', header= None, engine='python')
df =df.iloc[3:]

df.to_csv(path + '123.csv', index=False, header=True)

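In the sample txt file you provided, each data row starts and ends with a |, so you need to remove those before reading the file. Otherwise you may get a ParserError. Once that is fixed, you can use sep='|' like this:

df = pd.read_csv('1.txt', sep='|', header=None)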
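
A minimal sketch of that clean-up step, under some assumptions: the two rows in `raw` are made up to mirror the sample (the header has no leading `|`, the data row has one), `removeprefix`/`removesuffix` need Python 3.9 or later, and here the first line is used as the header rather than `header=None`. It removes at most one outer pipe per side, so an empty last field is not lost.

```python
import io
import pandas as pd

# Made-up rows in the same shape as the sample file.
raw = (
    "Sales Organization|Sold-To Name  |Order Qty (SL)|\n"
    "|   EU01          | EU SARL      |        1,000 |\n"
)

# Remove surrounding whitespace, then at most one leading and one trailing '|'.
cleaned = "\n".join(
    line.strip().removeprefix("|").removesuffix("|")
    for line in raw.splitlines()
)

# dtype=str keeps values such as '1,000' as text instead of guessing a number.
df = pd.read_csv(io.StringIO(cleaned), sep="|", dtype=str)

# Trim the padding that the export leaves around names and values.
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip())

print(df)
```

For a real file, the same generator expression could be fed from `open('1.txt')` instead of `raw.splitlines()`.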

Well, here's what I did:

Let's call the file data.txt. I added an extra '|' before Sales Organization.

Now I remove the whitespace:

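# Read every line of the original file.
with open('data.txt', 'r') as f:
    lines = f.readlines()

# Remove every space character. Note that this also removes spaces
# inside values, e.g. 'EU SARL' becomes 'EUSARL'.
stripped = [line.replace(' ', '') for line in lines]

# Overwrite data.txt with the cleaned lines.
with open('data.txt', 'w') as f:
    f.writelines(stripped)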

Then we get a clean-looking data.txt:

|SalesOrganization|DistributionChannel|Sold-To|Sold-ToName|Ship-To|Ship-ToName|Mark-For|Mark-ForName|Z1:SalesRep|Z1:SalesRep(Name)|OrderNumber|SalesDocType|OrderReason|PONumber|POType|HeaderDepartment|DeliveryBlock(H)|BillingBlock(H)|DocDate|RDD(H)|CancelDate(H)|RDD(L)|Canceldate(L)|Division|Plant|Material|SalesDocItem|Size|ScheduleLine|SizeConfirmDate|ItemCategory|Rej.Reason(SL)|OrderQty(SL)|ConfirmedQty(SL)|UnconfirmedQty(SL)|CancelledQty(SL)|OpenQty(SL)|ReservedQty(SL)|FixedQty(SL)|%Allocation(SL)|DeliveredQty(SL)|PGIQty(SL)|InvoicedQty(SL)|NetUnitPrice|ConfirmedNetValue(SL)|DollarsShipped(SL)|Currency|%Shipped/Allocated(SL)|DeliveryBlock(SL)|SalesUOM|CreditLimitStatusText|EAN/UPC|CustomerMaterial|
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||||1805338693|ZOR|ZST|86LRD5JM|EDI||||14.02.2022|14.02.2022|03.03.2022|14.02.2022|03.03.2022|20|3045|35524-0004|410|3632|1|14.02.2022|ZTAN||1,000|1,000|0,000|0,000|0,000|0,000|0,000|0,000|1,000|0,000|0,000|41,600|41,600|0,000|EUR|100,000||EA|Creditcheckwasexecuted,documentOK|5400898540995||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||||1805338693|ZOR|ZST|86LRD5JM|EDI||||14.02.2022|14.02.2022|03.03.2022|14.02.2022|03.03.2022|20|3045|35524-0004|410|3334|2|14.02.2022|ZTAN||1,000|1,000|0,000|0,000|0,000|0,000|0,000|0,000|1,000|0,000|0,000|41,600|41,600|0,000|EUR|100,000||EA|Creditcheckwasexecuted,documentOK|5400898540926||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||||1805338693|ZOR|ZST|86LRD5JM|EDI||||14.02.2022|14.02.2022|03.03.2022|14.02.2022|03.03.2022|20|3045|35524-0004|410|3232|3|14.02.2022|ZTAN|P6|2,000|0,000|0,000|2,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|41,600|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400898508124||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||||1805338693|ZOR|ZST|86LRD5JM|EDI||||14.02.2022|14.02.2022|03.03.2022|14.02.2022|03.03.2022|20|3045|85862-0041|530|29-|1|14.02.2022|ZTAN|P6|1,000|0,000|0,000|1,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|21,100|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400970111273||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646|John,Smith|1805339436|ZOR|ZST|4QRNXHPH|EDI||||14.02.2022|14.02.2022|04.03.2022|14.02.2022|04.03.2022|10|3045|00501-3199|10|3634|1|14.02.2022|ZTAN|X9|17,000|0,000|0,000|17,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|47,800|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400970332180||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646|John,Smith|1805339436|ZOR|ZST|4QRNXHPH|EDI||||14.02.2022|14.02.2022|04.03.2022|14.02.2022|04.03.2022|10|3045|04511-4432|20|4032|1|14.02.2022|ZTAN|J2|2,000|0,000|0,000|2,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|41,300|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400898076951||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646|John,Smith|1805339436|ZOR|ZST|4QRNXHPH|EDI||||14.02.2022|14.02.2022|04.03.2022|14.02.2022|04.03.2022|10|3045|04511-5115|30|3632|1|14.02.2022|ZTAN|P6|5,000|0,000|0,000|5,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|47,800|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400970262012||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646|John,Smith|1805339436|ZOR|ZST|4QRNXHPH|EDI||||14.02.2022|14.02.2022|04.03.2022|14.02.2022|04.03.2022|10|3045|04511-5155|40|2830|1|14.02.2022|ZTAN|X9|1,000|0,000|0,000|1,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|56,500|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400970254963||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646
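
As the cleaned data shows, replacing every space also joins words inside values ("EU SARL" becomes "EUSARL", "36  32" becomes "3632"). If that matters, one possible variation is to trim only the padding around each field. This is a sketch, not what was run above; `clean_line` is a hypothetical helper name and the sample line is made up from values in the file.

```python
def clean_line(line: str) -> str:
    """Trim the padding around each '|'-separated field, keeping inner spaces."""
    fields = line.rstrip("\n").split("|")
    return "|".join(field.strip() for field in fields)


sample = "|   EU01   |10026276 | EU SARL|John, Smith   |36  32|\n"
print(clean_line(sample))
# → |EU01|10026276|EU SARL|John, Smith|36  32|
```

To use it on the file, the list comprehension would become `[clean_line(line) + '\n' for line in lines]`.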

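Now I just read it into pandas and drop the first column:

import pandas as pd

df = pd.read_csv('data.txt', sep='|', engine='python')
# The leading '|' on each line produces an empty first column named 'Unnamed: 0'.
df = df.drop(['Unnamed: 0'], axis=1)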

Here is the output:

  SalesOrganization  DistributionChannel  ...  CustomerMaterial Unnamed: 54
0              EU01                   10  ...               NaN         NaN
1              EU01                   10  ...               NaN         NaN
2              EU01                   10  ...               NaN         NaN
3              EU01                   10  ...               NaN         NaN
4              EU01                   10  ...               NaN         NaN
5              EU01                   10  ...               NaN         NaN
6              EU01                   10  ...               NaN         NaN
7              EU01                   10  ...               NaN         NaN
8              EU01                   10  ...               NaN         NaN
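
The output still has a trailing `Unnamed: 54` column, which comes from the `|` at the end of each line. A small sketch of one way to drop every such placeholder column in one go, assuming no real column name starts with "Unnamed:" (the three-column data here is made up for brevity):

```python
import io
import pandas as pd

# Made-up miniature of the cleaned file: leading and trailing '|' on every line.
data = (
    "|SalesOrganization|DistributionChannel|CustomerMaterial|\n"
    "|EU01|10||\n"
    "|EU01|10||\n"
)

df = pd.read_csv(io.StringIO(data), sep="|")

# Keep only the columns whose name does not start with 'Unnamed:'.
df = df.loc[:, ~df.columns.str.startswith("Unnamed:")]

print(df.columns.tolist())
# → ['SalesOrganization', 'DistributionChannel', 'CustomerMaterial']
```

Filtering by name leaves genuinely empty columns such as CustomerMaterial in place, which `dropna(axis=1, how='all')` would not.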

If you like, you can now convert it to CSV:

df.to_csv('data.csv', index=False)
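
One thing to watch when writing the CSV: several values contain commas ("John,Smith", "17,000"). `to_csv` quotes such text fields by default, so the columns stay intact. If the commas in the quantity and price columns are decimal separators (an assumption, the question does not say), `decimal=','` in `read_csv` can parse them as numbers. A small sketch with made-up data in the same shape:

```python
import io
import pandas as pd

# Made-up rows using column names and values from the cleaned file.
data = (
    "Z1:SalesRep(Name)|OrderQty(SL)|NetUnitPrice\n"
    "John,Smith|17,000|47,800\n"
)

# Assumption: ',' is the decimal mark, so '17,000' means 17.0 and '47,800' means 47.8.
df = pd.read_csv(io.StringIO(data), sep="|", decimal=",")

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

print(df.dtypes)
print(csv_text)
```

In the printed CSV the name appears as "John,Smith" in double quotes, while the numeric columns are written with a decimal point.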