How to separate lines of data read from a text file? Customers with their orders

Question

I have this data saved in a text file (without the spacing, which I added for clarity).

I am using Python 3:

orders = open('orders.txt', 'r')
lines = orders.readlines()

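I need to loop through the lines variable, which contains all the lines of data, and separate the CO lines the way I have spaced them. CO is a customer, and the lines below each CO are the orders that customer placed.

If you look at indices [7-9] of the CO string, the CO line tells us how many order lines follow it. I have illustrated this below.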

CO77812002D10212020       <---(002)
125^LO917^11212020       <----line 1
235^IL993^11252020       <----line 2 

CO77812002S10212020
125^LO917^11212020
235^IL993^11252020

CO95307005D06092019    <---(005)
194^AF977^06292019    <---line 1 
72^L223^07142019       <---line 2
370^IL993^08022019    <---line 3
258^Y337^07072019     <---line 4
253^O261^06182019     <---line 5

CO30950003D06012019
139^LM485^06272019
113^N669^06192019
249^P530^07112019
CO37501001D05252020
479^IL993^06162020
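
As a small illustration of the count field described above: Python slices exclude the end index, so characters 7 through 9 correspond to `line[7:10]`. The helper name `order_count` is made up for this sketch and is not part of the original post.

```python
def order_count(header: str) -> int:
    """Return the number of order lines announced by a CO header line."""
    # Characters 7, 8 and 9 hold the zero-padded count, e.g. '002'.
    return int(header[7:10])


print(order_count("CO77812002D10212020"))  # 2
print(order_count("CO95307005D06092019"))  # 5
```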

I came up with a brute-force approach, but it does not work for larger data sets.

Any help would be appreciated!

Answer 1

You can use the fileinput module to "simultaneously" read and modify your file. In fact, the in-place functionality that lets you modify a file while parsing it is implemented through a second, backup file. Specifically, as the documentation states:

Optional in-place filtering: if the keyword argument inplace=True is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (...) by default, the extension is '.bak' and it is deleted when the output file is closed.

So you can format your file the way you specified:

import fileinput

with fileinput.input(files = ['orders.txt'], inplace=True) as orders_file:
    for line in orders_file:
        if line[:2] == 'CO':    # Detect customer line
            orders_counter = 0
            num_of_orders = int(line[7:10])    # Extract number of orders
        else:
            orders_counter += 1
            # If last order for specific customer has been reached
            # append a '\n' character to format it as desired
            if orders_counter == num_of_orders:
                line += '\n'
        # Since standard output is redirected to the file, print writes in the file
        print(line, end='')

Note: this assumes that the file containing the orders is formatted exactly the way you specified:

CO...
(order_1)
(order_2)
...
(order_i)
CO...
(order_1)
...
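
If that assumption is in doubt, one option is to check the file before rewriting it. The following is only a sketch: `find_count_mismatches` is a hypothetical helper, and it assumes that order lines never start with "CO" and that blank lines can be ignored.

```python
def find_count_mismatches(lines):
    """Return (header, declared, actual) for every customer whose declared
    order count differs from the number of order lines that follow it."""
    mismatches = []
    header, declared, actual = None, 0, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank separator lines
        if line.startswith("CO"):
            # A new customer starts, so settle the previous one first.
            if header is not None and declared != actual:
                mismatches.append((header, declared, actual))
            header, declared, actual = line, int(line[7:10]), 0
        else:
            # Order lines that appear before any header are not reported.
            actual += 1
    # The last customer has no following header to trigger the check.
    if header is not None and declared != actual:
        mismatches.append((header, declared, actual))
    return mismatches


sample = [
    "CO77812002D10212020\n",
    "125^LO917^11212020\n",
    "235^IL993^11252020\n",
    "CO37501001D05252020\n",  # announces 1 order, but none follows
]
print(find_count_mismatches(sample))  # [('CO37501001D05252020', 1, 0)]
```

An empty result suggests the counter-based loop above will insert its blank lines in the expected places.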

Answer 2

This accomplished what I was hoping to do!

tot_customers = []

with open("orders.txt", "r") as a_file:
    customer = []
    for line in a_file:
        stripped_line = line.strip()

        if stripped_line[:2] == "CO":    # Customer line starts a new group
            customer.append(stripped_line)
            print("customers: ", customer)
            orders_counter = 0
            num_of_orders = int(stripped_line[7:10])    # Number of orders
        else:
            customer.append(stripped_line)
            orders_counter += 1

            # Last order for this customer: store the group and start over
            if orders_counter == num_of_orders:
                tot_customers.append(customer)
                customer = []
                orders_counter = 0
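
The same grouping can arguably be written without a manual counter by letting `itertools.islice` pull exactly the announced number of lines after each header. This is a sketch rather than a drop-in replacement: `group_orders` is a made-up name, blank lines are skipped, and a file that ends early simply yields a shorter final group.

```python
from itertools import islice


def group_orders(lines):
    """Group each CO header with the order lines it announces."""
    stripped = (line.strip() for line in lines)
    rows = (line for line in stripped if line)  # drop blank lines
    customers = []
    for header in rows:
        if not header.startswith("CO"):
            raise ValueError(f"expected a CO header, got {header!r}")
        count = int(header[7:10])
        # islice consumes exactly `count` items from the shared iterator,
        # so the outer loop resumes at the next CO header.
        orders = list(islice(rows, count))
        customers.append([header, *orders])
    return customers


sample = [
    "CO30950003D06012019\n",
    "139^LM485^06272019\n",
    "113^N669^06192019\n",
    "249^P530^07112019\n",
    "CO37501001D05252020\n",
    "479^IL993^06162020\n",
]
for customer in group_orders(sample):
    print(customer)
```

Because a file object is itself an iterable of lines, passing the open file (for example `group_orders(a_file)` inside the `with` block) should work the same way as passing a list.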
