How to separate lines of data read from a text file? Customers with their orders
I have this data saved in a text file (without the spacing I added here for clarity).
I'm using Python 3:
orders = open('orders.txt', 'r')
lines = orders.readlines()
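I need to loop through the lines variable, which holds every line of data, and separate the CO lines the way I have them spaced below.
CO is a customer, and the lines under each CO are the orders that customer placed.
Indices [7-9] of the CO string tell us how many order lines follow it.
I've illustrated this below.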
CO77812002D10212020 <---(002)
125^LO917^11212020 <----line 1
235^IL993^11252020 <----line 2

CO77812002S10212020
125^LO917^11212020
235^IL993^11252020

CO95307005D06092019 <---(005)
194^AF977^06292019 <---line 1
72^L223^07142019 <---line 2
370^IL993^08022019 <---line 3
258^Y337^07072019 <---line 4
253^O261^06182019 <---line 5

CO30950003D06012019
139^LM485^06272019
113^N669^06192019
249^P530^07112019

CO37501001D05252020
479^IL993^06162020
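To make the count field concrete, here is a minimal sketch that pulls it out of the first CO line above. Python slices exclude the end index, so indices 7 through 9 are written as [7:10]. The variable names are only illustrative.

```python
# First customer line from the sample data above
header = "CO77812002D10212020"

# Indices 7, 8 and 9 hold the zero-padded number of order lines
count_field = header[7:10]
num_of_orders = int(count_field)

print(count_field, num_of_orders)
```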
I thought of a brute-force way of doing this, but it won't work on larger datasets.
Any help would be greatly appreciated!
You can use fileinput to "simultaneously" read and modify your file. In fact, the in-place functionality that lets you modify a file while parsing it is implemented through a second, backup file. Specifically, as the fileinput documentation states:
Optional in-place filtering: if the keyword argument inplace=True is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (...) by default, the extension is '.bak' and it is deleted when the output file is closed.
So you can format your file the way you specified:
import fileinput

with fileinput.input(files=['orders.txt'], inplace=True) as orders_file:
    for line in orders_file:
        if line[:2] == 'CO':  # Detect customer line
            orders_counter = 0
            num_of_orders = int(line[7:10])  # Extract number of orders
        else:
            orders_counter += 1
            # If last order for specific customer has been reached
            # append a '\n' character to format it as desired
            if orders_counter == num_of_orders:
                line += '\n'
        # Since standard output is redirected to the file, print writes in the file
        print(line, end='')
Note: this assumes the file containing the orders is formatted exactly the way you specified:
CO...
(order_1)
(order_2)
...
(order_i)
CO...
(order_1)
...
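Since the in-place rewrite depends on that layout, it may be worth checking the file before touching it. The following is only a sketch: check_order_blocks is a hypothetical helper, not part of fileinput, and it assumes that every customer line starts with CO and carries its order count at indices 7-9.

```python
def check_order_blocks(lines):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    header_no = None   # line number of the current CO line
    expected = None    # order count announced by that CO line
    seen = 0           # order lines seen since that CO line

    def close_block():
        # Compare the announced count with what actually followed
        if expected is not None and seen != expected:
            problems.append(
                f"line {header_no}: expected {expected} orders, found {seen}"
            )

    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank separator lines are ignored
        if line.startswith("CO"):
            close_block()
            header_no, seen = line_no, 0
            try:
                expected = int(line[7:10])
            except ValueError:
                problems.append(f"line {line_no}: no numeric count at [7:10]")
                expected = None
        elif header_no is None:
            problems.append(f"line {line_no}: order line before any CO line")
        else:
            seen += 1
    close_block()
    return problems


sample = [
    "CO77812002D10212020\n",
    "125^LO917^11212020\n",
    "235^IL993^11252020\n",
    "CO37501001D05252020\n",
    "479^IL993^06162020\n",
]
print(check_order_blocks(sample))
```

For the real file, the same function can be called as check_order_blocks(open('orders.txt')) before running the fileinput loop.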
This accomplished what I was hoping to do!
tot_customers = []
with open("orders.txt", "r") as a_file:
    customer = []
    for line in a_file:
        stripped_line = line.strip()
        if stripped_line[:2] == "CO":
            customer.append(stripped_line)
            print("customers: ", customer)
            orders_counter = 0
            num_of_orders = int(stripped_line[7:10])
        else:
            customer.append(stripped_line)
            orders_counter += 1
            if orders_counter == num_of_orders:
                tot_customers.append(customer)
                customer = []
                orders_counter = 0
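The same grouping can also be sketched without a manual counter, by letting itertools.islice read exactly as many lines as the CO line announces. This is an alternative to the loop above, not a correction of it. read_customers is a hypothetical name, and the sketch assumes there are no blank lines inside a customer's block.

```python
from itertools import islice


def read_customers(lines):
    """Yield one list per customer: the CO line followed by its order lines."""
    stripped = (line.strip() for line in lines)
    for header in stripped:
        if not header:
            continue  # skip blank lines between customers
        if not header.startswith("CO"):
            raise ValueError(f"expected a CO line, got {header!r}")
        count = int(header[7:10])
        # Pull the next `count` lines from the same iterator
        orders = list(islice(stripped, count))
        if len(orders) != count:
            raise ValueError(f"{header!r} announces {count} orders, file ended early")
        yield [header, *orders]


sample = [
    "CO77812002D10212020\n",
    "125^LO917^11212020\n",
    "235^IL993^11252020\n",
    "CO37501001D05252020\n",
    "479^IL993^06162020\n",
]
tot_customers = list(read_customers(sample))
print(tot_customers)
```

Because the function only needs an iterable of lines, passing the open file object (for example list(read_customers(a_file)) inside the with block) processes the file line by line without calling readlines() first.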