How can I subtract the rows of a file in groups of two in Python?
I want to subtract consecutive pairs of rows in a file. For example:
I have a file with 4,000,000 rows, with data like this:
2345 345.67
2344 245.34
45678 331.45
45679 339.32
7654 109.42
7655 250.78
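So I want to subtract two consecutive rows (column 2) and print the absolute result when it is greater than or equal to 60. The subtraction goes two rows at a time (rows 1 and 2, rows 3 and 4, and so on), and each result is printed next to the column 1 value of the first row in the pair. I mean, I want to get a result like this: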
2345 100.13
7654 141.36
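I tried doing it in bash, but it was too slow. I would like to do it in Python, but I don't know how; I am new to Python. How do I read my file directly, and how do I use Python modules? I have read that a dataframe and abs could help me, but how? Could you guide me?
Thank you very much.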
x=1
while [ $x -ge 2 ]
do
a=sed -n '1,2p' file.dat| awk 'NR>1{print -p} {p=}'
echo $a >> results.dat
grep -v "$a" file.dat > file.o
mv file.o file.dat
done
You can actually write the result directly to a file in Python. For example, like this:
# import Python's regular expression module
import re

# open the files (replace data.txt with the input file name and out.txt with the output file name)
with open('data.txt', 'r') as f, open('out.txt', 'w') as o:
    # read the first line manually
    currentLine = re.findall(r'\d+\.?\d*', f.readline())
    # index i starts at 0 and refers to currentLine, so the loop sees:
    #   i=0: prevLine = row 1, currentLine = row 2   (a pair to subtract)
    #   i=1: prevLine = row 2, currentLine = row 3   (skipped)
    # therefore we only act on every second iteration
    for i, line in enumerate(f):  # iterate lazily; readlines() would load all 4,000,000 lines at once
        # set the previous line to the current line
        prevLine = currentLine
        # extract the numbers (raw string so the backslashes reach the regex engine unchanged)
        currentLine = re.findall(r'\d+\.?\d*', line)
        if i % 2 == 0:  # act only on every second iteration (row 1 - row 2; row 3 - row 4; etc.)
            # absolute difference between the second columns of the two rows of the pair
            sub = abs(float(prevLine[1]) - float(currentLine[1]))
            # if this absolute difference is >= 60, print the result
            if sub >= 60:
                outputLine = "%s %s" % (str(prevLine[0]), str(sub))
                print(outputLine)
                o.write(outputLine + "\n")  # write the line to the file 'out.txt'
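For your data, the output is as follows (the first difference is 345.67 - 245.34 = 100.33 rather than the 100.13 in the question; the trailing digits come from floating-point rounding):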
2345 100.33000000000001
7654 141.36
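As a variation on the loop above, here is a minimal sketch that pairs the rows with `zip` over a single iterator and rounds the printed value to two decimals. The function name `pair_differences` and its `threshold` parameter are made up for illustration, and the sketch assumes every row has exactly two whitespace-separated fields with no blank lines.

```python
import io


def pair_differences(lines, threshold=60.0):
    """Yield (first_id, abs_diff) for each pair of rows (1-2, 3-4, ...)
    whose column-2 values differ by at least `threshold`."""
    it = iter(lines)
    # zip(it, it) pulls two items at a time from the same iterator,
    # so rows are grouped without overlap; a trailing unpaired row is ignored.
    for first, second in zip(it, it):
        first_id, first_val = first.split()
        _, second_val = second.split()
        diff = abs(float(first_val) - float(second_val))
        if diff >= threshold:
            yield first_id, diff


# The sample data from the question, wrapped in a file-like object.
sample = io.StringIO(
    "2345 345.67\n2344 245.34\n"
    "45678 331.45\n45679 339.32\n"
    "7654 109.42\n7655 250.78\n"
)

# Format each difference with two decimals to hide floating-point noise.
results = [f"{row_id} {diff:.2f}" for row_id, diff in pair_differences(sample)]
print("\n".join(results))
```

For a real file, an open file object can be passed in place of `sample`, for example `with open('data.txt') as f:` followed by a loop over `pair_differences(f)`; the file is then read line by line rather than loaded into memory at once.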