Merge two files sorted numerically by an integer in each line without reading into memory and without sorting
I have two files, file1 and file2, each sorted numerically by column 2:
file1
A 1
B 10
file2
C 2
D 100
I want to merge them and get this output, which is also sorted numerically by column 2:
A 1
C 2
B 10
D 100
I can do this with the following Unix command, which does not sort but merges the pre-sorted files:
sort -m -k2n,2 file1 file2
But how can I do this in Python 3.4 without reading the files into memory and without sorting? Python 3.5 will add a key parameter to heapq.merge() according to docs.python.org, bugs.python.org and github.com, but no pre-release is available. In the meantime I came up with the solution below. Is there a more elegant way? Could I have used map to loop over both files? Or should I post this on Code Review instead?
import heapq

def key_generator(fd):
    # Pair each line with its integer key so heapq.merge can compare tuples.
    for line in fd:
        yield int(line.split()[1]), line

with open('file1') as fd1, open('file2') as fd2:
    it1 = key_generator(fd1)
    it2 = key_generator(fd2)
    for key, line in heapq.merge(it1, it2):
        print(line, end='')
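The same decorate-and-merge idea can be wrapped in a small helper that takes any number of inputs. This is only a sketch: the name `merge_sorted_lines` and its `column` parameter are made up for illustration, and `io.StringIO` objects stand in for the real files so the example runs on its own. It relies only on `heapq.merge()` without a key argument, so it should also work on Python 3.4.

```python
import heapq
import io


def merge_sorted_lines(*files, column=1):
    """Lazily merge line iterables already sorted by an integer column.

    Hypothetical helper: nothing is sorted and no file is read in full.
    """
    def decorate(lines):
        # Attach the integer key to each line so tuples compare numerically.
        for line in lines:
            yield int(line.split()[column]), line

    decorated = (decorate(f) for f in files)
    for _, line in heapq.merge(*decorated):
        yield line


# In-memory stand-ins for file1 and file2 from the question.
file1 = io.StringIO("A 1\nB 10\n")
file2 = io.StringIO("C 2\nD 100\n")

merged = list(merge_sorted_lines(file1, file2))
print("".join(merged), end="")
```

Because the helper is a generator, it can be looped over directly instead of being collected with `list()`, which keeps only one pending line per input in memory.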
You can try it this way:
# Note: this reads both files fully into memory and sorts the result,
# so it does not satisfy the constraints stated in the question.
values = {}
with open("a.txt", 'r') as f1, open("b.txt", 'r') as f2:
    lines_a = f1.readlines()
    lines_b = f2.readlines()
for line in lines_a:
    values[line.split()[0]] = int(line.split()[1])
for line in lines_b:
    values[line.split()[0]] = int(line.split()[1])
for w in sorted(values, key=values.get):
    print(w, values[w])
I just downloaded alpha 1 of Python 3.5, and I can use the new key function of heapq.merge():
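from heapq import merge

def keyfunc(s):
    # The integer in the second column is the merge key.
    return int(s.split()[1])

with open('file1') as fd1, open('file2') as fd2:
    for line in merge(fd1, fd2, key=keyfunc):
        # Each line already ends with a newline, so suppress print's own.
        print(line, end='')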
Or, for those who prefer a one-line lambda function:
key=lambda line: int(line.split()[1])
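The key-based merge extends naturally to more than two files. The sketch below assumes Python 3.5 or later (for the `key` argument) and uses `contextlib.ExitStack` to keep every file open while the merge runs. The helper name `merge_files`, the third file, and the temporary directory are illustrative assumptions, not part of the original post.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack


def merge_files(paths, key):
    """Yield lines from all pre-sorted files in merged order (hypothetical helper)."""
    with ExitStack() as stack:
        # Open every file; ExitStack closes them all once the generator finishes.
        files = [stack.enter_context(open(path)) for path in paths]
        yield from heapq.merge(*files, key=key)


# Demo data written to a temporary directory; the file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    contents = {"file1": "A 1\nB 10\n", "file2": "C 2\nD 100\n", "file3": "E 5\n"}
    paths = []
    for name, text in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as fd:
            fd.write(text)
        paths.append(path)
    merged = list(merge_files(paths, key=lambda line: int(line.split()[1])))

print("".join(merged), end="")
```

Collecting into a list is only for the demonstration; iterating over `merge_files(...)` directly keeps the merge streaming.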
Can I do it in one line using map, operator.itemgetter(), str.split and int?
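One possible way to build the key with exactly those pieces, offered as a sketch and not as a definitive answer: chain `map` calls to extract the integer, and use `itertools.tee` so the original line can be paired with its key without reading the file twice. The `decorate` helper and the `io.StringIO` stand-ins are assumptions for illustration. This variant does not need the `key` argument, so it should also run on Python 3.4.

```python
import heapq
import io
from itertools import tee
from operator import itemgetter


def decorate(lines):
    """Pair each line with its integer key using only map-style building blocks."""
    key_source, originals = tee(lines)
    # split each line, take column 2, convert it to int
    keys = map(int, map(itemgetter(1), map(str.split, key_source)))
    # zip advances both tee branches in lockstep, so tee buffers one line at most
    return zip(keys, originals)


# In-memory stand-ins for file1 and file2 from the question.
file1 = io.StringIO("A 1\nB 10\n")
file2 = io.StringIO("C 2\nD 100\n")

# Merge the (key, line) pairs, then strip the key again with itemgetter(1).
merged_lines = list(map(itemgetter(1), heapq.merge(decorate(file1), decorate(file2))))
print("".join(merged_lines), end="")
```

Whether this reads better than the lambda is a matter of taste; the lambda version is shorter and arguably clearer.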