Merge two files sorted numerically by an integer in each line without reading into memory and without sorting

I have two files, file1 and file2, each sorted numerically by column 2:

file1

A 1
B 10

file2

C 2
D 100

I want to merge them and get this output, which is also sorted numerically by column 2:

A 1
C 2
B 10
D 100

I can do this with the following Unix command, which does not sort but merges the pre-sorted files:

sort -m -k2n,2 file1 file2
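
As a rough illustration of what "merging without sorting" means, here is a minimal sketch of a two-way streaming merge in plain Python. The names `merge_sorted_lines` and `second_column` are made up for this example, and it is not a claim about how `sort -m` works internally. It only assumes that both inputs are already sorted by the key.

```python
def merge_sorted_lines(lines_a, lines_b, key):
    """Yield lines from two pre-sorted iterables in key order, one at a time."""
    it_a, it_b = iter(lines_a), iter(lines_b)
    sentinel = object()  # marks an exhausted input
    a = next(it_a, sentinel)
    b = next(it_b, sentinel)
    while a is not sentinel and b is not sentinel:
        # <= takes ties from the first input, which keeps the merge stable.
        if key(a) <= key(b):
            yield a
            a = next(it_a, sentinel)
        else:
            yield b
            b = next(it_b, sentinel)
    # One input is exhausted; drain whatever remains of the other.
    while a is not sentinel:
        yield a
        a = next(it_a, sentinel)
    while b is not sentinel:
        yield b
        b = next(it_b, sentinel)


def second_column(line):
    """Hypothetical key: the integer in the second whitespace-separated column."""
    return int(line.split()[1])
```

Only one pending line per input is held at any time, so open file objects can be passed in directly. The key is recomputed on every comparison here, which keeps the sketch short at the cost of some repeated parsing.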

But how do I do this in Python 3.4 without reading the files into memory and without sorting? According to docs.python.org, bugs.python.org and github.com, Python 3.5 will add a key parameter to heapq.merge(), but no pre-release is available yet. In the meantime I came up with the solution below. Is there a more elegant way? Could I have used map to loop over both files? Or perhaps I should post this on Code Review instead?

import heapq

def key_generator(fd):
    for line in fd:
        yield int(line.split()[1]), line

with open('file1') as fd1, open('file2') as fd2:

    it1 = key_generator(fd1)
    it2 = key_generator(fd2)
    for key, line in heapq.merge(it1, it2):
        print(line, end='')

You can try it this way:

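values = {}  # renamed from dict so the built-in type is not shadowed

# Note: this holds every entry from both files in memory and then sorts them.
# Entries are keyed by the first column, so a repeated name is overwritten.
with open("a.txt") as f1, open("b.txt") as f2:
    for f in (f1, f2):
        for line in f:
            name, number = line.split()
            values[name] = int(number)

# Print the names ordered by their integer value.
for name in sorted(values, key=values.get):
    print(name, values[name])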

I just downloaded alpha 1 of Python 3.5, and I can use the new key function of heapq.merge():

from heapq import merge

def keyfunc(s):
    return int(s.split()[1])

with open('file1') as fd1, open('file2') as fd2:
    for line in merge(fd1, fd2, key=keyfunc):
        print(line, end='')  # each line already ends with a newline

Or, for those who prefer a one-line lambda function:

key=lambda line: int(line.split()[1])
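
To try the key-based merge without creating files on disk, a small self-contained sketch follows. It assumes Python 3.5 or later (where `heapq.merge()` accepts `key`), and it uses `io.StringIO` objects as stand-ins for the open file handles. The name `second_column` is only illustrative.

```python
import io
from heapq import merge


def second_column(line):
    """Illustrative key: the integer in the second whitespace-separated column."""
    return int(line.split()[1])


# In-memory stand-ins for file1 and file2; real code would use open(...).
file1 = io.StringIO("A 1\nB 10\n")
file2 = io.StringIO("C 2\nD 100\n")

# merge() is lazy; list() is used here only to make the result easy to inspect.
merged = list(merge(file1, file2, key=second_column))
print(''.join(merged), end='')
```

With real files, iterating over the result of `merge()` directly, rather than wrapping it in `list()`, avoids holding the merged output in memory.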

Could I do it in one line using map, operator.itemgetter(), str.split and int?