Merge two files sorted numerically by an integer in each line, without reading them into memory and without sorting

Question

I have two files, file1 and file2, each sorted numerically by the second column:

File 1

A 1
B 10

File 2

C 2
D 100

I want to merge them and get this output, which is also sorted numerically by the second column:

A 1
C 2
B 10
D 100

I can do this with the following Unix command, which does not sort but merges the already-sorted files:

sort -m -k2n,2 file1 file2
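Conceptually, a merge like `sort -m` only needs the current line from each input: it repeatedly emits whichever pending line has the smaller key and then advances that input. A minimal two-way sketch of that idea in plain Python 3.4, without heapq, assuming whitespace-separated lines with the integer in the second column; `merge_two` and `col2` are illustrative names, and `io.StringIO` objects stand in for the open files:

```python
import io


def col2(line):
    # Assumed layout: the sort key is the integer in the second column.
    return int(line.split()[1])


def merge_two(a, b, key):
    """Lazily merge two iterables that are each already sorted by key."""
    done = object()  # marks an exhausted input
    a, b = iter(a), iter(b)
    x, y = next(a, done), next(b, done)
    while x is not done and y is not done:
        # Take from b only when strictly smaller, so ties keep a's line first.
        if key(y) < key(x):
            yield y
            y = next(b, done)
        else:
            yield x
            x = next(a, done)
    # One input is exhausted; drain whatever remains of the other.
    if x is not done:
        yield x
        yield from a
    if y is not done:
        yield y
        yield from b


file1 = io.StringIO("A 1\nB 10\n")   # stand-in for open('file1')
file2 = io.StringIO("C 2\nD 100\n")  # stand-in for open('file2')
for line in merge_two(file1, file2, key=col2):
    print(line, end='')
```

Only one pending line per input is held at a time, so memory use does not grow with file size.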

But how can I do this in Python 3.4 without reading the files into memory and without sorting? According to docs.python.org, bugs.python.org and github.com, Python 3.5 will add a key argument to heapq.merge(), but no pre-release is available yet. In the meantime I came up with the solution below. Is there a more elegant way? Could I have used map to loop over both files? Or should I post this on Code Review instead?

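import heapq

def key_generator(fd):
    # Lazily pair each line with its numeric key (the second column),
    # so heapq.merge() compares (key, line) tuples
    for line in fd:
        yield int(line.split()[1]), line

with open('file1') as fd1, open('file2') as fd2:

    it1 = key_generator(fd1)
    it2 = key_generator(fd2)
    for key, line in heapq.merge(it1, it2):
        print(line, end='')  # line already ends with '\n'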
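On the `map` question: `map` can replace the generator function, because a file object is already an iterator over its lines. A minimal sketch for Python 3.4 (where `heapq.merge()` has no `key=` argument yet), assuming the same two-column layout; `merge_by_col2` and `col2` are illustrative names and the lists stand in for open files:

```python
import heapq


def col2(line):
    # Assumed layout: integer sort key in the second whitespace-separated column.
    return int(line.split()[1])


def merge_by_col2(*files):
    # map() decorates each line lazily as a (key, line) tuple; no list is built.
    decorated = [map(lambda line: (col2(line), line), f) for f in files]
    # heapq.merge() compares the tuples; equal keys fall back to comparing the text.
    for _, line in heapq.merge(*decorated):
        yield line


print(''.join(merge_by_col2(["A 1\n", "B 10\n"], ["C 2\n", "D 100\n"])), end='')
```

Avoid `zip(map(col2, f), f)` as a shortcut: both arguments would pull from the same file iterator, pairing each key with the following line.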

Answer 1

You can try it this way:

d = {}  # name -> number; named d to avoid shadowing the built-in dict

with open("a.txt", 'r') as f1, open("b.txt", 'r') as f2:
    # readlines() loads each whole file into memory
    lines_a = f1.readlines()
    lines_b = f2.readlines()
    for line in lines_a + lines_b:
        fields = line.split()
        d[fields[0]] = int(fields[1])  # a repeated name overwrites the earlier entry

for w in sorted(d, key=d.get):
    print(w, d[w])

Answer 2

I just downloaded Python 3.5 alpha 1, and I can use the new key function of heapq.merge():

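from heapq import merge  # the key= argument needs Python 3.5+

def keyfunc(s):
    # Sort key: the integer in the second whitespace-separated column
    return int(s.split()[1])

with open('file1') as fd1, open('file2') as fd2:
    for line in merge(fd1, fd2, key=keyfunc):
        print(line, end='')  # each line already ends with '\n'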
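Because every line read from a text file keeps its trailing `'\n'`, the merged iterator can also be handed straight to `writelines()` instead of printing line by line. A minimal sketch for Python 3.5+, assuming every input line (including the last one in each file) ends with a newline; `merge_into` is an illustrative name and `io.StringIO` stands in for the real output file:

```python
import io
from heapq import merge  # key= requires Python 3.5+


def keyfunc(s):
    # Sort key: the integer in the second whitespace-separated column.
    return int(s.split()[1])


def merge_into(out, *inputs):
    # merge() pulls one line at a time from each input, and writelines()
    # iterates over the merged result, so no input is read fully into memory.
    out.writelines(merge(*inputs, key=keyfunc))


out = io.StringIO()  # stand-in for open('merged', 'w')
merge_into(out, io.StringIO("A 1\nB 10\n"), io.StringIO("C 2\nD 100\n"))
print(out.getvalue(), end='')
```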

Or, for those who prefer a one-line lambda:

key=lambda line: int(line.split()[1])

Can I do this in one line using map, operator.itemgetter(), str.split and int?
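Python has no built-in function composition, so the lambda above is usually the most direct form, and `map` is not needed for the key itself. If a lambda-free spelling built from `int`, `operator.itemgetter()` and `str.split` is wanted, a small composition helper does it; `compose` below is a hypothetical helper, not part of the standard library:

```python
from functools import reduce
from heapq import merge  # key= requires Python 3.5+
from operator import itemgetter


def compose(*funcs):
    # Hypothetical helper: compose(f, g, h)(x) == f(g(h(x))).
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)


# int(itemgetter(1)(str.split(line))): split on whitespace, take column 2, convert.
keyfunc = compose(int, itemgetter(1), str.split)

print(list(merge(["A 1\n", "B 10\n"], ["C 2\n", "D 100\n"], key=keyfunc)))
```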
