Conceptual: writing a 2D matrix of dictionary results to a CSV file in Python

Question

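I have a dictionary in this format: each key is a tuple of a document number and a keyword, and each value is the frequency of that keyword in that document. So the keys would be (document1, keyword1), (document1, keyword2), (document1, keyword3), (document2, keyword1), (document2, keyword2), (document2, keyword3), (document3, keyword1), (document3, keyword2), and (document3, keyword3), and the values would be numbers. This is a small dictionary, of course. I am looking to apply the solution to a large number of documents and keywords.

The dictionary is created like this: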

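# the wrapper name and signature are illustrative; the snippet is the body of a function
def build_document_count(document_id_list, words):
    document_count = {}
    try:
        for doc in document_id_list:
            indiv_doc = None  # placeholder: records selected from a database
            # placeholder: unsorted list of text tokenized, set to lower case,
            # and stripped of stop words
            tokens = []
            for w in words:
                # make sure every (document, keyword) pair starts with a count of 0
                document_count.setdefault((doc, w), 0)
                for entry in tokens:
                    if entry == w and (doc, entry) in document_count:
                        document_count[(doc, entry)] += 1
        return document_count

    except Exception as e:
        print("create claim storages")
        print(str(e))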

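I would like to write the results to a CSV file laid out like a two-dimensional matrix. At least, that is how I have seen it described.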

          keyword1 keyword2 keyword3
document1 number   number   number
document2 number   number   number
document3 number   number   number

After looking through the CSV function documentation on python.org and other questions on this site, the closest I have gotten is:

document1 keyword1 number
document1 keyword2 number
document1 keyword3 number
document2 keyword1 number
document2 keyword2 number
document2 keyword3 number
document3 keyword1 number
document3 keyword2 number
document3 keyword3 number

That is the result of this code I wrote:

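import csv
import os

# file_name and available_dict are assumed to be defined elsewhere
with open(os.path.join('C:/Users/Tara/PyCharmProjects/untitled/csv_results/', file_name),
          'w', newline='') as csvfile:  # Python 3 text mode; Python 2 used 'wb'
    w = csv.writer(csvfile)
    for key, value in available_dict.items():
        doc, keyword = key  # each key is a (document, keyword) tuple
        w.writerow([doc, keyword, value])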

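I noticed that a lot of solutions involve list comprehensions, but I can't figure out what the correct if statement would be. Would I make the change when building the dictionary or when writing to the CSV file?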

Answer 1

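Many existing Python libraries handle the task of writing CSV files, so I assume you want to use only simple Python statements and structures.

The main strategy below is to write a generator function that creates the rows of the CSV file. To do so, the function first extracts the documents and keywords from the dictionary and sorts them, then yields a header row containing the keywords, and then builds and yields one row per document.

I have used a minimal number of list comprehensions, which are easy to avoid if you are prepared to write a few more lines.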

D = {
    ('doc1', 'key1'): 2, ('doc1', 'key2'): 3, ('doc1', 'key3'): 4,
    ('doc2', 'key1'): 4, ('doc2', 'key2'): 6, ('doc2', 'key3'): 8,
    ('doc3', 'key1'): 6, ('doc3', 'key2'): 9, ('doc3', 'key3'): 12,
}

def gen_rows(D):
    sorted_docs = sorted(set(t[0] for t in D))
    sorted_kwds = sorted(set(t[1] for t in D))
    yield [None,] + sorted_kwds
    for d in sorted_docs:
        yield [d,] + [D.get((d, k), 0) for k in sorted_kwds]

for row in gen_rows(D):
    print(row)

Here is the output, a list of rows ready to be written to a CSV file:

[None, 'key1', 'key2', 'key3']
['doc1', 2, 3, 4]
['doc2', 4, 6, 8]
['doc3', 6, 9, 12]
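
To get from these rows to an actual CSV file, something along the following lines should work. It is only a sketch: `write_matrix` and `sample` are names made up for illustration, `gen_rows` is repeated so the snippet runs on its own, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def gen_rows(D):
    """Yield the header row, then one row of counts per document."""
    sorted_docs = sorted(set(t[0] for t in D))
    sorted_kwds = sorted(set(t[1] for t in D))
    yield [None] + sorted_kwds
    for d in sorted_docs:
        yield [d] + [D.get((d, k), 0) for k in sorted_kwds]


def write_matrix(D, csvfile):
    """Write the document-by-keyword matrix to an open text file object."""
    writer = csv.writer(csvfile)
    # csv.writer writes None as an empty string, so the top-left cell is blank.
    writer.writerows(gen_rows(D))


# Hypothetical sample data in the same shape as the dictionary above.
sample = {
    ('doc1', 'key1'): 2, ('doc1', 'key2'): 3,
    ('doc2', 'key1'): 4,  # ('doc2', 'key2') is missing and is written as 0
}

# With a real file in Python 3, open it with open(path, 'w', newline='')
# and pass the file object instead of this buffer.
buffer = io.StringIO()
write_matrix(sample, buffer)
print(buffer.getvalue())
```

Because `gen_rows` falls back to `D.get((d, k), 0)`, a (document, keyword) pair that never made it into the dictionary still produces a cell, so every row has the same number of columns.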

