CSV from a list of dictionaries with differing lengths and keys

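I have a list of dictionaries that I want to write to a CSV file. The first dictionary has a different length and different keys from the dictionaries that follow it.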

dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}, ...]

How can I write it to a CSV file so that the file looks like this:

A B C D E
1 2 3 4 5
    6 7 8
    . . .
Consider what you get when you call pd.DataFrame() on the list.

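Pandas can build a DataFrame from a list of dictionaries. In the resulting DataFrame each dictionary becomes a row and each key becomes a column; wherever a dictionary lacks a key, that cell is NaN. So the value of the 3rd key (call it key3) in the 7th dictionary ends up in the 7th row of column key3.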

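What this means for your problem: you first have to modify dict_list so that it also contains the first two dictionaries merged into one, like this: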

dict_list.insert(2, dict(**dict_list[0], **dict_list[1]))
print(dict_list)

[{'A': 1, 'B': 2},
 {'C': 3, 'D': 4, 'E': 5},
 {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5},
 {'C': 6, 'D': 7, 'E': 8}]

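This inserts the merge of the first two dictionaries into your list at index 2. Why index 2? Because then you can simply slice the list when converting it to a DataFrame, which gives you the desired output: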

import pandas as pd

df = pd.DataFrame(dict_list[2:])
print(df)

     A    B  C  D  E
0  1.0  2.0  3  4  5
1  NaN  NaN  6  7  8

For comparison, calling pd.DataFrame directly on the original, unmodified list (before the insert) gives you:

df_unmodified = pd.DataFrame(dict_list)
print(df_unmodified)

     A    B    C    D    E
0  1.0  2.0  NaN  NaN  NaN
1  NaN  NaN  3.0  4.0  5.0
2  NaN  NaN  6.0  7.0  8.0
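If you would rather not mutate dict_list, a minimal sketch (assuming the three-dict sample from the question) builds the rows for pd.DataFrame directly. It uses {**a, **b} instead of dict(**a, **b): the latter raises TypeError when the two dicts share a key or when a key is not a string, while {**a, **b} simply lets the later value win. On Python 3.9+ dict_list[0] | dict_list[1] does the same.

```python
dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]

# {**a, **b} builds a new dict; on a shared key the value from b wins.
first_row = {**dict_list[0], **dict_list[1]}

# The merged row followed by the remaining dicts; dict_list itself is not modified.
rows = [first_row] + dict_list[2:]
print(rows)
# [{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5}, {'C': 6, 'D': 7, 'E': 8}]
```

pd.DataFrame(rows) then gives the same two-row frame as df above.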

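Afterwards, you can save the DataFrame to a CSV file with df.to_csv(); pass index=False so that the row index is not written as an extra first column.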
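A sketch of the complete pandas route, assuming the three-dict sample from the question and a pandas version with the nullable "Int64" dtype (0.24 or later). It writes to a string so the output is visible, and shows a side effect worth knowing: columns that contain NaN are float64, so A and B are written as 1.0 and 2.0 unless you convert them.

```python
import pandas as pd

dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]
rows = [{**dict_list[0], **dict_list[1]}] + dict_list[2:]
df = pd.DataFrame(rows)

# With no path argument, to_csv() returns the CSV text as a string.
# index=False drops the 0, 1, ... row labels; NaN is written as '' (the na_rep default).
print(df.to_csv(index=False))
# A,B,C,D,E
# 1.0,2.0,3,4,5
# ,,6,7,8

# Casting to the nullable integer dtype "Int64" keeps missing cells empty
# while writing the numbers as integers.
print(df.astype("Int64").to_csv(index=False))
# A,B,C,D,E
# 1,2,3,4,5
# ,,6,7,8
```

To write a file instead, pass a path as the first argument, e.g. df.astype("Int64").to_csv("file.csv", index=False).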

The catch is that you need the complete set of columns to write the header at the start of the file. Apart from that, csv.DictWriter is all you need:

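import csv

# compute the fieldnames as the union of all keys (or hard-code them if you know them):
fieldnames = set()
for d in dict_list:
    fieldnames.update(d.keys())
fieldnames = sorted(fieldnames)    # sets are unordered, so sort to get a predictable column order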

# produce the csv file
with open("file.csv", "w", newline='') as fd:
    wr = csv.DictWriter(fd, fieldnames)
    wr.writeheader()
    wr.writerows(dict_list)

The resulting CSV will look like this:

A,B,C,D,E
1,2,,,
,,3,4,5
,,6,7,8
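sorted() puts the columns in alphabetical order. If you would rather keep the order in which keys first appear, one option is dict.fromkeys, which relies on dicts preserving insertion order (guaranteed since Python 3.7). This sketch writes to an io.StringIO buffer instead of file.csv so the result can be printed, and spells out restval, the value DictWriter writes for keys missing from a row ('' is its default). For this particular data the two orders happen to coincide.

```python
import csv
import io

dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]

# dict.fromkeys keeps only the first occurrence of each key, in order,
# so the columns follow the order in which keys first appear.
fieldnames = list(dict.fromkeys(k for d in dict_list for k in d))

buf = io.StringIO()
# restval is written for every key a row does not have.
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(dict_list)
print(buf.getvalue())
```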

If you really want to combine consecutive rows whose key sets are disjoint into a single line, you can do this:

# produce the csv file (fieldnames as computed in the previous snippet)
with open("file.csv", "w", newline='') as fd:
    wr = csv.DictWriter(fd, sorted(fieldnames))
    old = {k: k for k in wr.fieldnames}       # use old for the header line
    for row in dict_list:
        if set(old).intersection(row):
            wr.writerow(old)                  # common fields: write old and start a new row
            old = dict(row)                   # copy, so that update() never modifies dict_list
        else:
            old.update(row)                   # disjoint fields: just combine
    wr.writerow(old)                          # do not forget the last row

You will get:

A,B,C,D,E
1,2,3,4,5
,,6,7,8
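If you want to reuse or test the merging rule separately from the file writing, it can be pulled out into a function. merge_disjoint below is a hypothetical helper (not part of the csv module) that applies the same rule as the loop above: start a new row whenever the next dict shares a key with the current one. It copies each dict, so the input list is left untouched, and it writes to an io.StringIO buffer purely so the output can be checked.

```python
import csv
import io


def merge_disjoint(rows):
    """Hypothetical helper: merge each dict into the current output row while
    their keys are disjoint; start a new row as soon as a key would repeat."""
    merged = []
    for row in rows:
        if merged and not (merged[-1].keys() & row.keys()):
            merged[-1].update(row)      # no shared keys: extend the current row
        else:
            merged.append(dict(row))    # first row or shared keys: start a new row (a copy)
    return merged


dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]
merged = merge_disjoint(dict_list)

fieldnames = sorted({key for row in merged for key in row})
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(merged)
print(buf.getvalue())
```

Because merged is an ordinary list of dicts, it can also be passed straight to pd.DataFrame.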

You can also do this with nothing but Python's standard library. My example below is similar to the one proposed by @Serge Ballesta. Here is the code:

import csv

# sample data (same keys as in the question)
data = [{'A': 1, 'B': 2}, {'C': 3, 'D': 4, 'E': 5}, {'C': 6, 'D': 7, 'E': 8}]
# Collect the field names of the items in **data** (each one is a dict) and store
# them in a **set** so that every name appears only once
fields = set()
for item in data:
    names = set(item.keys())
    fields = fields | names   # | is the union operator for **set**

fields = list(fields)   # cast the fields into a list
# and sort the content so that during the display everything is in order :)
fields.sort()

# Now let's write a function that returns cleaned data built from the original, i.e.
# data in which all items have the same field names.

def clean_data(origdata, fieldnames):
    """Turn the original data into new data in which every item has the same fields.

    Parameters
    ----------
    origdata: list of dict
         original data which will be cleaned or harmonized according to the field names
    fieldnames: list of strings
         field names of the items in the new data

    Returns
    -------
    A new list of dicts in which every dict has the same keys (i.e. fieldnames);
    the dicts in origdata are left unchanged.
    """
    newdata = []
    for dataitem in origdata:
        newitem = dict(dataitem)   # work on a copy so that origdata is not modified
        for key in fieldnames:
            if key not in newitem:
                # add the missing **key** with the value ' '
                newitem[key] = ' '
        newdata.append(newitem)

    return newdata


def main():
    """Test the above function and display the result"""
    newdata = clean_data(data, fields)

    # write the data to a csv file
    with open("data.csv", "w", newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fields)
        writer.writeheader()
        for row in newdata:
            writer.writerow(row)

    # Now let's load the newly written csv file and print its content
    # -- some fancy display formatting here: not needed but I like it. :)
    nfields = len(fields)
    fmt = " %s |" * nfields          # one " value |" cell per field
    headInfo = fmt % tuple(fields)
    line = '-' * (len(headInfo) + 1)
    print(line)
    print("|" + headInfo)
    print(line)
    with open("data.csv", "r", newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for item in reader:
            row = [item[field] for field in fields]
            print("|" + fmt % tuple(row))

    print(line)


if __name__ == "__main__":
    main()

The script above produces the following output:

---------------------
| A | B | C | D | E |
---------------------
| 1 | 2 |   |   |   |
|   |   | 3 | 4 | 5 |
|   |   | 6 | 7 | 8 |
---------------------
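Because csv.DictWriter fills in every key missing from a row with its restval argument, clean_data is not strictly needed. A sketch of the same script under that assumption, writing to an in-memory io.StringIO buffer instead of data.csv and reusing the same table formatting; with restval=' ' the missing cells are written as a single space, exactly as clean_data did.

```python
import csv
import io

data = [{'A': 1, 'B': 2}, {'C': 3, 'D': 4, 'E': 5}, {'C': 6, 'D': 7, 'E': 8}]
fields = sorted({key for item in data for key in item})

buf = io.StringIO()
# restval=' ' plays the role of clean_data: each missing key is written as ' '.
writer = csv.DictWriter(buf, fieldnames=fields, restval=' ')
writer.writeheader()
writer.writerows(data)

# Read the CSV back from the start of the buffer and format it as a table.
buf.seek(0)
fmt = " %s |" * len(fields)
head = fmt % tuple(fields)
line = '-' * (len(head) + 1)
lines = [line, "|" + head, line]
for item in csv.DictReader(buf):
    lines.append("|" + fmt % tuple(item[f] for f in fields))
lines.append(line)
print("\n".join(lines))
```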