CSV from list of dictionaries with differing length and keys
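I have a list of dictionaries that I want to write to a csv file.
The first dictionary has a different length and different keys from the dictionaries that follow it.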
dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}, ...]
How can I write this to a csv file so that the file looks like this:
A B C D E
1 2 3 4 5
    6 7 8
. . .
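Pandas can generate a dataframe from a list of dictionaries if you call pd.DataFrame() on the list. In the resulting dataframe, each dictionary is a row and each key corresponds to a column. So the value for the 3rd key (call it key3) in the 7th dictionary ends up in the 7th row of the key3 column.
What this means for your problem: you first have to modify dict_list so that it contains the merged dictionary, like so: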
import pandas as pd

dict_list.insert(2, dict(**dict_list[0], **dict_list[1]))
print(dict_list)
[{'A': 1, 'B': 2},
{'C': 3, 'D': 4, 'E': 5},
{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5},
{'C': 6, 'D': 7, 'E': 8}]
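This inserts the combination of the first two dictionaries into your list at index 2. Why index 2? Because it lets you conveniently slice the list when converting it to a dataframe, which gives you the desired output: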
df = pd.DataFrame(dict_list[2:])
print(df)
     A    B  C  D  E
0  1.0  2.0  3  4  5
1  NaN  NaN  6  7  8
For comparison, calling pd.DataFrame directly on the unmodified list (that is, the list as it was before the insert) gives you
df_unmodified = pd.DataFrame(dict_list)
print(df_unmodified)
     A    B    C    D    E
0  1.0  2.0  NaN  NaN  NaN
1  NaN  NaN  3.0  4.0  5.0
2  NaN  NaN  6.0  7.0  8.0
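Afterwards, you can save the dataframe to a csv file with df.to_csv().
The catch is that you need the full set of columns before you can write the header at the beginning of the file. Apart from that, csv.DictWriter is all you need: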
import csv

# optional: compute the fieldnames:
fieldnames = set()
for d in dict_list:
    fieldnames.update(d.keys())
fieldnames = sorted(fieldnames)    # sort the fieldnames...

# produce the csv file
with open("file.csv", "w", newline='') as fd:
    wr = csv.DictWriter(fd, fieldnames)
    wr.writeheader()
    wr.writerows(dict_list)
The resulting csv will look like this:
A,B,C,D,E
1,2,,,
,,3,4,5
,,6,7,8
If you really want to combine rows with disjoint key sets, you could do this:
# produce the csv file
with open("file.csv", "w", newline='') as fd:
    wr = csv.DictWriter(fd, sorted(fieldnames))
    old = {k: k for k in wr.fieldnames}      # use old for the header line
    for row in dict_list:
        if len(set(old.keys()).intersection(row.keys())) != 0:
            wr.writerow(old)                 # common fields: write old and start a new row
            old = row
        old.update(row)                      # disjoint fields: just combine (this mutates the dict in dict_list)
    wr.writerow(old)                         # do not forget last row
You will get:
A,B,C,D,E
1,2,3,4,5
,,6,7,8
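The same merging idea can be kept separate from the file writing, and done without modifying the dictionaries in the input list. The following is a rough sketch under those assumptions, not the code from the answer above: merge_disjoint is a hypothetical name, and the result is written to an in-memory io.StringIO so it can be inspected.

```python
import csv
import io


def merge_disjoint(rows):
    """Yield merged rows: consecutive dicts are combined until a key repeats."""
    current = {}
    for row in rows:
        if current.keys() & row.keys():
            # a key is already present: emit the pending row and start a new one
            yield current
            current = {}
        # build a new dict so the input dicts are never mutated
        current = {**current, **row}
    if current:
        yield current


dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]
merged = list(merge_disjoint(dict_list))

buf = io.StringIO(newline="")
fieldnames = sorted({key for row in merged for key in row})
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(merged)
print(buf.getvalue())
```

Because the merged rows are built as new dictionaries, dict_list is unchanged afterwards and can be reused.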
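You can also do this using only the built-in functionality that ships with Python. My example below is similar to the one proposed by @Serge Ballesta. The code is as follows: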
import csv

# sample data
data = [{'A': 1, 'B': 2}, {'C': 3, 'D': 4, 'E': 5}, {'C': 6, 'D': 7, 'E': 8}]

# Collect the field names from the elements of data (they are dict objects) and
# store them in a set so that each name appears only once
fields = set()
for item in data:
    names = set(item.keys())
    fields = fields | names  # | is the union operator for sets
fields = list(fields)  # cast the fields into a list
# and sort the content so that during the display everything is in order :)
fields.sort()


# Now let us write a function that returns cleaned data from the original, that is,
# all data items have the same field names.
def clean_data(origdata, fieldnames):
    """Turn the original data into new data whose items all share the same fields.

    Parameters
    ----------
    origdata: list of dict
        original data which will be cleaned or harmonized according to the field names
    fieldnames: list of strings
        field names in the new data items

    Returns
    -------
    A new list of dict where all dict items have the same keys (i.e. fieldnames)
    """
    newdata = []
    for dataitem in origdata:
        newitem = dict(dataitem)  # copy, so the original dict is left untouched
        for key in fieldnames:
            if key not in newitem:
                # pad the missing key with the value ' '
                newitem[key] = ' '
        newdata.append(newitem)
    return newdata


def main():
    """Test the above function and display the result"""
    newdata = clean_data(data, fields)

    # write the data to a csv file
    with open("data.csv", "w", newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fields)
        writer.writeheader()
        for row in newdata:
            writer.writerow(row)

    # Now let us load our newly written csv file and print the content
    # -- some fancy display formatting here: not needed but I like it. :)
    nfields = len(fields)
    fmt = " %s |" * nfields
    headInfo = fmt % tuple(fields)
    line = '-' * (len(headInfo) + 1)
    print(line)
    print("|" + headInfo)
    print(line)

    with open("data.csv", "r", newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for item in reader:
            row = [item[field] for field in fields]
            print("|" + fmt % tuple(row))
    print(line)


main()
The script above produces the following output:
---------------------
| A | B | C | D | E |
---------------------
| 1 | 2 |   |   |   |
|   |   | 3 | 4 | 5 |
|   |   | 6 | 7 | 8 |
---------------------
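One thing to keep in mind when loading such a file again: csv.DictReader returns every cell as a string, including the empty or blank padding cells. If you want the original sparse dictionaries back, a sketch like the one below could help. read_sparse_rows is a hypothetical helper, the CSV text is inlined rather than read from data.csv, and the conversion assumes that every non-blank cell holds an integer.

```python
import csv
import io

# CSV text as produced by csv.DictWriter for the sample data (inlined for illustration)
csv_text = "A,B,C,D,E\r\n1,2,,,\r\n,,3,4,5\r\n,,6,7,8\r\n"


def read_sparse_rows(text):
    """Parse CSV text, drop empty or blank cells and convert the rest to int."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [
        {key: int(value) for key, value in row.items() if value and value.strip()}
        for row in reader
    ]


rows = read_sparse_rows(csv_text)
print(rows)
```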