Converting dict values from unicode to utf-8 (or ascii) within csv DictWriter

I'm trying to write some data out to a CSV file, but unicode keeps getting in my way.

My data is in dict form. Here's a snippet:

 {'category': u'Best food blog written by a linguist\xa0', 'runners_up': [], 'winner': [u'shesimmers.com'], 'category_url': 'http://www.chicagoreader.com/chicago/best-food-blog-written-by-a-linguist/BestOf?oid=4101663'}

Here is the code where I use the DictWriter method.

    data = utf_8_encoder(data)
    with open('best_food_n_drink.csv', 'w') as csvfile:
        categories = ['category', 'category_url', 'winner', 'runners_up']
        writer = csv.DictWriter(csvfile, delimiter =',', fieldnames=categories)
        writer.writeheader()
        for row in data:
            writer.writerow(row)

utf_8_encoder is a function I defined earlier:

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            line.encode('utf-8')
        return unicode_csv_data

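I keep getting error messages like 'dict' object has no attribute 'encode'. I tried dropping the encoder function and calling row.values().encode('utf-8') in the for loop at the bottom instead, but that just tells me 'list' object has no attribute 'encode'.

I also tried ('ascii', 'ignore') in place of ('utf-8'), but I just can't figure it out.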

I'm not sure what format you want the output in, but this will encode your strings:

import csv

def map_to(d):
    # iterate over the key/value pairs
    for k, v in d.items():
        # if v is a list, join and encode it; otherwise encode it directly, as it is a string
        d[k] = ",".join(v).encode("utf-8") if isinstance(v, list) else v.encode("utf-8")

map_to(data)

with open('best_food_n_drink.csv', 'w') as csvfile:
    categories = ['category', 'category_url', 'winner', 'runners_up']
    writer = csv.DictWriter(csvfile, fieldnames=categories)
    writer.writeheader()
    writer.writerow(data)

That outputs something like the following, but with the mix of strings and lists I don't really know what it should end up looking like:

category,category_url,winner,runners_up
Best food blog written by a linguist ,http://www.chicagoreader.com/chicago/best-food-blog-written-by-a-linguist/BestOf?oid=4101663,shesimmers.com,
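
For readers wondering why the attempts in the question failed, here is a small sketch. It is written for Python 3, where `str` is the text type, and the `row` dict is a made-up stand-in for one record. `encode` exists only on strings, not on a dict or on the collection of its values, and it returns a new object instead of changing the string in place, so the result has to be stored somewhere.

```python
# Hypothetical one-record sample, not the asker's real data.
row = {"category": "Best food blog\xa0", "winner": ["shesimmers.com"]}

# A dict has no encode method, which is the first error from the question.
try:
    row.encode("utf-8")
except AttributeError as exc:
    dict_error = str(exc)

# Neither does the collection returned by values().
try:
    row.values().encode("utf-8")
except AttributeError as exc:
    values_error = str(exc)

# encode works on an individual string and returns a new bytes object;
# the original string is left unchanged, so the result must be kept.
text = row["category"]
encoded = text.encode("utf-8")

print(dict_error)
print(values_error)
print(repr(encoded))
```

This is also why the loop in `utf_8_encoder` would have no effect even if each `line` were a string: the encoded value is thrown away.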

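Now that it turns out you actually have a list of dicts, we need to iterate over the list, but the logic stays the same: we just run the function on each dict inside the loop: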

data = [{'category': u"Best restaurant that's been around forever and is still worth the trip\xa0", 'runners_up': [u'Frontera Grill', u'Chicago Diner ', u'Sabatino\u2019s', u'Twin Anchors'], 'winner': [u'Lula Cafe'], 'category_url': 'http://www.chicagoreader.com/chicago/BestOf?category=1979894&year=2011'},
{'category': u'Best bang for your buck\xa0', 'runners_up': [u'Frasca Pizzeria & Wine Bar', u'Chutney Joe\u2019s', u'"My boyfriend!"'], 'winner': [u'Big Star', u'Sultan\u2019s Market']}]

def map_to(d):
    for k, v in d.items():
        d[k] = ",".join(v).encode("utf-8") if isinstance(v, list) else v.encode("utf-8")

with open('best_food_n_drink.csv', 'w') as csvfile:
    categories = ['category', 'category_url', 'winner', 'runners_up']
    writer = csv.DictWriter(csvfile, fieldnames=categories)
    writer.writeheader()
    # get each dict from the list
    for d in data:
        # run the encode func
        map_to(d)
        writer.writerow(d)

I'm presuming 'category_url' actually exists in the second dict.
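
If a key such as 'category_url' does turn out to be missing from some dicts, `csv.DictWriter` fills the gap with its `restval` argument (an empty string by default). A minimal sketch, assuming Python 3 and an in-memory buffer instead of a real file; the sample rows are invented for illustration:

```python
import csv
import io

fieldnames = ["category", "category_url", "winner", "runners_up"]
# Hypothetical rows; the second one has no 'category_url' key.
rows = [
    {"category": "A", "category_url": "http://example.com/a",
     "winner": "X", "runners_up": ""},
    {"category": "B", "winner": "Y", "runners_up": "Z"},
]

buf = io.StringIO()
# restval is written for any fieldname that a row dict does not contain.
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="",
                        lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

output = buf.getvalue()
print(output)
```

The opposite case, a dict with keys that are not in `fieldnames`, raises a `ValueError` unless `extrasaction='ignore'` is passed.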

To catch None values and avoid the encoding error, add one line to the function:

def map_to(d):
    for k, v in d.items():
        # catch None's
        if v is not None:
            d[k] = " ".join(v).encode("utf-8") if isinstance(v, list) else v.encode("utf-8")
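
A variant that leaves its input untouched can be easier to reason about, since `map_to` overwrites the dict it is given. The sketch below assumes Python 3, where the csv module accepts text directly, so it only joins lists and replaces None with an empty string. The name `flatten_row` and the sample dict are hypothetical.

```python
def flatten_row(d, sep=","):
    """Return a new dict with lists joined into one string and None replaced by ''."""
    flat = {}
    for key, value in d.items():
        if value is None:
            flat[key] = ""
        elif isinstance(value, list):
            flat[key] = sep.join(value)
        else:
            flat[key] = value
    return flat

# Hypothetical sample shaped like the data in the question.
sample = {"category": "Best bang for your buck\xa0",
          "winner": ["Big Star", "Sultan\u2019s Market"],
          "runners_up": None}
flat = flatten_row(sample)
print(flat)
```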

Depending on what you intend to do with the data, storing it as JSON may be useful:

import json
with open('best_food_n_drink.json', 'w') as js:
    json.dump(data, js)

Then, to load the list of data back:

import json
with open('best_food_n_drink.json') as js:
    data = json.load(js)
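
A quick round-trip check of the JSON approach, as a sketch assuming Python 3 and a temporary directory instead of the working directory. `ensure_ascii=False` is optional; it keeps characters such as ’ readable in the file instead of writing `\u2019` escapes. The sample data is invented.

```python
import json
import os
import tempfile

# Hypothetical sample shaped like the data in the question.
data = [{"category": "Best bang for your buck\xa0",
         "winner": ["Big Star", "Sultan\u2019s Market"],
         "runners_up": []}]

path = os.path.join(tempfile.mkdtemp(), "best_food_n_drink.json")

# Write UTF-8 text; lists are stored as JSON arrays, so nothing is flattened.
with open(path, "w", encoding="utf-8") as fh:
    json.dump(data, fh, ensure_ascii=False)

# Read it back into Python objects.
with open(path, encoding="utf-8") as fh:
    loaded = json.load(fh)

print(loaded == data)
```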

With Python 3.4, using

io.open(filename, 'w', encoding='utf8')

instead of

open(filename, 'w')

solved the same problem for me.
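
A minimal sketch of that approach on Python 3, where the built-in `open` accepts `encoding` just like `io.open`. The csv documentation also recommends `newline=''` for files handed to the csv module. The file location and the sample row are assumptions, and list values still have to be joined into strings beforehand.

```python
import csv
import os
import tempfile

fieldnames = ["category", "category_url", "winner", "runners_up"]
# Hypothetical row with lists already joined into plain strings.
row = {"category": "Best food blog written by a linguist\xa0",
       "category_url": "http://example.com/best-food-blog",
       "winner": "shesimmers.com",
       "runners_up": "Sabatino\u2019s"}

path = os.path.join(tempfile.mkdtemp(), "best_food_n_drink.csv")

# The file object does the UTF-8 encoding; no manual encode() calls are needed.
with open(path, "w", encoding="utf-8", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow(row)

# Read the file back to confirm the non-ASCII characters survived.
with open(path, encoding="utf-8", newline="") as csvfile:
    rows_back = list(csv.DictReader(csvfile))
print(rows_back[0]["runners_up"])
```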

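Another solution is to write more comprehensive functions that check for types other than unicode and list. I know that wasn't part of the original question, but anyone might land here trying to convert a complex dict (with nested dicts, lists...), so here is my contribution: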

def array_to_utf(a):
    autf = []
    for v in a:
        if isinstance(v, unicode):
            autf.append(v.encode('utf-8'))
        elif isinstance(v, dict):
            autf.append(dict_to_utf(v))
        elif isinstance(v, list):
            autf.append(array_to_utf(v))
        else:
            autf.append(v)
    return autf

def dict_to_utf(d):
    dutf = {}
    for k,v in d.iteritems():
        if isinstance(v, unicode):
            dutf[k] = v.encode('utf-8')
        elif isinstance(v, list):
            dutf[k] = array_to_utf(v)
        elif isinstance(v, dict):
            dutf[k] = dict_to_utf(v)
        else:
            dutf[k] = v
    return dutf

test = {1: u'1', 2: '2', 3: {'x': u'x', 'y': 'y'}, 4: [u'ara', 's', 123], 5: 123}

print(dict_to_utf(test))
# {1: '1', 2: '2', 3: {'y': 'y', 'x': 'x'}, 4: ['ara', 's', 123], 5: 123}

Both functions are recursive, calling themselves as well as each other.
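
The two functions can also be folded into one. This is only a sketch, under the assumption that it should run on both Python 2 and Python 3; `to_utf8` and `text_type` are names made up here. Like the functions above, it leaves dict keys alone. On Python 3 the encoded values come back as `bytes`, which is rarely what you want to pass to the csv module there.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3

def to_utf8(obj):
    """Recursively encode every text string inside nested dicts and lists."""
    if isinstance(obj, text_type):
        return obj.encode("utf-8")
    if isinstance(obj, dict):
        return {key: to_utf8(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_utf8(item) for item in obj]
    # Numbers, None, and anything else pass through unchanged.
    return obj

test = {1: u"1", 3: {"x": u"x"}, 4: [u"ara", 123], 5: 123}
converted = to_utf8(test)
print(converted)
```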