Having an issue ignoring a comma inside double quotes

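I have a CSV file to which I apply two transformations, writing the results to a new file. The transformations themselves work fine, but the original file has a date column formatted as "dd mmm, yyyy". When I run the program, the new file shows the date split across two separate columns because of the comma. Below is my attempt to ignore the comma, since it is inside double quotes: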

#Converting HashString column to MD5 Hash & HexString column to ascii hex
with open('file1') as csvfile4:
    with open('file2', "r+") as output:

#This line below is where I believe the issue is...
        reader = csv.DictReader(csvfile4, quotechar='"', delimiter=",", quoting=csv.QUOTE_ALL, skipinitialspace=True)
        for i, r in enumerate(reader):
            # writing csv headers
            if i == 0:
                output.write(','.join(r) + '\n')

            # all data in HashString column replaced with hashed version of data
            r['HashString'] = hashlib.md5((r['HashString']).encode('utf-8')).hexdigest()
            # all data in HexString column replaced with ascii hex version of data
            r['HexString'] = r['HexString'].encode().hex()

            output.write(','.join(r.values()) + '\n')

Here is the first row of the original data, together with the column names.

UniqueID, LastName,FirstName, Language, Email, Resort, SurveySent, stayMonth, stayYear, LocationID, Image, HashString, HexString
SPI12345, Smith, Joe, EN, example@test.com, Example Resort, "03 Dec, 2020",12, 2020, 111111, image.jpg, "G, E=s:0at9n_$@b(P7.E:lC?2)Rm6MOnUniqueID=SPI1652859&locationId=547961&email=example@test.com&firstName=JOE&lastName=SMITH&city=LEXINGTON&stayMonth=12&stayYear=2020",UniqueID=SPI12345&locationId=111111&email=example@test.com&firstName=JOE&lastName=SMITH&city=LEXINGTON&stayMonth=12&stayYear=2020

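In the output file it looks like this (note that the SurveySent date is split: the year part is pushed into stayMonth and everything after it shifts one column to the right, so the data no longer lines up with the headers).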

UniqueID, LastName,FirstName, Language, Email, Resort, SurveySent, stayMonth, stayYear, LocationID, Image, HashString, HexString
SPI12345, Smith, Joe, EN, example@test.com, Example Resort, 3-Dec, 2020, 12, 2020, 111111, image.jpg, c79d87a3c8eecf12669b138430ce2b20, 556e6971756549443d53504931363532383539266c6f636174696f6e49643d35343739363126656d61696c3d626f62636c65617279407477632e636f6d2666697273744e616d653d524f42455254266c6173744e616d653d434c4541525926636974793d4c4558494e47544f4e26737461794d6f6e74683d31322673746179596561723d32303230

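Unfortunately, changing the date format in the original file is not an option; it has to stay as "dd mmm, yyyy". Any help with correctly ignoring the comma in this case would be greatly appreciated!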

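See below. Replace the csvfile4 = ... and output = ... lines with your real files (the open(...) calls shown in the comments). The reader already parses the quoted date as a single field; the quotes are lost when the row is written back with ','.join(...), so the comma inside the date turns into an extra delimiter. Writing with csv.DictWriter instead re-quotes any field that contains a comma.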

from io import StringIO
import csv, hashlib

data = '''\
UniqueID, LastName,FirstName, Language, Email, Resort, SurveySent, stayMonth, stayYear, LocationID, Image, HashString, HexString
SPI12345, Smith, Joe, EN, example@test.com, Example Resort, "03 Dec, 2020",12, 2020, 111111, image.jpg, "G, E=s:0at9n_$@b(P7.E:lC?2)Rm6MOnUniqueID=SPI1652859&locationId=547961&email=example@test.com&firstName=JOE&lastName=SMITH&city=LEXINGTON&stayMonth=12&stayYear=2020",UniqueID=SPI12345&locationId=111111&email=example@test.com&firstName=JOE&lastName=SMITH&city=LEXINGTON&stayMonth=12&stayYear=2020
'''

csvfile4 = StringIO(data)  # in real use: open('file1', newline='')
output = StringIO()        # in real use: open('file2', 'w', newline='')

reader = csv.DictReader(
    csvfile4,
    quotechar='"',
    delimiter=",",
    quoting=csv.QUOTE_ALL,
    skipinitialspace=True)
writer = csv.DictWriter(output, reader.fieldnames)
writer.writeheader()
for r in reader:
    # all data in HashString column replaced with hashed version of data
    r['HashString'] = hashlib.md5(
        (r['HashString']).encode('utf-8')).hexdigest()
    # all data in HexString column replaced with ascii hex version of data
    r['HexString'] = r['HexString'].encode().hex()

    writer.writerow(r)

print(output.getvalue()) # See "03 Dec, 2020"
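
To see why the original output was misaligned, here is a minimal sketch that contrasts the two ways of writing a row. The three-column sample is a hypothetical, cut-down version of the data in the question, and the variable names (raw, record, joined, written) are only for illustration.

```python
import csv
from io import StringIO

# Hypothetical, shortened sample modelled on the data in the question.
raw = 'UniqueID, SurveySent, stayMonth\nSPI12345, "03 Dec, 2020",12\n'

# Reading: skipinitialspace=True lets the parser see the opening quote
# that follows ", ", so the date stays a single field.
header, record = list(csv.reader(StringIO(raw), skipinitialspace=True))

# Writing with str.join drops the quotes, so the comma inside the date
# acts as an extra delimiter when the output is read back.
joined = ','.join(record)

# Writing with csv.writer re-quotes any field that contains the delimiter.
buffer = StringIO()
csv.writer(buffer, lineterminator='\n').writerow(record)
written = buffer.getvalue()

print(record)           # the parsed fields, date intact
print(joined)           # quotes lost
print(written, end='')  # quotes restored around the date
```

So the reader options in the question were not the problem; as far as I can tell, only the ','.join(...) write step needed to change.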