Having issue ignoring comma when inside double quotes
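I have a csv file. I am applying two formulas to it and creating a new file to hold the resulting data. The formulas work fine, but the original file has a date column formatted as "dd mmm, yyyy". When I run the program, the new file shows the date split into two separate columns because of the comma. Below is my attempt to ignore the comma, since it is inside double quotes: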
# Converting HashString column to MD5 Hash & HexString column to ascii hex
with open('file1') as csvfile4:
    with open('file2', "r+") as output:
        # This line below is where I believe the issue is...
        reader = csv.DictReader(csvfile4, quotechar='"', delimiter=",", quoting=csv.QUOTE_ALL, skipinitialspace=True)
        for i, r in enumerate(reader):
            # writing csv headers
            if i == 0:
                output.write(','.join(r) + '\n')
            # all data in HashString column replaced with hashed version of data
            r['HashString'] = hashlib.md5((r['HashString']).encode('utf-8')).hexdigest()
            # all data in HexString column replaced with ascii hex version of data
            r['HexString'] = r['HexString'].encode().hex()
            output.write(','.join(r.values()) + '\n')
Here is the first row of the original data, including the column names.
UniqueID, LastName,FirstName, Language, Email, Resort, SurveySent, stayMonth, stayYear, LocationID, Image, HashString, HexString
SPI12345, Smith, Joe, EN, example@test.com, Example Resort, "03 Dec, 2020",12, 2020, 111111, image.jpg, "G, E=s:0at9n_$@b(P7.E:lC?2)Rm6MOnUniqueID=SPI1652859&locationId=547961&email=example@test.com&firstName=JOE&lastName=SMITH&city=LEXINGTON&stayMonth=12&stayYear=2020",UniqueID=SPI12345&locationId=111111&email=example@test.com&firstName=JOE&lastName=SMITH&city=LEXINGTON&stayMonth=12&stayYear=2020
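In the output file it looks like this (notice that the SurveySent date is split: the year part is pushed into stayMonth and everything after it is shifted one column to the right, so the data no longer lines up with the headers).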
UniqueID, LastName,FirstName, Language, Email, Resort, SurveySent, stayMonth, stayYear, LocationID, Image, HashString, HexString
SPI12345, Smith, Joe, EN, example@test.com, Example Resort, 3-Dec, 2020, 12, 2020, 111111, image.jpg, c79d87a3c8eecf12669b138430ce2b20, 556e6971756549443d53504931363532383539266c6f636174696f6e49643d35343739363126656d61696c3d626f62636c65617279407477632e636f6d2666697273744e616d653d524f42455254266c6173744e616d653d434c4541525926636974793d4c4558494e47544f4e26737461794d6f6e74683d31322673746179596561723d32303230
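Unfortunately, changing the date format in the original file is not an option; it has to stay as "dd mmm, yyyy". Any help with correctly ignoring the comma in this case would be greatly appreciated!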
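See below. Replace the csvfile4 = ... and output = ... lines with the open() calls shown in their comments to work on real files. The reader already parses the quoted date as a single field; the split happens on output, because ','.join(...) writes the values back without quotes. Writing through csv.DictWriter quotes any field that contains a comma.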
from io import StringIO
import csv, hashlib
data = '''\
UniqueID, LastName,FirstName, Language, Email, Resort, SurveySent, stayMonth, stayYear, LocationID, Image, HashString, HexString
SPI12345, Smith, Joe, EN, example@test.com, Example Resort, "03 Dec, 2020",12, 2020, 111111, image.jpg, "G, E=s:0at9n_$@b(P7.E:lC?2)Rm6MOnUniqueID=SPI1652859&locationId=547961&email=example@test.com&firstName=JOE&lastName=SMITH&city=LEXINGTON&stayMonth=12&stayYear=2020",UniqueID=SPI12345&locationId=111111&email=example@test.com&firstName=JOE&lastName=SMITH&city=LEXINGTON&stayMonth=12&stayYear=2020
'''
csvfile4 = StringIO(data) # open('file1')
output = StringIO() # open('file2', 'w', newline='')
reader = csv.DictReader(
    csvfile4,
    quotechar='"',
    delimiter=",",
    quoting=csv.QUOTE_ALL,
    skipinitialspace=True)
writer = csv.DictWriter(output, reader.fieldnames)
writer.writeheader()
for i, r in enumerate(reader):
    # all data in HashString column replaced with hashed version of data
    r['HashString'] = hashlib.md5(
        (r['HashString']).encode('utf-8')).hexdigest()
    # all data in HexString column replaced with ascii hex version of data
    r['HexString'] = r['HexString'].encode().hex()
    writer.writerow(r)
print(output.getvalue()) # See "03 Dec, 2020"
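To see where the extra column comes from, here is a minimal sketch that isolates the two ways of writing a row. The two-column sample and the names sample, manual, and buf are made up for illustration and are not part of the original file; only the SurveySent value is shaped like the real data.

```python
import csv
from io import StringIO

# Hypothetical two-column sample, shaped like the SurveySent column above.
sample = 'Resort, SurveySent\nExample Resort, "03 Dec, 2020"\n'

reader = csv.DictReader(StringIO(sample), skipinitialspace=True)
row = next(reader)

# The reader keeps the quoted date together as one field.
print(row['SurveySent'])

# Joining by hand writes the embedded comma without quotes,
# so the resulting line gains an extra column when read back.
manual = ','.join(row.values())
print(manual)

# csv.DictWriter (default quoting is QUOTE_MINIMAL) quotes any field
# that contains the delimiter, so the date survives a round trip.
buf = StringIO()
writer = csv.DictWriter(buf, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerow(row)
print(buf.getvalue())
```

When switching to real files, opening the output with newline='' (as in the comment in the answer's code) lets the csv module control line endings itself.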