Merge specific column data across all rows of a CSV where studentnumber is duplicated
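I wrote a Python script that takes a large export of student data from our SIS showing each student's complete class schedule. Each class is on its own row, so every student has multiple rows, one per class. The script writes a new CSV file containing only the data I need, filtered to the specific class names defined in the script.
This all works as expected, but in the final CSV file, instead of multiple rows like this: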
jane doe, 123456, Language arts, Teacherone@ourdomain.org
jane doe, 123456, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, English 101, Teacherthree@ourdomain.org
Johnny Appleseed, 321321, Language Arts, Teacherone@ourdomain.org
Johnny Appleseed, 321321, Math, Teacherone@ourdomain.org
I would like the final CSV file to look like this:
Jane doe, 123456, Language Arts; Math, Teacherone@ourdomain.org; Teachertwo@ourdomain.org
Suzie Que, 321256, Math; English 101, Teachertwo@ourdomain.org; Teacherthree@ourdomain.org
Johnny Appleseed, 321321, Language Arts; Math, Teacherone@ourdomain.org
I have looked into pandas, but I can't figure out how I would implement it.
Any help would be appreciated.
The code is below:
import csv

def ixl():
    with open(r'C:\Users\sftp\PS\IMPORTED\pythonscripts\ixl\IXL CSV\IXL_DATA2.csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        with open(r'C:\Users\sftp\PS\IMPORTED\pythonscripts\ixl\IXL CSV\NEW_studentexport.csv', mode='w', newline='') as output_file:
            write = csv.writer(output_file, delimiter=',', quoting=csv.QUOTE_MINIMAL)
            for row in csv_reader:
                Title = row[6]
                coursename = row[9]
                firstname = row[13]
                lastname = row[16]
                grade = row[14]
                studentnumber = row[17]
                studentidnumber = row[18]
                teacheremail = row[19]
                teacherfirst = row[20]
                teacherlast = row[21]
                stud_username = studentidnumber + "@highpointaca"
                password = int(studentnumber) + int(studentidnumber)
                # Only export rows whose class title is in this list
                if Title in ('Math 7', 'Albebra 1', 'Algebra 1 Honors',
                             'Algebra 2', 'Algebra 2 Honors',
                             'Dual Enrollment College Algebra (MAT 110',
                             'Dual Enrollment English Comp. (ENG 102)',
                             'Reading 5', 'Pre-Calculus Honors', 'Pre-Algebra8',
                             'Pre-Algebra', 'Mathematics', 'Math K', 'Math 7',
                             'Math 6 Honors', 'Math 6', 'Math 5', 'Math 4',
                             'Math 3', 'Math 2', 'Math 1', 'Language Arts 5',
                             'Language Arts 4', 'Language Arts 3',
                             'Language Arts 2', 'Language Arts K',
                             'Language Arts 1', 'Language Arts',
                             'Geometry Honors', 'Geometry',
                             'Essentials of Math I', 'English 4', 'English 3',
                             'English 2', 'English 1 Honors', 'English 1',
                             'ELA 7 Honors', 'ELA 6 Honors', 'ELA 8', 'ELA 7',
                             'ELA 6', 'Dual Enrollment English Comp. (ENG 101)'):
                    write.writerow([firstname, lastname, studentidnumber,
                                    grade, teacheremail, stud_username, password, Title])

if __name__ == '__main__':
    ixl()
Using the csv module and collections.defaultdict.
Demo:
import csv
from collections import defaultdict

result = defaultdict(list)
with open("input.csv", newline="") as infile:  # Read the CSV
    reader = csv.reader(infile)
    for row in reader:
        result[row[0]].append(row)  # Group rows by name (column 0)

final_result = []
for k, v in result.items():
    temp = v[0]  # Start from the first row of each group
    for i in v[1:]:
        temp[2] += "; " + i[2]  # Concatenate subject names
    final_result.append(temp)

with open("output.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerows(final_result)  # Write back to CSV
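
The demo above groups by name and merges only the subject column. Since the question asks to group on the student number and to merge the teacher emails as well, here is a minimal sketch of that variant. The helper name merge_rows, its parameters, and the in-memory sample are assumptions made for illustration; they are not part of the original script. Repeated values within a group (such as the same teacher appearing twice) are dropped, which matches the Johnny Appleseed row in the desired output.

```python
import csv
import io
from collections import defaultdict


def merge_rows(rows, key_index, merge_indexes, sep="; "):
    """Collapse rows that share rows[key_index] into one row each.

    Columns listed in merge_indexes are joined with sep (duplicates dropped,
    first-seen order kept); every other column is taken from the first row.
    """
    groups = defaultdict(list)  # dicts preserve insertion order (Python 3.7+)
    for row in rows:
        groups[row[key_index]].append(row)

    merged = []
    for group in groups.values():
        out = list(group[0])  # copy so the input rows are not mutated
        for idx in merge_indexes:
            unique = dict.fromkeys(r[idx] for r in group)  # ordered de-dup
            out[idx] = sep.join(unique)
        merged.append(out)
    return merged


# Hypothetical sample in the question's layout: name, number, class, teacher.
SAMPLE = """\
Jane Doe, 123456, Language Arts, Teacherone@ourdomain.org
Jane Doe, 123456, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, English 101, Teacherthree@ourdomain.org
Johnny Appleseed, 321321, Language Arts, Teacherone@ourdomain.org
Johnny Appleseed, 321321, Math, Teacherone@ourdomain.org
"""

# skipinitialspace=True strips the blank that follows each comma in the sample.
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
merged = merge_rows(reader, key_index=1, merge_indexes=(2, 3))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue())
```

To use this with real files, pass a csv.reader over a file opened with newline="" instead of the StringIO sample. In the script from the question, one option would be to collect the filtered rows in a list rather than writing each one immediately, then call merge_rows on that list with the positions of the student ID, Title, and teacher email columns before writing.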