Merge specific column data from all rows of a CSV where studentnumber is duplicated

Question

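I wrote a Python script that takes a large export of student data from our SIS, showing each student's complete class schedule. Each class is on its own row, so every student has multiple rows, one per class. The script writes a new CSV file containing only the data I need; it is set up to look for certain class names only.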

This all works as expected, but... in the final CSV file, instead of multiple rows like this:

jane doe, 123456, Language arts, Teacherone@ourdomain.org
jane doe, 123456, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, English 101, Teacherthree@ourdomain.org
Johnny Appleseed, 321321, Language Arts, Teacherone@ourdomain.org
Johnny Appleseed, 321321, Math, Teacherone@ourdomain.org

I would like the final CSV file to look like this:

Jane doe, 123456, Language Arts; Math, Teacherone@ourdomain.org; Teachertwo@ourdomain.org
Suzie Que, 321256, Math; English 101, Teachertwo@ourdomain.org; Teacherthree@ourdomain.org
Johnny Appleseed, 321321, Language Arts; Math, Teacherone@ourdomain.org

I have looked into pandas, but I don't know how I would implement this with it.

Any help would be greatly appreciated.

Here is the code:

import csv

def ixl():
    with open(r'C:\Users\sftp\PS\IMPORTED\pythonscripts\ixl\IXL CSV\IXL_DATA2.csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        with open(r'C:\Users\sftp\PS\IMPORTED\pythonscripts\ixl\IXL CSV\NEW_studentexport.csv', mode='w', newline='') as output_file:
            write = csv.writer(output_file, delimiter=',', quoting=csv.QUOTE_MINIMAL)
            for row in csv_reader:
                Title = row[6]
                coursename = row[9]
                firstname = row[13]
                lastname = row[16]
                grade = row[14]
                studentnumber = row[17]
                studentidnumber = row[18]
                teacheremail = row[19]
                teacherfirst = row[20]
                teacherlast = row[21]
                stud_username = studentidnumber + "@highpointaca"
                password = int(studentnumber) + int(studentidnumber)

                # Only export rows whose course title is one of these
                if Title in (
                        'Math 7', 'Algebra 1', 'Algebra 1 Honors', 'Algebra 2', 'Algebra 2 Honors',
                        'Dual Enrollment College Algebra (MAT 110',
                        'Dual Enrollment English Comp. (ENG 102)', 'Reading 5', 'Pre-Calculus Honors',
                        'Pre-Algebra8', 'Pre-Algebra', 'Mathematics', 'Math K', 'Math 7',
                        'Math 6 Honors', 'Math 6', 'Math 5', 'Math 4', 'Math 3', 'Math 2', 'Math 1',
                        'Language Arts 5', 'Language Arts 4', 'Language Arts 3', 'Language Arts 2',
                        'Language Arts K', 'Language Arts 1', 'Language Arts', 'Geometry Honors',
                        'Geometry', 'Essentials of Math I', 'English 4', 'English 3', 'English 2',
                        'English 1 Honors', 'English 1', 'ELA 7 Honors', 'ELA 6 Honors',
                        'ELA 8', 'ELA 7', 'ELA 6', 'Dual Enrollment English Comp. (ENG 101)'):
                    write.writerow([firstname, lastname, studentidnumber,
                                    grade, teacheremail, stud_username, password, Title])


if __name__ == '__main__':
    ixl()

Answer 1

Use the csv module and collections.defaultdict.

Demo:

import csv
from collections import defaultdict

result = defaultdict(list)

# skipinitialspace drops the blank that follows each comma in the sample input
with open("input.csv", newline="") as infile:     # Read csv
    reader = csv.reader(infile, skipinitialspace=True)
    for row in reader:
        result[row[1]].append(row)     # Group by student number

final_result = []
for k, v in result.items():
    temp = v[0]
    for i in v[1:]:
        temp[2] += "; " + i[2]         # Concatenate subject names
    final_result.append(temp)

with open("output.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerows(final_result)         # Write back to csv
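
The demo above concatenates only the course column. As a rough sketch, the same defaultdict grouping can be extended to merge the teacher e-mail column as well and to skip repeated values, which is what the desired output in the question shows. The helper name merge_rows, its parameters, and the column positions (student number in column 1, course in 2, e-mail in 3) are assumptions based on the sample rows, with the last sample e-mail taken to be Teacherone@ourdomain.org:

```python
import csv
import io
from collections import defaultdict

# Sample rows based on the question: name, student number, course, teacher e-mail.
SAMPLE = """\
jane doe, 123456, Language arts, Teacherone@ourdomain.org
jane doe, 123456, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, English 101, Teacherthree@ourdomain.org
Johnny Appleseed, 321321, Language Arts, Teacherone@ourdomain.org
Johnny Appleseed, 321321, Math, Teacherone@ourdomain.org
"""


def merge_rows(rows, key_col=1, merge_cols=(2, 3), sep="; "):
    """Collapse rows that share rows[key_col] (hypothetical helper).

    Values in merge_cols are joined with sep, skipping repeats.
    """
    groups = defaultdict(list)
    for row in rows:
        if not row:  # ignore blank lines
            continue
        groups[row[key_col]].append(row)

    merged = []
    for group in groups.values():
        out = list(group[0])  # copy so the input rows are left untouched
        for col in merge_cols:
            # dict.fromkeys drops repeated values but keeps first-seen order
            out[col] = sep.join(dict.fromkeys(r[col] for r in group))
        merged.append(out)
    return merged


# skipinitialspace strips the blank after each comma in the sample
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
merged = merge_rows(reader)
for row in merged:
    print(row)
```

To use it on real files, pass a csv.reader opened on the input file instead of the in-memory sample, and hand the returned list to csv.writer(...).writerows().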

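Since the question mentions pandas, here is a comparable sketch using groupby and agg. The column names, the join_unique helper, and the header-less four-column layout are assumptions made for illustration and are not part of the original script:

```python
import io

import pandas as pd

# Sample rows based on the question: name, student number, course, teacher e-mail.
SAMPLE = """\
jane doe, 123456, Language arts, Teacherone@ourdomain.org
jane doe, 123456, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, Math, Teachertwo@ourdomain.org
Suzie Que, 321256, English 101, Teacherthree@ourdomain.org
Johnny Appleseed, 321321, Language Arts, Teacherone@ourdomain.org
Johnny Appleseed, 321321, Math, Teacherone@ourdomain.org
"""

COLUMNS = ["name", "studentnumber", "course", "teacheremail"]  # assumed names


def join_unique(values):
    """Join values with '; ', dropping repeats but keeping first-seen order."""
    return "; ".join(dict.fromkeys(values))


def merge_students(csv_text):
    df = pd.read_csv(
        io.StringIO(csv_text),
        header=None,            # the sample has no header row
        names=COLUMNS,
        skipinitialspace=True,  # strip the blank after each comma
        dtype=str,              # keep student numbers as text
    )
    merged = (
        df.groupby("studentnumber", sort=False)  # keep first-seen student order
          .agg({"name": "first", "course": join_unique, "teacheremail": join_unique})
          .reset_index()
    )
    return merged[COLUMNS]


merged = merge_students(SAMPLE)
print(merged.to_string(index=False))
```

The result can be written back out with merged.to_csv("output.csv", index=False, header=False).
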
python

python-3.x

pandas

import-csv