Read CSV and create new CSV with only specified columns and a subset of rows based on specific values

Question

I have a CSV file with 5 columns:

Student Id, Course Id, Student Name, Course Name, GPA
12,112 , John,Math, 3.7
11,111 , Mohammed,Astronomy, 3.89
10,100 , Peter,Java Programming, 3.5
9,99 , Lisa,Cooking, 4.0

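Using Python 3.8, I need to read this file and create a new CSV file that contains fewer columns, with all leading and trailing whitespace removed:

(Student Name, Course Name, and GPA),

but only for rows where the name field contains specific values (e.g. 'John', 'Lisa').

If there are no records for those names, do not create the output file and print a message instead (e.g. 'No Student name(s) found in the database').

I am using the following code to create a new CSV file with the required columns, but I am not sure how to select a subset of records for my new (output) CSV file.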

import csv

with open('My_Source_file.csv') as infile, open('My_Ouput_file.csv', 'w', newline='') as outfile:
    # Columns are zero-indexed: 2 = Student Name, 3 = Course Name, 4 = GPA
    csv.writer(outfile).writerows((row[2], row[3], row[4]) for row in csv.reader(infile))

Answer 1

Try this:

import pandas as pd

dta = pd.read_csv('test.csv')
# removing the white space from column names:
dta.columns = dta.columns.str.strip() 

# removing leading and trailing white space from all records in all string columns:

for col in dta.columns:
    # checking if column is string:
    if dta[col].dtype == object:
        dta[col] = dta[col].str.strip()

selection_list = ['John', 'Lisa']
dta = dta[dta['Student Name'].isin(selection_list)]
    
if len(dta) != 0:
    # pass your selected column as a list like this:
    selected_columns = ["Student Id","Course Id","Student Name"]
    dta[selected_columns].to_csv('My_Ouput_file.csv', index=False)

else:
    print('No Student name(s) found in the database')

Output:

   Student Id Course Id   Student Name 
0  12         112         John         
3  9          99          Lisa

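This script takes the names in selection_list (or whatever values you are interested in) and keeps only the rows whose Student Name column matches one of them. The example writes Student Id, Course Id and Student Name; to get the columns asked for in the question, set selected_columns to ["Student Name", "Course Name", "GPA"].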
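
For readers who would rather stay with the standard-library csv module that the question started from, the same filtering can probably be done without pandas. The following is only a minimal sketch, not the answerer's method: the function name filter_students and its parameters are hypothetical, and it assumes the first row of the file is the header shown in the question.

```python
import csv


def filter_students(src_path, dst_path, names,
                    columns=("Student Name", "Course Name", "GPA")):
    """Copy the chosen columns for rows whose Student Name is in `names`.

    Returns the number of data rows written. When nothing matches, no
    output file is created and a message is printed instead.
    (Hypothetical helper; assumes the first row is the header.)
    """
    wanted = set(names)
    matches = []
    with open(src_path, newline="") as infile:
        reader = csv.reader(infile)
        # Strip whitespace from the header cells so lookups by name work.
        header = [cell.strip() for cell in next(reader)]
        name_idx = header.index("Student Name")
        col_idx = [header.index(col) for col in columns]
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Remove leading and trailing whitespace from every field.
            row = [cell.strip() for cell in row]
            if row[name_idx] in wanted:
                matches.append([row[i] for i in col_idx])

    if not matches:
        print("No Student name(s) found in the database")
        return 0

    # The output file is opened only after at least one match is known.
    with open(dst_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(columns)
        writer.writerows(matches)
    return len(matches)
```

With the file names from the question, it could be called as filter_students('My_Source_file.csv', 'My_Ouput_file.csv', ['John', 'Lisa']). Collecting the matching rows in a list before opening the output file is what keeps an empty file from being created when no name matches.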

