python

Question

I have a CSV that looks like this:

F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,
F04300621,"Parker,Helen",CERT,"Yu,Betty",IOUS,,,

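I want to delete every row where column 2 equals column 4 (for example, when Smith,Andy = Smith,Andy). I tried to do this in Python by using " as the delimiter, so that the columns split into: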

F02303521,Smith,Andy,GHI,Smith,Andy,GHI,,,

I tried this Python code:

testCSV = 'test.csv'
deletionText = 'linestodelete.txt'
correct = 'correctone.csv'
i = 0
j = 0  #where i & j keep track of line number 

with open(deletionText,'w') as outfile: 
    with open(testCSV, 'r') as csv:  
        for line in csv:
            i = i + 1 #on the first line, i will equal 1. 
            PI = line.split('"')[1]
            investigator = line.split('"')[3]

        #if they equal each other, write that line number into the text file
        #as to be deleted. 
        if PI == investigator:
            outfile.write(i)



#From the TXT, create a list of line numbers you do not want to include in output
with open(deletionText, 'r') as txt:
    lines_to_be_removed_list = []

    # for each line number in the TXT
    # remove the return character at the end of line
    # and add the line number to list domains-to-be-removed list
    for lineNum in txt:
        lineNum = lineNum.rstrip()
        lines_to_be_removed_list.append(lineNum)


with open(correct, 'w') as outfile:
    with open(deletionText, 'r') as csv:

        # for each line in csv
        # extract the line number
        for line in csv:
            j = j + 1 # so for the first line, the line number will be 1  


            # if csv line number is not in lines-to-be-removed list,
            # then write that to outfile
            if (j not in lines_to_be_removed_list):
                outfile.write(line)

But for this line:

PI = line.split('"')[1]

I get:

Traceback (most recent call last):
  File "C:/Users/sskadamb/PycharmProjects/vastDeleteLine/manipulation.py", line 11, in <module>
    PI = line.split('"')[1]
IndexError: list index out of range

I expected it to give PI = Smith,Andy and investigator = Smith,Andy... why didn't that happen?

Any help would be greatly appreciated, thanks!

Answer 1

Try splitting on the comma instead of the quote.

x.split(",")
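A small sketch of what each kind of split returns may help here. The full input file is not shown in the question, so the cause of the IndexError is an assumption: one plausible trigger is a line with no quote characters at all (a blank line, a header, or a name without a comma), for which split('"') returns a single-element list. The `unquoted` row below is hypothetical. Note also that a plain split(",") cuts inside the quoted names, whereas the standard library's csv.reader keeps each quoted field whole.

```python
import csv
import io

quoted = 'F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,'
unquoted = 'F00000000,Smith,GHI,Jones,GHI,,,'  # hypothetical row without quotes

# Splitting on '"' only produces several parts when the line contains quotes.
parts_quoted = quoted.split('"')
parts_unquoted = unquoted.split('"')
print(len(parts_quoted))    # 5
print(len(parts_unquoted))  # 1, so parts_unquoted[1] would raise IndexError

# Splitting on ',' also splits inside the quoted names.
parts_comma = quoted.split(',')
print(len(parts_comma))     # 10 pieces instead of 8 fields

# csv.reader understands the quoting and returns 8 fields.
row = next(csv.reader(io.StringIO(quoted)))
print(row[1], '|', row[3])  # Smith,Andy | Smith,Andy
```

With a parsed row, the comparison from the question becomes row[1] == row[3].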

Answer 2

When you think CSV, think pandas, a great data analysis library for Python. Here is how to accomplish what you want:

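import pandas as pd

# The file has no header row, so supply eight placeholder column names.
fields = ['field{}'.format(i) for i in range(8)]
df = pd.read_csv("data.csv", header=None, names=fields)
# Keep only rows where column 2 (field1) differs from column 4 (field3).
df = df[df['field1'] != df['field3']]
print(df)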

This prints:

      field0        field1 field2    field3 field4  field5  field6  field7
1  F04300621  Parker,Helen   CERT  Yu,Betty   IOUS     NaN     NaN     NaN
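If pandas is not an option, roughly the same filter can be sketched with only the standard library's csv module. This is an illustration rather than part of the answer above: the function name drop_matching_rows and its parameters are made up for the example, and the input is held in memory so the snippet runs on its own. It filters in a single pass, so no intermediate file of line numbers is needed.

```python
import csv
import io

def drop_matching_rows(src, dst, col_a=1, col_b=3):
    """Copy CSV rows from src to dst, skipping rows where the two columns match.

    Hypothetical helper: col_a and col_b are zero-based column indexes.
    Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    kept = 0
    for row in reader:
        # Rows too short to contain both columns are passed through unchanged.
        if len(row) > max(col_a, col_b) and row[col_a] == row[col_b]:
            continue
        writer.writerow(row)
        kept += 1
    return kept

sample = (
    'F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,\n'
    'F04300621,"Parker,Helen",CERT,"Yu,Betty",IOUS,,,\n'
)
out = io.StringIO()
kept = drop_matching_rows(io.StringIO(sample), out)
print(out.getvalue(), end="")
# F04300621,"Parker,Helen",CERT,"Yu,Betty",IOUS,,,
```

With real files, pass handles opened with newline='' (as the csv module documentation recommends), for example open('test.csv', newline='') for reading and open('correctone.csv', 'w', newline='') for writing.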


python - list index out of range, working with CSV?

csv

indexoutofboundsexception