python - list index out of range, working with CSV?
I have a CSV that looks like this:
F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,
F04300621,"Parker,Helen",CERT,"Yu,Betty",IOUS,,,
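I want to delete every row where column 2 equals column 4 (for example, when Smith,Andy = Smith,Andy). I tried doing this in Python by using " as the delimiter, which splits the columns into: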
F02303521,
Smith,Andy
,GHI,
Smith,Andy
,GHI,,,
I tried this Python code:
testCSV = 'test.csv'
deletionText = 'linestodelete.txt'
correct = 'correctone.csv'
i = 0
j = 0 #where i & j keep track of line number
with open(deletionText,'w') as outfile:
    with open(testCSV, 'r') as csv:
        for line in csv:
            i = i + 1 #on the first line, i will equal 1.
            PI = line.split('"')[1]
            investigator = line.split('"')[3]
            #if they equal each other, write that line number into the text file as to be deleted.
            if PI == investigator:
                outfile.write(i)
#From the TXT, create a list of line numbers you do not want to include in output
with open(deletionText, 'r') as txt:
    lines_to_be_removed_list = []
    # for each line number in the TXT
    # remove the return character at the end of line
    # and add the line number to list domains-to-be-removed list
    for lineNum in txt:
        lineNum = lineNum.rstrip()
        lines_to_be_removed_list.append(lineNum)
with open(correct, 'w') as outfile:
    with open(deletionText, 'r') as csv:
        # for each line in csv
        # extract the line number
        for line in csv:
            j = j + 1 # so for the first line, the line number will be 1
            # if csv line number is not in lines-to-be-removed list,
            # then write that to outfile
            if (j not in lines_to_be_removed_list):
                outfile.write(line)
But for this line:
PI = line.split('"')[1]
I get:
Traceback (most recent call last):
  File "C:/Users/sskadamb/PycharmProjects/vastDeleteLine/manipulation.py", line 11, in <module>
    PI = line.split('"')[1]
IndexError: list index out of range
I expected it to give PI = Smith,Andy
investigator = Smith,Andy
...so why isn't that happening?
Any help would be greatly appreciated, thanks!
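One likely cause, assuming test.csv contains a blank line (often the last one) or any row without quoted fields: splitting such a line on " returns a single-element list, so index 1 does not exist. A minimal sketch that reproduces this; the sample lines and the helper name quoted_fields are made up for illustration:

```python
# Sample lines are assumptions for illustration: one normal row, one blank line.
lines = [
    'F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,\n',
    '\n',
]

def quoted_fields(line):
    """Return the two quoted fields of a line, or None if it has fewer than two."""
    parts = line.split('"')
    # Two quoted fields produce at least 5 pieces when splitting on '"'.
    if len(parts) < 5:
        return None
    return parts[1], parts[3]

for number, line in enumerate(lines, start=1):
    # Show the line number, how many pieces the split produced, and the result.
    print(number, len(line.split('"')), quoted_fields(line))
```

Printing the line number together with the number of pieces makes it easy to find which row of the real file triggers the IndexError.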
Try splitting on commas instead of quotes.
x.split(",")
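Keep in mind that a plain split on commas also cuts the quoted names in two ("Smith" and "Andy"), so the column positions shift. The standard-library csv module understands quoted fields. The following is only a sketch: the function name drop_matching_rows is made up, and passing short or blank rows through unchanged is an assumption about what is wanted.

```python
import csv
import io

def drop_matching_rows(infile, outfile):
    """Copy CSV rows from infile to outfile, skipping rows where column 2 equals column 4."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        # Assumption: rows too short to compare (e.g. blank lines) are kept as-is.
        if len(row) >= 4 and row[1] == row[3]:
            continue
        writer.writerow(row)

# In-memory sample taken from the question, so the sketch runs without any files.
sample = (
    'F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,\n'
    'F04300621,"Parker,Helen",CERT,"Yu,Betty",IOUS,,,\n'
)
result = io.StringIO()
drop_matching_rows(io.StringIO(sample), result)
print(result.getvalue())
```

With real files, the same function can be called with open('test.csv', newline='') and open('correctone.csv', 'w', newline='') as the two arguments. Doing the filtering in one pass also sidesteps other problems in the original script: outfile.write(i) raises a TypeError because i is an int, the second loop reads linestodelete.txt rather than test.csv, and the int j is compared against a list of strings, so it would never match.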
When you think of CSV, think of pandas, a great data analysis library for Python. Here is how to do what you want:
import pandas as pd
fields = ['field{}'.format(i) for i in range(8)]
df = pd.read_csv("data.csv", header=None, names=fields)
df = df[df['field1'] != df['field3']]
print(df)
This prints:
      field0        field1 field2    field3 field4  field5  field6  field7
1  F04300621  Parker,Helen   CERT  Yu,Betty   IOUS     NaN     NaN     NaN