Pandas select columns and data dependent on header

Question

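I have a very large .csv file. I only want to select the column containing the time/date, plus 20 other columns that I know by their headers.

As a test, I tried taking only the column with the header 'TIMESTAMP'. I know this column is 4207823 rows long in the .csv and contains only dates and times. The code below selects the TIMESTAMP column, but it also goes on to pick up values from other columns, as shown below: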

import pandas

# Raw string so the backslashes in the Windows path are not treated as escapes.
# low_memory is an argument of read_csv; as a bare variable it has no effect.
f = pandas.read_csv(r'C:\Users\mmso2\Google Drive\MABL Wind\_Semester 2 2016\Wind Farm Info\DataB\DataB - NaN2.csv', dtype=object, low_memory=False)  # load the file into a DataFrame

time = f[['TIMESTAMP']]
time = time[0:4207823]  # test to see if this stops time taking other data
print(time)

Output

                  TIMESTAMP
0       2007-08-15 21:10:00
1       2007-08-15 21:20:00
2       2007-08-15 21:30:00
3       2007-08-15 21:40:00
4       2007-08-15 21:50:00
5       2007-08-15 22:00:00
6       2007-08-15 22:10:00
7       2007-08-15 22:20:00
8       2007-08-15 22:30:00
9       2007-08-15 22:40:00
10      2007-08-15 22:50:00
11      2007-08-15 23:00:00
12      2007-08-15 23:10:00
13      2007-08-15 23:20:00
14      2007-08-15 23:30:00
15      2007-08-15 23:40:00
16      2007-08-15 23:50:00
17      2007-08-16 00:00:00
18      2007-08-16 00:10:00
19      2007-08-16 00:20:00
20      2007-08-16 00:30:00
21      2007-08-16 00:40:00
22      2007-08-16 00:50:00
23      2007-08-16 01:00:00
24      2007-08-16 01:10:00
25      2007-08-16 01:20:00
26      2007-08-16 01:30:00
27      2007-08-16 01:40:00
28      2007-08-16 01:50:00
29      2007-08-16 02:00:00 #these are from the TIMESTAMP column
...                     ...
679302              221.484 #This is from another column 
679303                  NaN
679304  2015-09-23 06:40:00
679305                  NaN
679306                  NaN
679307  2015-09-23 06:50:00
679308                  NaN
679309                  NaN
679310  2015-09-23 07:00:00

Answer 1

The problem was caused by an error in the input file, so simply using usecols in pandas.read_csv did the trick.
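If it helps to find where the input file goes wrong, one possible approach is to parse the TIMESTAMP column and look at the rows that fail. The following is only a sketch: the sample data is made up to mimic the stray values in the question's output, and it assumes Python 3 with a reasonably recent pandas.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: the third data row has a
# stray numeric value where a timestamp belongs, and the fourth row is empty.
sample = io.StringIO(
    "TIMESTAMP,igmmx_U_77m\n"
    "2007-08-15 21:10:00,7.1\n"
    "2007-08-15 21:20:00,7.4\n"
    "221.484,\n"
    ",\n"
    "2007-08-15 21:30:00,6.9\n"
)

df = pd.read_csv(sample, dtype=object)

# Values that do not match the timestamp format become NaT instead of raising.
parsed = pd.to_datetime(
    df["TIMESTAMP"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)

# Rows whose TIMESTAMP is missing or is not a valid timestamp.
bad_rows = df[parsed.isna()]
print(bad_rows)
```

The index of bad_rows gives the row positions to inspect in the original file.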

The code below demonstrates selecting a few columns of data:

import pandas

# Read only the selected columns
df = pandas.read_csv('DataB - Copy - Copy.csv', delimiter=',', dtype=object,
                     usecols=['TIMESTAMP', 'igmmx_U_77m', 'igmmx_U_58m'],
                     low_memory=False)
print(df)  # see what the data looks like
# Save the selection to a new .csv file
df.to_csv('DataB_GreaterGabbardOnly.csv')
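Since the file in the question is very large, the same usecols selection could also be done in chunks so that the whole file is never held in memory at once. This is a sketch rather than part of the original answer: the in-memory data, the extra column name "other", and the chunk size are placeholders, and it assumes Python 3.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for the large file on disk.
source = io.StringIO(
    "TIMESTAMP,igmmx_U_77m,igmmx_U_58m,other\n"
    "2007-08-15 21:10:00,7.1,6.8,a\n"
    "2007-08-15 21:20:00,7.4,7.0,b\n"
    "2007-08-15 21:30:00,6.9,6.5,c\n"
)
wanted = ["TIMESTAMP", "igmmx_U_77m", "igmmx_U_58m"]
out = io.StringIO()  # an open output file would be used in practice

# With chunksize set, read_csv returns an iterator of DataFrames.
reader = pd.read_csv(source, usecols=wanted, dtype=object, chunksize=2)
for i, chunk in enumerate(reader):
    # Write the header only once, with the first chunk.
    chunk.to_csv(out, header=(i == 0), index=False)

result = out.getvalue()
print(result)
```

For a real file, a much larger chunksize (for example 100000 rows) would be more typical; the value 2 here only makes the chunking visible on three rows.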

python

csv

columnheader

python-2.7

pandas