How to run random sampling a million times from a CSV
I have a large CSV that looks like this:
claim score
yes 1
yes 1
no 1
no 1
yes 1
... 1
... 1
... 1
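The scores are all the same number. I need to run the random sampling many times, e.g. 1000, and then compute the average percentage of "yes" counts.
The code looks like this: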
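The computation described here (draw a sample of rows with replacement, take the fraction of "yes", repeat, and average) can be sketched in plain Python as below. This is an illustration, not the asker's code: `average_yes_percentage` is a hypothetical name, the claims are held in an in-memory list rather than read from the CSV, and a single generator seeded once replaces the question's `random.seed()` call on every iteration.

```python
import random


def average_yes_percentage(claims, sample_size=1000, trials=1000, seed=None):
    """Mean percentage of "yes" over `trials` samples of `sample_size` rows.

    Rows are drawn with replacement, like the question's randint-per-row loop.
    """
    rng = random.Random(seed)  # one generator, seeded once (seed only for reproducibility)
    total = 0.0
    for _ in range(trials):
        sample = rng.choices(claims, k=sample_size)  # sampling with replacement
        total += sample.count("yes") / sample_size   # fraction of "yes" in this sample
    return 100.0 * total / trials


# The five visible rows of the example CSV, claim column only.
claims = ["yes", "yes", "no", "no", "yes"]
print(average_yes_percentage(claims, sample_size=1000, trials=100, seed=42))
```

With these five rows the printed value should land close to 60, the share of "yes" in the list.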
#imports
import random
import numpy

TotalYes = 0
csvFile = numpy.genfromtxt("/nas/home/twu/wind/output_1.csv",delimiter=",",dtype=None)
for j in range(1,10001):
    #csv format : claim (Yes/No), value
    #read in your csv file and store in array
    #initialize random number generator
    random.seed()
    #create RandomSample array
    RandSamples=[]
    samplesize = 1000
    #Fill RandomSample array with 10000 random samples from cvs array
    for i in range(1,1001):
        #for row in csvFile:
        #get a random index within csvFile[]. random num range is 0 to csv array length
        randIndex=random.randint(0,len(csvFile))
        print randIndex
        RandSamples.append(csvFile[randIndex:randIndex+1,:])
    #RandSamples1=numpy.asarray(RandSamples)
    #get number of 'yes' from RandomSample array
    RandYesSample=[]
    for i in range(0,1001):
        # check to see if current record is Yes claim or no
        if RandSamples[i:i+1,:1] == "yes":
            #yes, copy value to yes array
            RandYesSample.append (RandSamples[i:i+1,:1])
    #get percent of yes in RandomSample array
    PercYes = float(len(RandYesSample)) / 1000
    TotalYes = TotalYes + PercYes
TotalYes = float(TotalYes) / 10000
print TotalYes
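Two more problems in this loop are separate from the TypeError and would show up once it is fixed: `random.randint(a, b)` includes both endpoints, so `randint(0,len(csvFile))` can return `len(csvFile)`, one past the last row; and `range(0,1001)` visits 1001 indices although `RandSamples` holds only 1000 rows. A small sketch of index-safe alternatives, using a made-up `rows` list in place of the CSV data:

```python
import random

# Made-up stand-in for the CSV rows: [claim, score]
rows = [["yes", "1"], ["yes", "1"], ["no", "1"], ["no", "1"], ["yes", "1"]]

# randrange's upper bound is exclusive, so every index it returns is valid.
# randint(0, len(rows) - 1) is the equivalent form with an inclusive bound.
samples = [rows[random.randrange(len(rows))] for _ in range(1000)]

# random.choices draws k rows with replacement in one call, with no index at all.
samples_too = random.choices(rows, k=1000)

# Loop over the sampled rows themselves instead of range(0, 1001),
# so the loop can never run past the end of the list.
yes_count = 0
for row in samples:
    if row[0] == "yes":  # compare the claim field, not the whole row
        yes_count += 1
print(yes_count / len(samples))
```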
The error I get is:
if RandSamples[i:i+1,:1] == "yes":...TypeError: list indices must be integers, not tuple
I can't get it to work. Can anyone help?
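A minimal reproduction of the error with made-up rows, showing which indexing forms a plain Python list accepts (Python 2 words the message as in the question; Python 3 says "list indices must be integers or slices, not tuple"):

```python
# A plain Python list of rows, like RandSamples in the question (made-up data).
rows = [["yes", "1"], ["no", "1"]]

try:
    _ = rows[0:1, :1]  # the comma turns the index into the tuple (slice(0, 1), slice(None, 1))
    error = None
except TypeError as exc:
    error = str(exc)   # keep the message; the name `exc` is cleared after the except block

print(error)
print(rows[0])     # one integer index returns one row: ['yes', '1']
print(rows[0][0])  # index again to get the claim column: yes
```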
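You are slicing the lists incorrectly: a slice has the form [start:end:step], but you added a comma, which should be removed. RandSamples is a plain Python list, and the comma turns the index into a tuple, which lists do not accept; that is what the TypeError is saying. This: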
csvFile[randIndex:randIndex+1,:]
should be:
csvFile[randIndex]
Likewise, the check becomes:
if RandSamples[i] == "yes":
and the append becomes:
RandYesSample.append (RandSamples[i])
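Even with these fixes, two details are worth checking. After `csvFile[randIndex]`, each element of `RandSamples` is a whole row, so the test probably needs the claim field (e.g. `RandSamples[i][0] == "yes"`) rather than the row itself; and the `genfromtxt` call as written reads the `claim,score` header as a data row. A pure-Python loop will also be slow for a million repetitions. The following is a minimal vectorized sketch, not the answerer's code: it reads the file with the standard `csv` module, draws the sample indices for a whole chunk of trials at once with NumPy's `Generator.integers`, and assumes a comma-separated file with a header row and yes/no claims in any letter case. `load_claims` and `mean_yes_fraction` are hypothetical helper names.

```python
import csv

import numpy as np


def load_claims(lines):
    """Return the claim column as a NumPy array of lower-case strings.

    `lines` is any iterable of CSV text lines (e.g. an open file); the first
    line is assumed to be the "claim,score" header and is skipped.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return np.array([row[0].strip().lower() for row in reader if row])


def mean_yes_fraction(is_yes, sample_size, trials, rng, chunk=1000):
    """Average fraction of True values over `trials` samples drawn with replacement.

    Indices are drawn `chunk` trials at a time so memory stays bounded
    (chunk * sample_size integers per step).
    """
    total = 0.0
    done = 0
    while done < trials:
        n = min(chunk, trials - done)
        # One row of random row indices per trial; the upper bound is exclusive.
        idx = rng.integers(0, len(is_yes), size=(n, sample_size))
        # Fraction of "yes" in each sampled row, summed over this chunk.
        total += is_yes[idx].mean(axis=1).sum()
        done += n
    return total / trials
```

Applied to the question's file, this would look roughly like `with open("/nas/home/twu/wind/output_1.csv", newline="") as f: claims = load_claims(f)` followed by `100 * mean_yes_fraction(claims == "yes", sample_size=1000, trials=1_000_000, rng=np.random.default_rng())`. A million trials of 1000 draws is a billion random indices, so expect it to run for a while even when vectorized. As a sanity check, every draw is uniform over the rows, so the result should converge to `100 * (claims == "yes").mean()`, the percentage of "yes" in the whole file.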