How to make simulation faster?
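I am writing a Python simulation for kinship analysis. The input is an EIGENSTRAT .geno file containing the digits 0 and 2, 1.2 million of them per sample. I want to simulate "bad" positions, so I randomly (according to a variable) set positions to 9, which is the "missing SNP" code in EIGENSTRAT.
My problem is that it is somehow really slow. For 1 sample with one probability factor it takes about 1900 seconds, but I want to run about 1500 simulations. The code below is the debug version. How can I make it faster? I tried numba, but it only saved me ~100 seconds.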
import numpy as np
from time import process_time

possib = [0.95]
geno_arr = np.array([[2,0,2,0,2,0,2,0,0,2,0,0,2], [0,0,2,0,0,0,2,0,0,0,2,0,2]])
sample_list = ['HG00244.SG', 'HG00356.SG']
t1_start = process_time()
for p in possib:
    sample1 = geno_arr[0]
    for i in range(1, len(sample_list)):
        sample2 = geno_arr[1]
        # number of positions that will be set to 9
        max_valid_marker = round(len(sample1) * p)
        current_valid_marker_number = max_valid_marker
        while current_valid_marker_number != 0:
            # pick one of the two samples and one position at random
            sample_choice = np.array([0, 1])
            valami = np.random.choice(sample_choice, 1, replace=False)  # random.choice(sample_choice)
            pos_choice = np.arange(0, len(sample1))
            pos_choice_random = np.random.choice(pos_choice, 1, replace=False)  # random.choice(pos_choice)
            if valami == 0:
                # only count the position if it is not yet missing in either sample
                if sample1[pos_choice_random] != 9 and sample2[pos_choice_random] != 9:
                    sample1[pos_choice_random] = 9
                    current_valid_marker_number -= 1
            else:
                if sample1[pos_choice_random] != 9 and sample2[pos_choice_random] != 9:
                    sample2[pos_choice_random] = 9
                    current_valid_marker_number -= 1
t1_stop = process_time()
print("Full finish: ", t1_stop - t1_start)
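You want to set one element in each column of the array to 9 with probability p. Draw the columns without replacement and a random row for each of them, then assign in a single indexing step: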
import numpy as np

p = 0.95
geno_arr = np.array([[2,0,2,0,2,0,2,0,0,2,0,0,2], [0,0,2,0,0,0,2,0,0,0,2,0,2]])
max_valid_marker = round(geno_arr.shape[1]*p)
# one random row per selected column (rows may repeat)
index1 = np.random.choice(np.arange(geno_arr.shape[0]), max_valid_marker, replace=True)
# distinct columns, so no position is hit twice
index2 = np.random.choice(np.arange(geno_arr.shape[1]), max_valid_marker, replace=False)
geno_arr[index1,index2] = 9
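Since the question mentions running about 1500 simulations, it may help to wrap the same idea in a function that leaves the input untouched. The following is only a sketch under a few assumptions: the function name `mask_random_positions` and the seed are made up for illustration, and it uses NumPy's `Generator` API (`np.random.default_rng`) instead of the legacy `np.random.choice` shown above.

```python
import numpy as np

def mask_random_positions(geno, p, rng):
    """Return a copy of `geno` in which round(n_cols * p) columns are masked.

    In each selected column exactly one randomly chosen row is set to 9.
    (Hypothetical helper, not part of any library.)
    """
    masked = geno.copy()                      # keep the original array intact
    n_rows, n_cols = masked.shape
    n_mask = round(n_cols * p)
    # distinct column indices, so every masked position is a different SNP
    cols = rng.choice(n_cols, size=n_mask, replace=False)
    # one random sample (row) for each selected column
    rows = rng.integers(0, n_rows, size=n_mask)
    masked[rows, cols] = 9
    return masked

geno_arr = np.array([[2,0,2,0,2,0,2,0,0,2,0,0,2],
                     [0,0,2,0,0,0,2,0,0,0,2,0,2]])
rng = np.random.default_rng(seed=42)          # seed chosen arbitrarily
for p in [0.95, 0.5]:
    masked = mask_random_positions(geno_arr, p, rng)
    print(p, int((masked == 9).sum()))
```

Because each call works on a copy, the same `geno_arr` can be reused for every value of p and every repetition, which the in-place version above does not allow.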