TensorFlow batch training of a one-dimensional NumPy array without converting it into a multidimensional NumPy array

I'm a bit confused about how a NumPy array is turned into a tensor when batching it.

My code:

from __future__ import print_function  # print() behaves the same on Python 2 and 3
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (queue-based input pipeline)

n = 2500
# 2500 random 0/1 labels in a single 1-D array
y = np.random.randint(0, 2, size=n).astype(np.int32)

print("Before Batch Training:")
print("len(y):", len(y))
print("y: ", y)
print("y[9]: ", y[9])

batch_size = 10
num_preprocess_threads = 1
min_queue_examples = 256

y_batch = tf.train.batch([y], batch_size=batch_size, num_threads=num_preprocess_threads,
                         capacity=min_queue_examples + 3 * batch_size,
                         allow_smaller_final_batch=True)

print("After Batch Training:")
print("y_batch:", y_batch)
print("y_batch[9]: ", y_batch[9])

with tf.Session() as sess:

    tf.global_variables_initializer().run()

    # Start the background threads that fill the batching queue
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    y_proccessed = sess.run(y_batch)

    print("After Session Run:")
    print("y_proccessed:", y_proccessed)
    print("y_proccessed[9]: ", y_proccessed[9])
    for i in range(6):
        print("y_proccessed[{}][9]: ".format(i), y_proccessed[i][9])

    coord.request_stop()
    coord.join(threads)
# leaving the with-block closes the session, so no explicit sess.close() is needed

Output after running it:

Before Batch Training:
len(y): 2500
y:  [0 0 1 ..., 1 1 1]
y[9]:  1
After Batch Training:
y_batch: Tensor("batch:0", shape=(?, 2500), dtype=int32)
y_batch[9]:  Tensor("strided_slice:0", shape=(2500,), dtype=int32)
After Session Run:
y_proccessed: [[0 0 1 ..., 1 1 1]
 [0 0 1 ..., 1 1 1]
 [0 0 1 ..., 1 1 1]
 ..., 
 [0 0 1 ..., 1 1 1]
 [0 0 1 ..., 1 1 1]
 [0 0 1 ..., 1 1 1]]
y_proccessed[9]:  [0 0 1 ..., 1 1 1]
y_proccessed[0][9]:  1
y_proccessed[1][9]:  1
y_proccessed[2][9]:  1
y_proccessed[3][9]:  1
y_proccessed[4][9]:  1
y_proccessed[5][9]:  1

Here is what confuses me: shouldn't y_proccessed[9] give 1, just like y[9], instead of [0 0 1 ..., 1 1 1]?

Also, if you look at y_proccessed, it produces

    [[0 0 1 ..., 1 1 1]
     [0 0 1 ..., 1 1 1]
     [0 0 1 ..., 1 1 1]
     ..., 
     [0 0 1 ..., 1 1 1]
     [0 0 1 ..., 1 1 1]
     [0 0 1 ..., 1 1 1]]

It just produces the same full array 10 times over. Shouldn't it instead move on to the next 10 subsequent values for each new batch?
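A NumPy-only sketch of what the shapes above suggest is happening. It assumes, as the tf.train.batch documentation describes for the default enqueue_many=False, that each tensor in the list is treated as one single example: the whole 2500-element array is one example, so a batch of 10 is that array stacked 10 times. This only mimics the shapes; it does not use TensorFlow's queues.

```python
import numpy as np

n, batch_size = 2500, 10
np.random.seed(0)
y = np.random.randint(0, 2, size=n).astype(np.int32)  # same kind of 0/1 labels as above

# Default behaviour (enqueue_many=False): the ENTIRE array y is one example,
# so a batch is batch_size copies of it stacked along a new first axis.
batch = np.stack([y] * batch_size)

print(batch.shape)                  # (10, 2500), matching the (?, 2500) tensor
print(np.array_equal(batch[9], y))  # True: "row 9" is the whole array again
print(batch[9][9] == y[9])          # True: the value the question expected
```

This is why y_proccessed[9] prints a whole array and every row looks identical.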

Thanks

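Managed to fix it by adding enqueue_many=True to tf.train.batch. Without it, the list [y] is treated as a single example, so every batch stacks 10 copies of the whole 2500-element array (hence the (?, 2500) shape). With enqueue_many=True, the first dimension of y is treated as the example index, so each batch holds 10 individual elements of y: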

from __future__ import print_function  # print() behaves the same on Python 2 and 3
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (queue-based input pipeline)

n = 2500
# 2500 random 0/1 labels in a single 1-D array
y = np.random.randint(0, 2, size=n).astype(np.int32)

print("Before Batch Training:")
print("len(y):", len(y))
print("y: ", y)
print("y[9]: ", y[9])

batch_size = 10
num_preprocess_threads = 1
min_queue_examples = 256

# adding enqueue_many=True into the tf.train.batch:
# the first dimension of y is now treated as the example index
y_batch = tf.train.batch([y], batch_size=batch_size, num_threads=num_preprocess_threads,
                         capacity=min_queue_examples + 3 * batch_size,
                         enqueue_many=True, allow_smaller_final_batch=True)

print("After Batch Training:")
print("y_batch:", y_batch)
print("y_batch[9]: ", y_batch[9])

with tf.Session() as sess:

    tf.global_variables_initializer().run()

    # Start the background threads that fill the batching queue
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    y_proccessed = sess.run(y_batch)

    print("After Session Run:")
    print("y_proccessed:", y_proccessed)
    # y_proccessed is now 1-D with shape (10,), so y_proccessed[9] is a scalar.
    # Indexing it twice (y_proccessed[0][9], as in the question) would raise IndexError.
    print("y_proccessed[9]: ", y_proccessed[9])

    # Each further sess.run() dequeues the next batch; with a single thread
    # this should be the next 10 elements of y (y[10:20]).
    print("next batch:", sess.run(y_batch))

    coord.request_stop()
    coord.join(threads)
# leaving the with-block closes the session, so no explicit sess.close() is needed
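A NumPy-only sketch of how the fixed pipeline should behave, under two assumptions: a single preprocessing thread (num_threads=1, so element order is preserved) and a queue runner that keeps re-enqueueing the same constant array. With enqueue_many=True each element of y is one example, so successive sess.run(y_batch) calls return consecutive 10-element slices that wrap around after the last element. The generator fake_batches is a hypothetical name used only for illustration; it is not part of TensorFlow.

```python
import itertools
import numpy as np

def fake_batches(y, batch_size):
    """Hypothetical helper: mimics tf.train.batch(..., enqueue_many=True) with one thread.

    Each element of y is one example and the array is re-enqueued endlessly,
    so batches are consecutive slices that wrap around the end of y.
    """
    stream = itertools.cycle(y)  # element by element, repeating forever
    while True:
        yield np.array(list(itertools.islice(stream, batch_size)), dtype=y.dtype)

y = np.arange(25, dtype=np.int32)  # small stand-in for the 2500 labels
gen = fake_batches(y, batch_size=10)
first, second, third = next(gen), next(gen), next(gen)

print(first)     # [0 1 2 3 4 5 6 7 8 9]
print(second)    # [10 11 12 13 14 15 16 17 18 19]
print(third)     # [20 21 22 23 24  0  1  2  3  4]  (wraps around to the start)
print(first[9])  # 9, a scalar, just like y[9]
```

The key point is that first[9] is now a single value, which is what the question expected from y_proccessed[9].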

Some additional resources for reference:

http://ischlag.github.io/2016/11/07/tensorflow-input-pipeline-for-large-datasets/

https://github.com/dennybritz/tf-rnn/blob/master/sequence_example.ipynb

By the way, if anyone has a better solution, please post it.
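Not the author's solution, just one commonly used alternative sketched under assumptions: when the whole dataset already fits in memory as a NumPy array, you can skip the queue machinery and slice minibatches yourself, then feed each one to the graph through a placeholder with feed_dict. The helper iterate_minibatches and its shuffle/seed parameters are hypothetical names, and only the NumPy part is shown so it runs without TensorFlow. Later TensorFlow releases also provide the tf.data API (for example tf.data.Dataset.from_tensor_slices(y).batch(10)), which does this slicing for you.

```python
import numpy as np

def iterate_minibatches(y, batch_size, shuffle=False, seed=None):
    """Hypothetical helper: yield successive 1-D minibatches of y (one pass = one epoch).

    The last batch may be smaller, similar in spirit to allow_smaller_final_batch=True.
    """
    indices = np.arange(len(y))
    if shuffle:
        np.random.RandomState(seed).shuffle(indices)  # reproducible per-epoch shuffle
    for start in range(0, len(y), batch_size):
        yield y[indices[start:start + batch_size]]

y = np.random.randint(0, 2, size=2500).astype(np.int32)
batches = list(iterate_minibatches(y, batch_size=10))
print(len(batches))           # 250
print(batches[0][9] == y[9])  # True (no shuffling, so batch 0 is y[0:10])

# In a TF1 graph each batch would then be fed through a placeholder, e.g.
# sess.run(train_op, feed_dict={y_placeholder: batch})   (hypothetical names)
```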

Thanks